Hardware and Performance / Vlan's broke when tagging virtual bridge interface with vlan?
« on: October 18, 2023, 08:08:36 pm »
Greetings all,
I'm a network administrator in charge of a cluster of Proxmox servers that use OPNsense for firewalling and network access, both for the servers and for a WiFi network that resides on the same network.
Background:
These networks are separated into VLANs: the cluster sits on VLAN 90, and the WiFi uses two VLANs on two separate SSIDs (VLANs 150/155).
The network hardware we use is a Cisco 3650 switch and a Dell PowerConnect 8024F switch.
The OPNsense firewall resides on a central Proxmox server that is disconnected from the main cluster.
Besides the OPNsense VM, this server hosts a local PXE server plus the monitoring and Ansible automation machines we use to manage the Proxmox servers. The central server connects via LC fiber to the Cisco 3650 switch, which in turn connects to the Dell PowerConnect that links all servers via 10GbE (this is where the main cluster resides).
The problem:
I got a call today that the cluster services were down. Only VLAN 90 appeared to be down; VLANs 150 and 155 appeared to be working. I did some troubleshooting and found that VLAN 90 tagged/untagged networks couldn't be used on the Dell, the Cisco, or OPNsense (even OPNsense itself couldn't ping the VLAN 90 VIF IP from the shell).
After some time I decided to reboot the firewall. After the reboot, OPNsense (via the shell) was able to ping the IP residing on the VLAN 90 VIF; the Cisco, however, could not. Only after rebooting the Cisco switch could the VLAN VIFs see/ping each other. (You can guess what we had to do to get the Dell working again as well.)
In the end, after rebooting each device, VLAN 90 was functional.
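For anyone hitting something similar, this is roughly how we verified each hop along the way. The IP, interface, and bridge names below are placeholders for our setup, not a prescription:

```shell
# On the OPNsense shell: can the firewall reach its own VLAN 90 VIF IP?
# (10.0.90.1 is a placeholder address)
ping -c 3 10.0.90.1

# On the Proxmox host: is vmbr1 VLAN-aware, and is VLAN 90 in its filter table?
bridge vlan show dev vmbr1

# Which MACs has the bridge learned on VLAN 90? An empty/stale table here
# would point at the bridge rather than the switches.
bridge fdb show br vmbr1 vlan 90

# On the Cisco 3650 (IOS), from its own CLI:
#   show vlan id 90
#   show interfaces trunk
```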
However, how did this happen? (This is where it gets really weird.)
After investigating, it appeared one of my colleagues had set up a VM (on the central server, where the OPNsense VM also resides) and wanted to connect it to the VLAN 90 network by tagging vmbr1 (shared by all VMs) with VLAN 90 for his specific VM only. No other VMs get this tag applied, as by default all VMs are put in the native management VLAN when set up on the central server.
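For context, per-VM VLAN tagging on a shared Proxmox bridge is normally done roughly like this (the VM ID 120, the port eno1, and the VLAN range are placeholders for our environment); the bridge itself has to be VLAN-aware for the per-vNIC tag to be honored:

```shell
# /etc/network/interfaces on the Proxmox host: vmbr1 as a VLAN-aware bridge
#   auto vmbr1
#   iface vmbr1 inet manual
#       bridge-ports eno1
#       bridge-stp off
#       bridge-fd 0
#       bridge-vlan-aware yes
#       bridge-vids 2-4094

# Attach the colleague's VM to vmbr1 with VLAN 90 tagged on its vNIC only;
# other VMs on vmbr1 are unaffected by this setting.
qm set 120 --net0 virtio,bridge=vmbr1,tag=90
```

In theory this tag applies only to that one VM's bridge port, which is why the fallout surprised me.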
However, this simple config somehow completely broke the VIF interfaces on all trunked interfaces (OPNsense, Cisco, Dell). This happened only to this specific VLAN: 150 and 155 were fine and correctly passing traffic, even though they traverse the same interface as VLAN 90.
So in a nutshell: tagging a single VM (only one) with this specific VLAN, in a way that should not affect the OPNsense VM at all, managed to break every VLAN interface tagged with that VLAN on three separate devices.
....Is this normal??? Did anyone else experience something like this? This is a first for me.
I'd love to hear your thoughts on this!