ESXi > vSwitch > OPNsense = management network loop?

Started by GreenMatter, April 13, 2021, 01:09:34 PM

Maybe this isn't quite the right place to ask, but I've been battling this issue for some time without any success, so I'll try here.
OPNsense runs as a VM on an ESXi host. One physical NIC is the uplink of a separate vSwitch and port group for WAN; a second physical NIC is the uplink of another vSwitch for LAN. OPNsense sits in port group "OPN LAN" with VLAN ID 4095 (traffic is tagged on the physical switch, and OPNsense is VLAN-aware).
Everything works well except for a loop(?) that sometimes appears in the ESXi management LAN. It happened yesterday after rebooting the ESXi host.
The ESXi management VMkernel NIC (vmk0) is connected to the same LAN vSwitch through port group "MNGMT LAN", VLAN ID 0 (the native VLAN on the physical switch). ESXi's default gateway is in the same management network.
Other VMs are connected to the LAN vSwitch through port groups with the relevant VLAN IDs.
Routing table:

[root@esxiss:~] esxcli network ip route ipv4 list
Network     Netmask          Gateway     Interface  Source
----------  ---------------  ----------  ---------  ------
default     0.0.0.0          172.16.0.1  vmk0       MANUAL
10.55.0.0   255.255.0.0      0.0.0.0     vmk1       MANUAL
172.16.0.0  255.255.255.0    0.0.0.0     vmk0       MANUAL
172.16.4.0  255.255.255.192  0.0.0.0     vmk2       MANUAL

vmk1 belongs to the storage network and provides a fast connection between the VMs and ESXi for datastores. The network loop/storm is triggered especially when I try to access the ESXi web GUI via vmk0 (port group "MNGMT LAN", VLAN ID 0).
As a temporary workaround, I've created an additional VMkernel NIC (vmk2) in a port group with VLAN ID 14.
All the storming happens within ESXi or OPNsense, as there's no sign of this traffic in the physical switch stats. But if I unplug the LAN vSwitch uplink during a storm, everything stops immediately.
So, how should I start troubleshooting this properly?
OPNsense on:
Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (4 cores)
8 GB RAM
50 GB HDD
and plenty of VLANs ;-)

It seems that manually adding a route in ESXi to the VLAN I initiate the connection from:
esxcli network ip route ipv4 add --gateway 172.16.0.1 --network 172.16.4.0/26
does the job.
The routing table now looks like this:

[root@esxiss:~] esxcli network ip route ipv4 list
Network     Netmask          Gateway     Interface  Source
----------  ---------------  ----------  ---------  ------
default     0.0.0.0          172.16.0.1  vmk0       MANUAL
10.55.0.0   255.255.0.0      0.0.0.0     vmk1       MANUAL
172.16.0.0  255.255.255.0    0.0.0.0     vmk0       MANUAL
172.16.4.0  255.255.255.192  172.16.0.1  vmk0       MANUAL
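To make the difference between the two routing tables concrete, here is a minimal Python sketch (not ESXi code) that applies ordinary longest-prefix matching to the rows posted above. It assumes the ESXi TCP/IP stack picks routes that way; the client address 172.16.4.10 and the function name `lookup` are hypothetical, chosen only for illustration.

```python
import ipaddress

# Rows transcribed from the two `esxcli network ip route ipv4 list` outputs
# in this thread: (network, netmask, gateway, interface).
# A gateway of 0.0.0.0 means the network is directly connected on that vmk.
FIRST_TABLE = [
    ("0.0.0.0", "0.0.0.0", "172.16.0.1", "vmk0"),         # default route
    ("10.55.0.0", "255.255.0.0", "0.0.0.0", "vmk1"),       # storage network
    ("172.16.0.0", "255.255.255.0", "0.0.0.0", "vmk0"),    # management LAN
    ("172.16.4.0", "255.255.255.192", "0.0.0.0", "vmk2"),  # VLAN 14 workaround vmk
]
SECOND_TABLE = [
    ("0.0.0.0", "0.0.0.0", "172.16.0.1", "vmk0"),
    ("10.55.0.0", "255.255.0.0", "0.0.0.0", "vmk1"),
    ("172.16.0.0", "255.255.255.0", "0.0.0.0", "vmk0"),
    ("172.16.4.0", "255.255.255.192", "172.16.0.1", "vmk0"),  # manually added route
]


def lookup(table, dst):
    """Return (gateway, interface) of the most specific route containing dst.

    Assumption: plain longest-prefix matching. This models IP routing in
    general, not any ESXi-specific internals.
    """
    addr = ipaddress.IPv4Address(dst)
    best = None
    for net, mask, gateway, iface in table:
        # ipaddress accepts a dotted netmask after the slash, e.g. /255.255.255.192
        network = ipaddress.IPv4Network(f"{net}/{mask}")
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, gateway, iface)
    if best is None:
        raise LookupError(f"no route to {dst}")
    return best[1], best[2]


if __name__ == "__main__":
    client = "172.16.4.10"  # hypothetical workstation in 172.16.4.0/26
    for name, table in (("first table", FIRST_TABLE), ("second table", SECOND_TABLE)):
        gateway, iface = lookup(table, client)
        path = "directly connected" if gateway == "0.0.0.0" else f"via {gateway}"
        print(f"{name}: {client} -> {iface}, {path}")
```

Running it prints `first table: 172.16.4.10 -> vmk2, directly connected` and `second table: 172.16.4.10 -> vmk0, via 172.16.0.1`. Under this model, traffic from ESXi to that subnet no longer leaves directly on vmk2 but goes out vmk0 to the gateway 172.16.0.1.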
