20.7.4 vMotion breaks network connectivity

Started by EHRETic, November 05, 2020, 09:47:06 PM

Hi there,

I don't know where to start, but since I upgraded from 20.1 to 20.7 I seem to be getting more and more strange issues (I had some Unbound issues too, and had to disable DNSSEC temporarily).

Usually, my FW is pinned with a "should run on this host" rule so that my SIEM can capture the traffic. Today I had to do maintenance on some ESXi hosts and I vMotioned the FW a few times... it crashed twice! ??? None of the other VMs lost network connectivity.

I have to mention that this NEVER happened with previous versions; I could vMotion dozens of times.

I didn't notice the issue earlier because I hadn't had to move the VM since the upgrade.

Where do we start to troubleshoot that? ;)

Some more info:
- vSphere 7.0U1
- VMXNET3 cards on VDS
- multiple VLANs & interfaces
- no IPS activated


Quote from: mimugmail on November 06, 2020, 05:44:47 AM
Console output during move/crash would be a good start. :)

Well, that's the thing: there is no "crash", only a loss of network connectivity on the WAN interface, I presume.
When it happened, I could still access the console web page, but WAN_GW was not reachable (I can't recall the amount of packet loss, but it was high).

The VM is configured as follows:
2 NICs: one WAN, one LAN with multiple VLANs/interfaces

I'll try to reproduce the issue tonight when my wife and I are not working! ::)

Any advice for logging that properly?
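
One possible way to log it, as a rough sketch only: ping the WAN gateway in a loop and write a timestamped packet-loss figure to a file, then compare the timestamps with the vMotion events in vCenter. The log file name, the one-second interval and the helper names (parse_loss, probe, monitor) are assumptions made up for this example, and the gateway address has to be filled in by hand. It assumes Python 3.7+ and a ping that accepts -c.

```python
import re
import subprocess
import time
from datetime import datetime

# Matches the summary line of both FreeBSD and Linux ping output.
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output):
    """Return the packet-loss percentage from a ping summary, or None if absent."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def probe(gateway, count=3):
    """Ping the gateway `count` times; treat a missing summary as 100% loss."""
    result = subprocess.run(
        ["ping", "-c", str(count), gateway],
        capture_output=True,
        text=True,
    )
    loss = parse_loss(result.stdout)
    return 100.0 if loss is None else loss


def monitor(gateway, logfile="vmotion_ping.log", interval=1.0):
    """Append one timestamped loss reading per loop until interrupted (Ctrl+C)."""
    with open(logfile, "a") as log:
        while True:
            stamp = datetime.now().isoformat()
            log.write(f"{stamp} {gateway} loss={probe(gateway)}%\n")
            log.flush()
            time.sleep(interval)
```

Calling monitor() with the WAN gateway address before starting the vMotion and stopping it afterwards should show when the loss begins and whether it ever recovers by itself.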

I have one install of 20.7 where I hand out, via DHCP, the OPNsense on the other end of a tunnel as an additional DNS server (both run Unbound with DNSSEC and DNS-over-TLS (port 853) configured), because DNS has been unreliable on this box since 20.7.

I assume the ISP for this box interferes with DNSSEC/DNS-over-TLS...
kind regards
chemlud
____
"The price of reliability is the pursuit of the utmost simplicity."
C.A.R. Hoare

Felix Eichhorn's premium cat food with the extra portion of energy

A router is not a switch - A router is not a switch - A router is not a switch - A rou....

Quote from: chemlud on November 06, 2020, 02:42:44 PM
I have one install of 20.7 where I hand out, via DHCP, the OPNsense on the other end of a tunnel as an additional DNS server (both run Unbound with DNSSEC and DNS-over-TLS (port 853) configured), because DNS has been unreliable on this box since 20.7.

I assume the ISP for this box interferes with DNSSEC/DNS-over-TLS...

We might have something here... I have two OPNsense boxes with an IPsec tunnel between them. Both have their own local Internet/DNS breakout, but I have a feeling that sometimes something is not working properly in DNS resolution.

I hate putting something else in as a workaround, but this might be my excuse to try out Pi-hole, although I'm not sure yet whether it can forward DNS queries in a secure way... ;D

March 04, 2021, 12:22:28 PM #6 Last Edit: March 04, 2021, 12:41:09 PM by maurotb
I have the same problem.

Version
OPNsense 21.1.2-amd64
FreeBSD 12.1-RELEASE-p13-HBSD

On vSphere 7

After a vMotion I lose the network.
To get the network working again I need to do one of the following:

- reboot OPNsense, or
- run ifconfig vmxX down and then ifconfig vmxX up on every interface, or
- vMotion the VM back to the original server

(Note: the vSwitch and the vSphere network are OK; other VMs vMotion correctly.)
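
For anyone who wants to script the second workaround, here is a minimal sketch that takes each interface down and brings it back up with ifconfig. The interface names vmx0 and vmx1 are only examples (substitute your own), the function names are made up for illustration, and it has to be run as root from the VM console, since bouncing the NICs will drop an SSH session.

```python
import subprocess


def bounce_commands(interfaces):
    """Build the ifconfig down/up command pair for each interface, in order."""
    commands = []
    for nic in interfaces:
        commands.append(["ifconfig", nic, "down"])
        commands.append(["ifconfig", nic, "up"])
    return commands


def bounce(interfaces, run=subprocess.run):
    """Run every command; `run` can be swapped out to do a dry run."""
    for cmd in bounce_commands(interfaces):
        run(cmd, check=True)
```

For a dry run, pass a function that only prints the command, e.g. bounce(["vmx0", "vmx1"], run=lambda cmd, check: print(" ".join(cmd))). This only automates the workaround; it does not address whatever causes the loss in the first place.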

I tried replacing vmxnet3 with e1000e: same problem.
There are no logs or errors on the console or in dmesg.


After spending some time on it, I have resolved my vMotion problem.
On my Huawei L3 switch, I had to configure:
mac-address update arp
undo arp anti-attack entry-check enable

Hi,

On my side, everything was solved when I changed my hosts' NICs to 10 Gb/s; it never crashed again.
In terms of settings I didn't change a thing, but OPNsense has also been updated a few times since then.

Good that you could fix it!  ;)