OPNsense Forum

Archive => 21.7 Legacy Series => Topic started by: ThyOnlySandman on December 20, 2021, 11:39:40 am

Title: Opnsense Vmotion hang
Post by: ThyOnlySandman on December 20, 2021, 11:39:40 am
I'm having OPNsense crash/hang after two vMotions.

I have OPNsense on a two-host ESXi 7 cluster. VLANs on a switch interconnect the Internet modem, the OPNsense WAN links on the ESXi hosts, and the OPNsense LAN links on the ESXi hosts.

I can vMotion to the other host once, and all is well. However, when I vMotion back to the original host after that completes, OPNsense hangs. The Internet drops, pings on the LAN interface show high latency and drops, the WebGUI stops responding, and I cannot log in via the console: it just hangs after I enter the password.

The moment I reboot it via the vSphere web console, it responds again. So I did a bit of troubleshooting, and I believe I've narrowed it down to the IPsec strongSwan service as the culprit. If I stop the service, I can vMotion back and forth without issue. As soon as I start it, I can vMotion once, but after the second vMotion it hangs.

Any ideas why strongSwan is causing this behavior?
Post by: bartjsmit on December 20, 2021, 11:50:19 am
Is the cluster homogeneous? I.e., are the hosts of identical spec?

Are you using a shared datastore and if so, which type (SAN, iSCSI, NFS)?

Do you get OPNsense console warnings about network or storage latency?

Bart...
Post by: ThyOnlySandman on December 20, 2021, 12:04:47 pm
Yes, identical hosts. The NICs used for OPNsense are different.
The OPNsense VM uses VMXNET3.

Two-node vSAN with a 10 Gbps direct connect. vMotion is also 10 Gbps, via a switch trunk link.

No, I don't see any such logging, but it does hang, so maybe it is happening. I've tried to use the console to review logs, but it won't let me log in.

I've just vMotioned about 10 times back and forth, and it works perfectly with the IPsec VPNs off. So it seems to me the network and VMware are good to go.

Something about the ARP change (or something else) is upsetting strongSwan/OPNsense to the point where only a reboot seems to resolve it. That's after moving from host1 to host2 and back to host1.

Edit: It doesn't necessarily require a full reboot; that's just what I've been doing to get it functional again, since I have no console, SSH, or HTTPS access. When I issue a guest restart via the VMware web console, OPNsense immediately begins to respond (although it's then restarting). That's why I decided to identify which service was causing the hang, as it was clearly one of the first services to stop in the reboot sequence.

When it's hung, I've attempted to restart the strongSwan service via the WebUI, but it just hangs and never refreshes or responds.
Post by: Patrick M. Hausen on December 20, 2021, 12:27:54 pm
I'd try switching to E1000 network emulation first. There is strong evidence, e.g. on the TrueNAS forum, that VMXNET3 is not the best choice for FreeBSD guests.
Post by: ThyOnlySandman on December 20, 2021, 12:49:33 pm
Yeah, I may give E1000 a go, but I'd definitely clone this install for an isolated test first. I haven't had the greatest experience with ESXi/OPNsense NIC numbering when changing VMware adapters; it messes up the config.

My OPNsense has this "mod" applied to the VMX so that all my VMXNET3 adapters are in the proper order (post 2):
https://forum.opnsense.org/index.php?topic=19585.msg91046#msg91046

Still, given that the issue appears directly related to the strongSwan service running, I'm not certain it's a driver issue. If strongSwan isn't running, there are no hangs at all. But maybe it's the combination of strongSwan and VMXNET3.
Post by: ThyOnlySandman on December 20, 2021, 01:18:19 pm
Interesting. After 20+ vMotions with strongSwan stopped, it finally did encounter an Internet hang. But this one was different: OPNsense was still responsive via the WebUI, and there was no LAN interface latency either.

Stopping Suricata got the Internet responding again. I restarted Suricata and vMotioned back and forth again without issue. So my issue may span more than just strongSwan, but strongSwan definitely seems to be part of the hard hang. I'm also running Sensei with the native netmap driver.