OPNsense Forum

Archive => 20.7 Legacy Series => Topic started by: mgrue on August 18, 2020, 09:14:34 AM

Title: opnsense freezes and needs reboot
Post by: mgrue on August 18, 2020, 09:14:34 AM
I have the following setup:
- opnsense 20.1 running for months without any problem in a VMware vSphere (ESXi 6.7) VM
- Rather plain config without IDS/IPS or any special addons (Plugins os-net-snmp, os-vmware, os-dyndns)
- VM has 2 vCPUs / 1 GB RAM / 9 vNICs (VMXNET 3) / VMware Tools installed
- Average load 0.4 / Between 30-40% Memory utilisation after boot
- WAN connection is PPPoE with 175 Mb down / 40 Mb up (IPv4/IPv6)

Now I upgraded to 20.7 and subsequently to 20.7.1. The problem is that the system stops forwarding packets after 24 to 72 hours. When this 'freeze' happens, the symptoms are as follows:
- No packets forwarded at all
- WebUI or SSH login not possible
- The only option is to use the VMware console to get to the command line interface
- 'Restart all services' rarely helps
- Typically a reboot helps
- in some cases the WAN connection is reporting packet loss and long round trip times after reboot,
  the only chance to heal that issue is another reboot (sometimes two times in a row)
- No log entries that would indicate a problem to me

I cannot see the root of the problems. Therefore I have no clue what I can do. Any help is highly appreciated.

P.S.: As a temporary mitigation I will set up a cron-based nightly reboot.

Thanks,
Martin
Title: Re: opnsense freezes and needs reboot
Post by: bartjsmit on August 18, 2020, 03:09:04 PM
Hi Martin,

Are any of the resources spiking in the ESXi monitoring tab leading up to the crash?

What about storage? (max IOPS/throughput)

Bart...
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on August 18, 2020, 04:02:36 PM
Quote from: bartjsmit on August 18, 2020, 03:09:04 PM
Are any of the resources spiking in the ESXi monitoring tab leading up to the crash?
What about storage? (max IOPS/throughput)

As I'm not using vCenter I don't have past metrics available and the ESXi Webinterface has only data from the last hour. But I am monitoring overall CPU utilisation of the ESXi host through SNMP and I can say that there is nothing obvious to see there for the last days. I don't monitor any further metrics yet. The ESXi datastore is on a local SSD inside the host and should be capable enough. There is a second VM on the host which experiences no problems at all.
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on August 20, 2020, 08:12:45 AM
Update:
A daily reboot at 5 AM mitigates the problem: the system doesn't freeze anymore (i.e. it keeps routing packets between the different networks/interfaces). But the WAN latency occasionally goes up directly after a reboot (RTT > 800 ms with high packet loss).

Rebooting again one or two times fixes the problem and everything is back to the normal 7 to 8 ms RTT. Very strange.
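
To tell the "good" state (7 to 8 ms RTT) from the "bad" one (RTT > 800 ms with packet loss) automatically after a reboot, the summary printed by ping could be parsed. This is only a minimal sketch: it assumes the summary format of FreeBSD's ping(8), and the sample text, the 192.0.2.1 address and the thresholds are illustrative assumptions, not values from this thread.

```python
import re

# Illustrative summary in the format printed by FreeBSD's ping(8).
# The address and all numbers are made up for this example.
SAMPLE = """\
--- 192.0.2.1 ping statistics ---
10 packets transmitted, 6 packets received, 40.0% packet loss
round-trip min/avg/max/stddev = 612.4/843.9/1020.7/141.2 ms
"""

LOSS_RE = re.compile(r"([\d.]+)% packet loss")
RTT_RE = re.compile(r"min/avg/max/\w+ = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping_summary(text):
    """Return (loss_percent, avg_rtt_ms); avg is None if no reply was received."""
    loss = LOSS_RE.search(text)
    if loss is None:
        raise ValueError("no packet-loss line found")
    rtt = RTT_RE.search(text)
    avg = float(rtt.group(2)) if rtt else None
    return float(loss.group(1)), avg


def wan_looks_degraded(text, max_loss=5.0, max_avg_ms=100.0):
    """Hypothetical thresholds: flag the link if loss or average RTT is too high."""
    loss, avg = parse_ping_summary(text)
    return loss > max_loss or avg is None or avg > max_avg_ms
```

A script run shortly after boot could feed the captured ping output to `wan_looks_degraded()` and log or alert on the result instead of waiting for someone to notice the latency.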

Title: Re: opnsense freezes and needs reboot
Post by: mgrue on August 23, 2020, 07:50:49 PM
I tried a fresh install of 20.7, which worked at first but then froze 2 minutes after booting. I have now reverted to 20.1.9, which works as expected. I will try upgrading to 20.7 again a few minor releases from now.
Title: Re: opnsense freezes and needs reboot
Post by: loganx1121 on August 24, 2020, 06:39:28 PM
I'm having the same problem on some QOTOM hardware.  2-3 days after a reboot the whole thing locks up and stops passing traffic.  Guess it must be an issue with the new version.  Does anyone know how/where I can snag an older firmware version?
Title: Re: opnsense freezes and needs reboot
Post by: marjohn56 on August 25, 2020, 08:51:45 AM
Quote from: loganx1121 on August 24, 2020, 06:39:28 PM
I'm having the same problem on some QOTOM hardware.  2-3 days after a reboot the whole thing locks up and stops passing traffic.  Guess it must be an issue with the new version.  Does anyone know how/where I can snag an older firmware version?


Running two Qotoms here with zero issues, pretty basic systems with no intrusion detection, but one does run ntopng. If you've not tried it, you might want to switch out the SSD; if that has a problem, it can cause the system to freeze.
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on August 27, 2020, 04:13:43 PM
When I downgraded from 20.7.1 to 20.1.9_1 my system locked up after 24 hours or so. That was strange because 20.1 was stable and had months of uptime before. I tried to investigate further and found the setting 'VLAN Hardware Filtering', which was turned on by default starting with 20.7 (according to the docs). When I took my latest config back from 20.7 to 20.1 I kept it turned on, and the system froze.

I switched this setting to disabled and my 20.1 instance has been running happily again for about 3 days. I will monitor uptime, and if it stays stable I will upgrade to 20.7 again and disable VLAN Hardware Filtering, which seems to be a bad idea in conjunction with VMXNET3 network interfaces on VMware ESXi.

EDIT: After 7 days of uptime everything is still working smoothly. Will re-upgrade to 20.7 soon.
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on September 13, 2020, 03:19:26 PM
Update: with 20.7.2 I retried the version - now with 'VLAN hardware filtering' turned off. Unfortunately the system freezes again within 48h of uptime. I'm back on 20.1 again which is stable on my vSphere host.
Title: Re: opnsense freezes and needs reboot
Post by: Fright on September 14, 2020, 10:57:22 AM
have you tried turning all offloads off?
-rxcsum -txcsum -tso4 -tso6 -lro -rxcsum6 -txcsum6 -vlanhwcsum -vlanhwtso
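
As a rough illustration of how that flag list could be applied per interface, here is a minimal sketch that only builds the ifconfig command line and does not execute it. The helper names are hypothetical; the flags are the ones listed in the post, and `vmx0` is just the example interface. Changes made with ifconfig at runtime do not survive a reboot, and OPNsense may reapply its own interface settings.

```python
import shlex

# Offload flags exactly as listed in the post above (without the leading "-").
OFFLOAD_FLAGS = [
    "rxcsum", "txcsum", "tso4", "tso6", "lro",
    "rxcsum6", "txcsum6", "vlanhwcsum", "vlanhwtso",
]


def disable_offloads_argv(interface):
    """Build the argv for an ifconfig call that turns every listed offload off."""
    return ["ifconfig", interface] + ["-" + flag for flag in OFFLOAD_FLAGS]


def as_shell(argv):
    """Render an argv list as a single, safely quoted shell command line."""
    return shlex.join(argv)


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(as_shell(disable_offloads_argv("vmx0")))
```

Printing the command first makes it easy to review before running it as root on the console.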
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on September 14, 2020, 12:35:02 PM
Yes, I have disabled all Hardware Offloading and VLAN Hardware filtering options in Interfaces -> Settings.
Title: Re: opnsense freezes and needs reboot
Post by: Fright on September 14, 2020, 04:41:04 PM
Based on the latest posts and FreeBSD Bugzilla, it seems that FreeBSD12 has some issues with the vmx driver.
could you share ifconfig output on one of vmx interfaces?
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on September 15, 2020, 07:38:17 AM
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:0c:29:2c:ec:cd
        hwaddr 00:0c:29:2c:ec:cd
        inet 192.168.179.1 netmask 0xffffff00 broadcast 192.168.179.255
        inet6 fe80::20c:29ff:fe2c:eccd%vmx0 prefixlen 64 scopeid 0x1
        inet6 2003:dd:2f26:6004:20c:29ff:fe2c:eccd prefixlen 64
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active

Edit: This is the ifconfig output from opnsense 20.1.9_1
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on September 15, 2020, 07:59:03 AM
I have found these links based on your comment:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236999
https://www.freebsd.org/security/advisories/FreeBSD-EN-20:16.vmx.asc

Fixed in 12.1-RELEASE-p8. But I'm not sure if this really addresses my problem because it happens only when TSO is enabled (which is disabled in the opnsense GUI). Is this what you meant?
Title: Re: opnsense freezes and needs reboot
Post by: Fright on September 15, 2020, 03:27:57 PM
Quote
This is the ifconfig output from opnsense 20.1.9_1
I'm sure everything is fine in 20.1 :)
but what about 20.7?
Quote
But I'm not sure if this really addresses my problem
I'm not sure either. just trying to guess ..
but "disabled in the opnsense GUI" is not the same as actually disabled
various drivers may not allow features to be disabled
e.g. your ifconfig on 20.1 shows that vlanhwtag is enabled, although the interfaces.lib.inc script tries to disable it if disablevlanhwfilter is set
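
To check what the driver actually has enabled rather than what the GUI says, the `options=` line of the ifconfig output can be parsed. A minimal sketch, using the vmx0 output from 20.1 posted earlier in this thread as sample input; which capability names count as "offloads" is an assumption made for this example.

```python
import re

# Excerpt of the vmx0 output posted earlier in this thread (OPNsense 20.1).
SAMPLE_20_1 = """\
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
"""

# Capability names treated as hardware offloads in this sketch (an assumption).
OFFLOAD_CAPS = {
    "RXCSUM", "TXCSUM", "TSO4", "TSO6", "LRO", "RXCSUM_IPV6", "TXCSUM_IPV6",
    "VLAN_HWTAGGING", "VLAN_HWCSUM", "VLAN_HWTSO", "VLAN_HWFILTER",
}

# Matches the interface "options=" line, but not the "nd6 options=" line.
OPTIONS_RE = re.compile(r"^\s*options=[0-9a-fA-F]+<([^>]*)>", re.MULTILINE)


def enabled_capabilities(ifconfig_text):
    """Return the set of capability names on the interface options line."""
    match = OPTIONS_RE.search(ifconfig_text)
    if match is None:
        return set()
    return {name for name in match.group(1).split(",") if name}


def active_offloads(ifconfig_text):
    """Return the offload-related capabilities that are still switched on."""
    return enabled_capabilities(ifconfig_text) & OFFLOAD_CAPS


if __name__ == "__main__":
    print(sorted(active_offloads(SAMPLE_20_1)))
```

For the 20.1 sample this reports VLAN_HWCSUM and VLAN_HWTAGGING as still active, which is the mismatch between GUI setting and driver state described above.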
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on September 15, 2020, 03:52:10 PM
I never went to the commandline with 20.7 to check the ifconfig output. I will wait until opnsense 20.7 is based at least on FreeBSD 12.1-RELEASE-p8 and then re-try (and then also check / play with ifconfig). Thanks for the hints regarding the vmxnet driver.
Title: Re: opnsense freezes and needs reboot
Post by: rogge+opnsense on September 16, 2020, 08:35:00 PM
I am also having same issue and very similar configuration (ESXi & VM)...

What about a periodic interface reset, rather than a system reboot?

A periodic interface reset is scheduled daily on my setup; I'll post results in a week or so.

Note: I am using e1000 network 'cards'
Title: Re: opnsense freezes and needs reboot
Post by: Fright on September 16, 2020, 10:02:37 PM
thanks! interesting.. I thought that the transition to e1000 should help
can you please share more info about config: IPS? offloads? plugins?
Title: Re: opnsense freezes and needs reboot
Post by: rogge+opnsense on September 21, 2020, 11:16:04 PM
so periodic interface reset did not work. - I am also now cronning a reboot.

DHCP4 -V6 is disabled - all 'static' IP are ARPed
OpenDNS is enabled
UnboundDNS w/ blocklists.
IPS is on using Hyperscan.
All offloads are offloading
only using VMware plugin
Title: Re: opnsense freezes and needs reboot
Post by: Fright on September 22, 2020, 09:05:01 AM
since you use IPS, have you tried this?
https://forum.opnsense.org/index.php?topic=19175.0
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on September 29, 2020, 10:14:31 AM
I have now re-upgraded to 20.7.3 and giving it a try with vSphere/vmxnet drivers.

This is the ifconfig output:
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=800028<VLAN_MTU,JUMBO_MTU>
        ether 00:0c:29:2d:79:14
        inet 192.168.179.1 netmask 0xffffff00 broadcast 192.168.179.255
        inet6 fe80::20c:29ff:fe2d:7914%vmx0 prefixlen 64 scopeid 0x1
        inet6 2003:dd:2f1b:f804:20c:29ff:fe2d:7914 prefixlen 64
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on October 02, 2020, 08:43:40 PM
After 3 days of uptime the system again stopped forwarding packets.

Obviously it ran out of memory (see screenshot). I restarted all services through SSH, which did not help.
Rebooted and latencies on the WAN were super high again and the system was extremely sluggish.

I downgraded to 20.1.9 again and all is fine again. There seems to be a problem with 20.7 in a vSphere VM and it does not seem related to the vmxnet driver. Any further ideas?
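
One way to confirm the out-of-memory suspicion before the next freeze would be to log free memory over time. A minimal sketch, assuming the text output of `sysctl hw.pagesize vm.stats.vm.v_free_count` on FreeBSD has been captured (for example by a cron job); the sample numbers are made up for illustration, and free pages alone understate what the kernel could still reclaim.

```python
# Illustrative capture of `sysctl hw.pagesize vm.stats.vm.v_free_count`.
# The values are invented for this example, not taken from the thread.
SAMPLE = """\
hw.pagesize: 4096
vm.stats.vm.v_free_count: 12800
"""


def parse_sysctl(text):
    """Parse 'name: value' lines with integer values into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = int(value.strip())
    return values


def free_mib(values):
    """Free memory in MiB: number of free pages times the page size."""
    free_bytes = values["vm.stats.vm.v_free_count"] * values["hw.pagesize"]
    return free_bytes / (1024 * 1024)


if __name__ == "__main__":
    print(f"{free_mib(parse_sysctl(SAMPLE)):.1f} MiB free")
```

Logged every few minutes, a steadily falling value over the 2 to 3 days before a freeze would support the memory theory; a flat line would point elsewhere.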

Title: Re: opnsense freezes and needs reboot
Post by: Supermule on October 02, 2020, 08:47:55 PM
What version of VM are you running??

Quote from: mgrue on October 02, 2020, 08:43:40 PM
After 3 days of uptime the system again stopped forwarding packets.

Obviously it ran out of memory (see screenshot). I restarted all services through SSH, which did not help.
Rebooted and latencies on the WAN were super high again and the system was extremely sluggish.

I downgraded to 20.1.9 again and all is fine again. There seems to be a problem with 20.7 in a vSphere VM and it does not seem related to the vmxnet driver. Any further ideas?
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on October 02, 2020, 08:52:01 PM
Quote from: Supermule on October 02, 2020, 08:47:55 PM
What version of VM are you running??
VM-Version 14 on ESXi 6.7
Title: Re: opnsense freezes and needs reboot
Post by: Supermule on October 02, 2020, 09:02:10 PM
Can you downgrade to version 10??

Quote from: mgrue on October 02, 2020, 08:52:01 PM
Quote from: Supermule on October 02, 2020, 08:47:55 PM
What version of VM are you running??
VM-Version 14 on ESXi 6.7
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on October 02, 2020, 09:12:40 PM
Quote from: Supermule on October 02, 2020, 09:02:10 PM
Can you downgrade to version 10??
No, I can't. How can that help? 20.1.9 runs happily with VM version 14.
Title: Re: opnsense freezes and needs reboot
Post by: Supermule on October 02, 2020, 11:27:55 PM
Try it.

Backup config and install an OPNsense instance in VM version 10 and report back.

Quote from: mgrue on October 02, 2020, 09:12:40 PM
Quote from: Supermule on October 02, 2020, 09:02:10 PM
Can you downgrade to version 10??
No, I can't. How can that help? 20.1.9 runs happily with VM version 14.
Title: Re: opnsense freezes and needs reboot
Post by: GreenMatter on October 03, 2020, 02:32:55 AM
This won't help much, but it lets you compare. I have recently upgraded to 20.7.3 and so far, so good...
Difference is that I use ESXi 7.0, all HW offloading is enabled and OPNsense is VLAN aware; vmx0 is WAN and vmx1 is VLAN parent for LAN side:

Quote
vmx1: flags=8a43<UP,BROADCAST,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> metric 0 mtu 1500
   options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
   ether 00:0c:29:d4:ba:59
   inet6 fe80::20c:29ff:fed4:ba59%vmx1 prefixlen 64 scopeid 0x2
   media: Ethernet autoselect
   status: active
   nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Attached are screenshots of vm switch and port group settings...

Title: Re: opnsense freezes and needs reboot
Post by: mgrue on October 14, 2020, 11:42:14 AM
Quote from: GreenMatter on October 03, 2020, 02:32:55 AM
Difference is that I use ESXi 7.0, all HW offloading is enabled and OPNsense is VLAN aware;

This made me think. I enabled all offloading capabilities including VLAN filtering, and the system has now been up and running for nearly 10 days. Thanks for pointing me in the right direction.

Soon I will move the opnsense VM to a new ESXi 7.0 U1 box with a more powerful CPU, a 10 Gbit NIC and more RAM. Let's keep our fingers crossed that it stays stable.
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on October 28, 2020, 02:30:04 PM
I have ported the whole thing to ESXi 7.0 U1 on a latest-generation i3 processor. The system has been up and running with 20.7.4 since its release. I have assigned more RAM to the VM (1.5 GB instead of 1 GB).
All offloading capabilities have been enabled, with a Broadcom 57810 10 Gig NIC behind the vSwitches. Runs great so far.
Title: Re: opnsense freezes and needs reboot
Post by: scream on December 18, 2020, 06:58:51 PM
Any news on this issue?
I think I'm running into a similar case as well.

OPNsense VM on ESX 7.0.1 with vmxnet3 cards.
VMware Tools are installed. All packages are up to date.

In my case it doesn't occur daily, but sometimes when I take a snapshot or use vMotion to move the VM to another host.

The issue shows up as the WAN gateway having latency over 1000 ms and a lot of packet loss. I haven't found any solution other than a reboot so far. :-/
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on December 18, 2020, 07:19:08 PM
What fixed my problems was:
- Enabling all hardware accelerations/offloads under Interfaces / Settings
- Moving to faster hardware with 10 GbE NICs
- Updating to VMware ESXi 7.0 U1

I can't tell 100% what really fixed the problems, but they are gone
Title: Re: opnsense freezes and needs reboot
Post by: scream on December 18, 2020, 07:43:58 PM
Tried now with enabled hardware offloading... didn't help for me.
ESX is up to date with the latest patches released a few days ago.
I already use 2*10GbE (Intel X722) as uplinks with LACP configured on dvSwitch.

Which NICs do you use? (E1000E or VMXNET3?)
Title: Re: opnsense freezes and needs reboot
Post by: mgrue on December 18, 2020, 07:52:43 PM
VMXNET3, Broadcom 57810 NICs, no LACP, Standard vSwitch
My VM has 1.5 GB of memory now. Before, I had 1.0 GB and occasionally ran out of memory, which also caused very sluggish routing behaviour.
Title: Re: opnsense freezes and needs reboot
Post by: scream on December 18, 2020, 08:18:18 PM
Okay. Maybe I will test without LACP first. My VM is running with 8 GB RAM as I use Sensei with Elasticsearch.
But with 20.1.9 it was running without any issue for many months.
Title: Re: opnsense freezes and needs reboot
Post by: nykaer on February 02, 2021, 08:57:43 AM
Seems to be solved on my environment now.
On VMware ESXi 6.7U3.

Interfaces -> Settings: Enabled all hardware offloading + VLAN hardware filtering.

The last part seemed to do the trick, and it has been running for more than 48 hours now. Before, I needed to reboot every 6-10 hours or so.


Title: Re: opnsense freezes and needs reboot
Post by: Rajstopy on February 12, 2021, 01:24:29 PM
Hi there,

Looks like I have a very similar issue here... OPNsense was running well for months, but suddenly the interfaces began to get stuck. Rebooting OPNsense usually solves the problem temporarily, but if I reboot the hypervisor itself then I have peace for several days. This issue is driving me nuts because I have a lot of services relying on my network connection.

I suspected WireGuard, but it seems to occur even if the service is off...

An answer I received this morning mentioned that another VM could cause the system NIC to freeze. I remember my problem appeared suddenly one day, without anything having changed on the system... except perhaps a new VM.

Do you remember if you noticed this issue after having added a new VM?

R.
Title: Re: opnsense freezes and needs reboot
Post by: tmueko on June 15, 2021, 08:59:53 PM
I now have the problem with 21.1.6.
FreeBSD 12.x and pfSense are working fine.
I have an OPNsense cluster on two Dell R630s on 10 Gb links. Sometimes both VMs froze within one hour :-(

Tried all combinations of +/- lro,  +/- tso, +/- (rxcsum, txcsum) and vlanhwtag: Nothing worked.

Any new idea on this?
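
For anyone repeating this kind of test, the combinations of +/- lro, +/- tso, +/- (rxcsum, txcsum) and +/- vlanhwtag can be enumerated systematically so that none is missed. A minimal sketch that only generates the ifconfig flag lists; the grouping of rxcsum and txcsum follows the post above, the helper name is hypothetical, and how long each combination has to run before it counts as stable is left open.

```python
from itertools import product

# Each entry is a group of ifconfig flags that is switched on or off together.
TOGGLES = [("lro",), ("tso",), ("rxcsum", "txcsum"), ("vlanhwtag",)]


def flag_combinations():
    """Yield one ifconfig flag list per on/off combination of the toggle groups."""
    for states in product((True, False), repeat=len(TOGGLES)):
        flags = []
        for group, enabled in zip(TOGGLES, states):
            # A leading "-" is how ifconfig disables a capability.
            flags.extend(flag if enabled else "-" + flag for flag in group)
        yield flags


if __name__ == "__main__":
    for number, flags in enumerate(flag_combinations(), start=1):
        print(number, "ifconfig vmx0", " ".join(flags))
```

With four toggle groups this gives 16 combinations, each of which would need its own soak test, so noting the result next to each printed line helps keep track.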