OPNsense randomly freezes under Hyper-V

Started by jaferrer, November 29, 2024, 02:54:37 PM

Hello there!

     I've been using OPNsense for quite some time and have migrated several of my customers to it without any problems. But now I'm facing a situation with one of our own firewalls where I'm running out of ideas, so I'd like to know if anyone has experienced this before and managed to solve it.

The symptoms:
     Randomly (anywhere from 2 or 3 times a day to once every 2 weeks), mostly at night, the firewall stops forwarding traffic. Some interfaces (not all) stop answering ping, even when I ping the interface address from the console. The only way to recover is to reboot the machine. Sometimes that is not even possible from the console, so I have to hard-reset it. After that, the firewall resumes normal operation.

The setup:
     - OPNsense 24.7.9_1-amd64 | FreeBSD 14.1-RELEASE-p6 | OpenSSL 3.0.15
     - Running as a virtual machine under Hyper-V Server 2019 | Generation 2 | Config Version 9.0
     - 16 GB of fixed memory (not dynamic)
     - 6 virtual processors
     - 5 virtual interfaces attached to two virtual switches, in different VLANs (I'm not using VLANs inside OPNsense)
     - Each virtual switch is made of two bonded physical links
     - Host's physical NICs: 4x Broadcom NetXtreme Gigabit Ethernet | 2x Intel PRO/1000
     - GeoIP used in several rules (~10 rules)
     - Physical machine: HP ProLiant DL360 Gen9 (I know... it's old but still running ;-) )
          - 1x Xeon E5-2630 v3 @ 2.40 GHz, 8 cores
          - 60 GB DDR4 RAM

What I've found so far:
     - According to Zabbix monitoring, before each event I noticed the following:
         - Sustained traffic of about 200 Mbps on 2 or more interfaces
         - CPU usage around 25% to 50%
         - Memory usage around 20%
         - A small increase in CPU load just before the firewall freezes
         - A significant spike in CPU context switches, from around 10K to more than 1M in less than 1 minute
         - 5 to 10 minutes after this spike, the firewall freezes and the Zabbix agent stops collecting metrics
         - Most of the time it freezes at night, when some backups are running, but those backups peak at about 300 Mbps
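
To catch the context-switch spike described above (around 10K to more than 1M in under a minute) rather than spotting it after the freeze, exported samples can be scanned for that kind of jump. This is only a minimal Python sketch, assuming the Zabbix history has already been exported as (timestamp, context switches per second) pairs sorted by time; the function name, the 60 s window and the 50x factor are hypothetical choices, not Zabbix settings.

```python
from typing import Optional, Sequence, Tuple

# Each sample is (unix_timestamp_seconds, context_switches_per_second),
# assumed sorted by timestamp (e.g. exported from Zabbix history).
Sample = Tuple[float, float]


def find_context_switch_spike(
    samples: Sequence[Sample],
    window_s: float = 60.0,  # hypothetical: how far back to compare
    factor: float = 50.0,    # hypothetical: jump ratio treated as a spike
) -> Optional[float]:
    """Return the timestamp of the first sample whose rate is at least
    `factor` times a rate seen within the previous `window_s` seconds,
    or None if no such jump occurs."""
    for i, (t_now, rate_now) in enumerate(samples):
        # Compare against every earlier sample inside the time window.
        for t_prev, rate_prev in samples[:i]:
            if 0 < t_now - t_prev <= window_s and rate_prev > 0:
                if rate_now / rate_prev >= factor:
                    return t_now
    return None


if __name__ == "__main__":
    # Illustrative data shaped like the pattern above, not real metrics:
    # ~10K/s baseline, then >1M/s less than a minute later.
    history = [(0, 10_000), (30, 11_000), (55, 1_100_000)]
    print(find_context_switch_spike(history))  # prints 55
```

The nested scan is quadratic, which is fine for a few hours of per-minute samples; a real alert would more likely be a Zabbix trigger on the context-switch item.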

What I've done so far:
     - Changed CPU and memory assignments
     - Turned off all unnecessary services in OPNsense (like Squid)
     - At first I was passing the VLANs as a trunk and creating one VLAN interface per VLAN inside OPNsense. Then I switched to splitting the VLANs out at the virtual NIC level, so I no longer use VLANs inside the firewall.
     - Disabled all NIC offloads inside OPNsense
     - Disabled all virtual NIC offloads in Hyper-V
     - I read about a similar situation with FreeBSD guests using Broadcom NICs and Hyper-V VMQ (Virtual Machine Queues), so I disabled VMQ on both the virtual interfaces and the physical interfaces
     - Disabled and deleted a couple of URL Table (IPs) aliases
     - Configured Monit to reboot the firewall as soon as it loses connectivity to the gateway. It only works about half the time, though; the firewall seems to get stuck during shutdown.
     - I have a replica on another host with different hardware (different CPU and RAM), and it also freezes, even more frequently.
     - I'm running 3 other OPNsense firewalls in the same infrastructure that don't suffer from this problem.

My suspicions:
     - FreeBSD bug?
     - Hyper-V incompatibility?
     - NIC problem/driver?
     

Sorry for the length of the post, but I'm reaching out to the community in case someone has a hint or idea about what I can try next to solve this problem. I'd really appreciate it.

Thanks!

Does anyone have a clue about what could be wrong?

I've investigated further and found that some people on the pfSense forum are having the same problem (https://forum.netgate.com/topic/190927/pfsense-2-7-2-in-hyper-v-freezing-with-no-crash-report-after-reboot/12)

One thing in common is FreeBSD 14.x: when I was running FreeBSD 13.2, which shipped with OPNsense 24.1, I didn't have this problem. Furthermore, the problem started just a week or so after updating to 24.7, which installed FreeBSD 14.1.
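
Since the pattern so far lines up with the FreeBSD branch, it can help to record which branch each firewall (and the replica) is actually on, for example from the dashboard version line or `uname -a` output. This is a minimal Python sketch; the helper name `freebsd_release` is hypothetical, and it only reports the branch a string mentions, it says nothing about whether that branch causes the freezes.

```python
import re
from typing import Optional, Tuple


def freebsd_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the first 'X.Y-RELEASE' token in text,
    e.g. a dashboard version line or `uname -a` output; None if absent."""
    match = re.search(r"\b(\d+)\.(\d+)-RELEASE", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    banner = "OPNsense 24.7.9_1-amd64 | FreeBSD 14.1-RELEASE-p6 | OpenSSL 3.0.15"
    release = freebsd_release(banner)
    # In this thread the freezes were only reported on the 14.x branch.
    label = "14.x branch" if release and release[0] == 14 else "other branch"
    print(release, label)
```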

One of my OPNsense VMs on OpenStack was freezing as well (kernel panic). Check the logs to make sure that's not what is happening to you.
Please paste the output of uname -a. Reinstalling the kernel and syslog-ng helped me.

Thanks for your reply.
Based on the post on the pfSense forum, I downgraded the firewall to version 24.4.x, which runs on FreeBSD 13.x.

I hope it stays stable now.

I'll keep monitoring it.

I'll post any news about it!