OPNsense VM just dying

Started by bolmsted, January 25, 2024, 05:57:35 PM


Over the last couple of months my OPNsense VM has been dying, or possibly crashing, inside my Proxmox environment. When the VM hangs or crashes we lose our internet connection, and I have to restart the VM.

This has happened spontaneously a number of times over the last couple of months. Each time, I've updated OPNsense to the latest patches, but it still crashes sporadically, most recently today.

Any idea what is going on, and which logs in FreeBSD would help narrow down the cause?

It had been working for the last 1 to 1.5 years without issues, and suddenly OPNsense keeps crashing.

Did you consider doing a search first? I've just done one and the following link shows the results:

https://www.startpage.com/do/dsearch?query=proxmox+opnsense+vm+keeps+stopping&language=english&cat=web&pl=ext-ff&extVersion=1.1.7

Do any of those solve your problem?
Regards,
Bill

I did search online, of course, but with a different search engine, so I'll see whether any of these solve the problem. It looks like someone suggested raising the open-file limit in limits.conf and possibly the number of queues, so I'll look into that. It may take a while to tell whether a fix works, though, since the crash only happens every few months. I just thought this might be a known issue.

I'm still on 23.1.x as a VM on Proxmox, and it doesn't die or crash. What are your VM settings?

January 28, 2024, 02:45:57 AM #4 Last Edit: January 28, 2024, 02:56:57 AM by bolmsted
I'm on 23.7.12 running on Proxmox 8.1.4. I raised the open-file limit to 4096 (from the default of 1024) and increased queues to 8 on the network interfaces, but I'm not sure which specific settings you are referring to.

Quote
root@proxmox:/var/log# for pid in $(pidof kvm); do prlimit -p $pid | grep NOFILE; ls -1 /proc/$pid/fd/ | wc -l; done
NOFILE     max number of open files                4096    524288 files
297
NOFILE     max number of open files                4096    524288 files
46
NOFILE     max number of open files                4096    524288 files
46
NOFILE     max number of open files                4096    524288 files
46
NOFILE     max number of open files                4096    524288 files
41
root@proxmox:/var/log#
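The loop above prints the soft/hard limit and the descriptor count side by side. A slightly extended sketch (bash on the Proxmox host, Linux /proc assumed) computes how close each kvm process is to its soft NOFILE limit, which is the "already near the limit" situation the Proxmox staff member describes later in the thread. The fd_report function name and the 80% warning threshold are hypothetical choices for illustration, not anything Proxmox provides.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): compare each kvm process's open file
# descriptors with its soft NOFILE limit. Linux-only: reads /proc.

# Print "<pid> <fds in use> <soft limit> <percent used>" for one process.
fd_report() {
    local pid=$1 used soft pct
    # One entry per open descriptor in /proc/<pid>/fd
    used=$(ls -1 "/proc/$pid/fd" 2>/dev/null | wc -l)
    # "Max open files  <soft>  <hard>  files": field 4 is the soft limit
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    if [ "$soft" = "unlimited" ]; then
        pct=0
    else
        pct=$(( used * 100 / soft ))
    fi
    echo "$pid $used $soft $pct"
}

# Check every running kvm process; warn at an assumed 80% threshold.
for pid in $(pidof kvm 2>/dev/null); do
    read -r p used soft pct <<< "$(fd_report "$pid")"
    if [ "$pct" -ge 80 ]; then
        echo "WARNING: kvm pid $p uses $used of $soft file descriptors (${pct}%)"
    else
        echo "kvm pid $p: $used of $soft file descriptors in use (${pct}%)"
    fi
done
```

Run as root so that /proc/<pid>/fd of the kvm processes is readable. With the numbers posted above (297 descriptors against a soft limit of 4096) the busiest VM would report about 7%, whereas against the old 1024 limit it would have been about 29%.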

I copied /etc/security/limits.conf to /etc/security/limits.d/limits.conf and added the nofile line, because the for loop above reported 1024 before I made the change and rebooted Proxmox.
Quote
#<domain>      <type>  <item>         <value>
#

#*               soft    core            0
#root            hard    core            100000
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#ftp             -       chroot          /ftp
#@student        -       maxlogins       4

root      soft    nofile      4096

# End of file
root@proxmox:/etc/security/limits.d#


I changed the queues setting on the VM's network interface (attached to vmbr#) from queues=4 to queues=8, as suggested in one of the posts, and will monitor whether the system stays stable for much longer. If I haven't come back to this thread in 6 months asking for another fix, it's probably working; for now I've just tried the first suggestion from the search results.
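The queues option lives in the VM's netN device string (net0, net1, ...), and `qm set` replaces that whole string, so leaving out the existing MAC address would, as far as I know, make Proxmox generate a new one. A minimal sketch, assuming bash on the Proxmox host, that rewrites only the queues= entry and keeps everything else; set_queues is a hypothetical helper, and VM ID 100 and net0 are placeholders.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): rewrite a Proxmox netN option string with a
# new queues= value, keeping the MAC, bridge and every other option as-is.
set_queues() {
    local spec=$1 n=$2 out="" part
    local -a parts
    # Split "virtio=<MAC>,bridge=vmbr0,..." on commas
    IFS=',' read -r -a parts <<< "$spec"
    for part in "${parts[@]}"; do
        case $part in
            queues=*) ;;                      # drop any existing queues= entry
            *) out="${out:+$out,}$part" ;;    # keep everything else in order
        esac
    done
    echo "${out},queues=${n}"
}

# Usage on the Proxmox host (VM ID 100 and net0 are placeholders):
#   cur=$(qm config 100 | awk '/^net0:/ {print $2}')
#   qm set 100 --net0 "$(set_queues "$cur" 8)"
```

For example, `set_queues "virtio=BC:24:11:00:00:01,bridge=vmbr0,firewall=1,queues=4" 8` prints `virtio=BC:24:11:00:00:01,bridge=vmbr0,firewall=1,queues=8`. Depending on the hotplug settings, the new queue count may only take effect after the VM is fully stopped and started from Proxmox, not just rebooted from inside OPNsense.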


It looks like I also edited /etc/systemd/system.conf. I recall copying the original system.conf to /root, because I didn't want to leave a backup copy in /etc/systemd where it might get read by the OS:
Quote
root@proxmox:/etc/security/limits.d# diff /etc/systemd/system.conf /root/system.conf.20240126
67,70d66
< #
< #added to see if prevents OPNsense VM from hanging/crashing
< DefaultLimitNOFILE=4096:524288
< #
root@proxmox:/etc/security/limits.d#

This is good for documenting what I did too if this comes back to haunt me later.
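An alternative to editing /etc/systemd/system.conf and keeping a backup in /root is a drop-in file under /etc/systemd/system.conf.d/, which systemd reads in addition to system.conf, so the stock file stays untouched. This is also likely the setting that actually reaches the kvm processes: limits.conf is applied by pam_limits to login sessions, while Proxmox starts its VMs under systemd. A minimal sketch, assuming bash and root on the Proxmox host; the function name and the file name 90-nofile.conf are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): put the DefaultLimitNOFILE override in a
# systemd manager drop-in instead of editing /etc/systemd/system.conf.
write_nofile_dropin() {
    local dir=${1:-/etc/systemd/system.conf.d}   # target directory
    local limits=${2:-4096:524288}               # soft:hard, as in the post
    mkdir -p "$dir"
    # Drop-ins need the same [Manager] section header as system.conf
    cat > "$dir/90-nofile.conf" <<EOF
# Added to see if it prevents the OPNsense VM from hanging/crashing
[Manager]
DefaultLimitNOFILE=$limits
EOF
    echo "$dir/90-nofile.conf"
}

# Usage on the Proxmox host (as root), then reboot as before:
#   write_nofile_dropin
#   systemctl show --property=DefaultLimitNOFILESoft,DefaultLimitNOFILE
```

After the reboot, `systemctl show` should report the new soft and hard defaults, and the prlimit loop from the earlier post should again show 4096 for the kvm processes.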


In case the search results from the second poster's link change, this is the page I went to from them:
https://forum.proxmox.com/threads/opnsense-keeps-crashing.131601/
and then to the linked page on modifying system.conf and creating limits.conf:
https://forum.proxmox.com/threads/qemu-crash-with-vzdump.131603/#post-578351

Well, you must have a reason to make all these adjustments. For me, the node where OPN is hosted is still on 7.4-17. I had a lot of trouble when I updated another node to 8.1.4 over Christmas.
OPN has these settings:
Memory: 4 GB
Processors: 2 (1 sockets, 2 cores) [host]
BIOS: OVMF (UEFI)
Display: Default
Machine: q35
SCSI Controller: VirtIO SCSI Single
Hard Disk (sata1): size=32G
EFI Disk: efitype=4m,size=1M
PCI Devices: 3 NICs passed through
Guest Agent: Enabled

I haven't made any adjustments on either Proxmox or OPN. To diagnose why OPN crashes for you, maybe try the following. I used this method on that other node, where an existing VM that had been stable on the previous Proxmox version started crashing on the new one (admittedly easier, as it is an Ubuntu VM).

In short, I spun up a small Ubuntu server VM on another node and enabled rsyslog on it to receive logs, then configured the crashing VM to send its logs there. That showed me what was triggering the VM to fail.
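On the receiving Ubuntu VM, the rsyslog side of this could look roughly like the sketch below: load the imudp input module, listen on UDP port 514, and write everything arriving from the firewall's address to its own file. This assumes Ubuntu's default rsyslog setup, which includes /etc/rsyslog.d/*.conf; the function name, the file names and the 192.168.1.1 address are hypothetical, and if imudp is already enabled in /etc/rsyslog.conf the module() line should be dropped. OPNsense is then pointed at this VM as a remote logging target in its GUI.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): write an rsyslog drop-in that receives
# syslog over UDP and stores the firewall's messages in a separate file.
write_rsyslog_receiver() {
    local dir=$1 fw_ip=$2 port=${3:-514}
    mkdir -p "$dir"
    cat > "$dir/10-opnsense-remote.conf" <<EOF
# Listen for syslog over UDP
module(load="imudp")
input(type="imudp" port="$port")

# Store messages from the firewall in their own file, then stop processing them
if \$fromhost-ip == '$fw_ip' then /var/log/opnsense.log
& stop
EOF
    echo "$dir/10-opnsense-remote.conf"
}

# Usage on the Ubuntu VM (as root; 192.168.1.1 is a placeholder):
#   write_rsyslog_receiver /etc/rsyslog.d 192.168.1.1
#   systemctl restart rsyslog
```

Keeping the firewall's messages in a separate file makes it easier to read the last entries before a hang without wading through the Ubuntu VM's own logs.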


I can try sending syslog to one of the Ubuntu VMs on the Proxmox host, but I can't do a lot of fiddling, as it is our home internet gateway and the natives get restless, especially because we both WFH. The NOFILE limit made sense, so I started with that based on the recommendations, but I'll see whether syslog gives any clues if this happens again.


Of note, from the other thread:

Quote
Aug 7, 2023
#32
showiproute said:
Doubling everything seems to solve the problem also.
My configuration now includes queue=8 again and the VM didn't crash.

Question is why was it running with PVE7.x while PVE8 needs some manual config changes.

Fiona (Proxmox Staff Member)
QEMU might've changed internal things and uses more file descriptors now. And the OPNSense update might've changed interaction with QEMU and the host too. Likely, you were already near the limit before, but didn't quite hit it.


I see. Good to know for when the time comes for me to upgrade this node. Thanks.