OPNsense Forum

Archive => 23.7 Legacy Series => Topic started by: ReK_ on September 17, 2023, 03:00:04 PM

Title: 23.7.4 Multiple Processes Crashing
Post by: ReK_ on September 17, 2023, 03:00:04 PM
I have a new installation of 23.7.4 running inside a VM. The host is Debian-based and runs KVM. All NICs are VirtIO, and VLANs and bridging are handled on the host, so all OPNsense NICs are plain vtnetX interfaces with no VLANs or anything else on top.

I see a lot of the following, from multiple processes:

<13>1 2023-09-17T12:23:18+00:00 rtr-a.REDACTED kernel - - [meta sequenceId="1"] <6>pid 85864 (php-cgi), jid 0, uid 0: exited on signal 8 (core dumped)

Here's a count, per process, of how many times it has happened after about 6.5 hours of uptime:


root@rtr-a:~ # uptime
12:42PM  up  6:33, 1 user, load averages: 0.00, 0.00, 0.00

root@rtr-a:~ # cat /var/log/dmesg.today | grep "exited on signal 8" | cut -d"(" -f2 | cut -d")" -f1 | sort | uniq -c | sort -rn
666 syslog-ng
  85 python3.9
  37 php-cgi
   8 cpustats
   3 php
   2 unbound
   1 ntpd
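
For anyone who wants the same tally broken down by signal number as well, here is a rough Python sketch of the pipeline above. It assumes the kernel lines look like the "exited on signal" line quoted earlier; the function names are made up for illustration and are not part of OPNsense.

```python
import re
from collections import Counter

# Matches FreeBSD kernel lines such as:
#   pid 85864 (php-cgi), jid 0, uid 0: exited on signal 8 (core dumped)
CRASH_RE = re.compile(r"pid \d+ \(([^)]+)\), .*exited on signal (\d+)")


def count_crashes(lines):
    """Count crash lines per (process name, signal number) pair."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def print_crash_report(path="/var/log/dmesg.today"):
    """Print the tally for a log file, most frequent first."""
    with open(path, errors="replace") as log:
        for (name, signo), n in count_crashes(log).most_common():
            print(f"{n:4d} {name} (signal {signo})")
```

Calling print_crash_report() on the firewall should give roughly the same ordering as the shell one-liner, with the signal number added to each row.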


As far as I can tell nothing is resource starved. CPU, memory and disk all look fine. Actual traffic is passing through the firewall with no issue, though Unbound going down obviously breaks DNS. The web UI is very buggy, likely due to all the php-cgi crashes, with a mix of 500 errors, widgets with no data, and config pages with no values.

One odd thing I've noticed is that, while Unbound says it's listening on localhost, I can't resolve anything that way. It does work if I resolve via an interface address though. I've set the system to use its interface as a resolver and not insert localhost, but maybe the crashes are due to that via hardcoded resolution somewhere?


root@rtr-a:~ # sockstat -l | grep unbound | grep 127
unbound  unbound    2771  9  udp4   127.0.0.1:53          *:*
unbound  unbound    2771  10 tcp4   127.0.0.1:53          *:*
unbound  unbound    2771  19 udp4   127.0.0.1:53          *:*
unbound  unbound    2771  20 tcp4   127.0.0.1:53          *:*
unbound  unbound    2771  25 tcp4   127.0.0.1:953         *:*

root@rtr-a:~ # host google.com 127.0.0.1
;; connection timed out; no servers could be reached

root@rtr-a:~ # host google.com
google.com has address 142.251.33.78
google.com has IPv6 address 2607:f8b0:400a:806::200e
google.com mail is handled by 10 smtp.google.com.

root@rtr-a:~ # cat /etc/resolv.conf
domain REDACTED
nameserver 192.168.10.1
nameserver 149.112.121.20
nameserver 149.112.122.20
search REDACTED
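
To compare the two resolver addresses without depending on the host utility, something like the following Python sketch could be used. It hand-builds a minimal DNS A query and reports whether any reply with a matching ID comes back over UDP. The function names are hypothetical, and treating 192.168.10.1 (the first nameserver in resolv.conf above) as the interface address is my assumption.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (recursion desired)."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # QTYPE=1 (A), QCLASS=1 (IN)
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)
    return header + question


def probe(server, name="google.com", timeout=2.0, port=53):
    """Return True if the server answers the query before the timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return False
    # Only check that the reply echoes our query ID; the answer is not parsed
    return len(reply) >= 2 and reply[:2] == query[:2]
```

Running probe("127.0.0.1") and probe("192.168.10.1") on the firewall would be expected to mirror the host results above: no reply on localhost, a reply on the interface address.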


How do I start tracking down the reason for these crashes? Signal 8 is SIGFPE, which usually indicates an arithmetic fault such as an integer divide by zero, but that's as much detail as I can find in any log. Every other log entry only reports that some process died with signal 8, like the following from configd, which points to one of the python3.9 crashes:

<11>1 2023-09-17T10:01:01+00:00 rtr-a.REDACTED configd.py 42133 - [meta sequenceId="821"] [52d7820a-23c3-45e3-a162-42918768eca8] Script action failed with Command '/usr/local/sbin/pluginctl -S '' ''' died with <Signals.SIGFPE: 8>. at Traceback (most recent call last):   File "/usr/local/opnsense/service/modules/actions/script_output.py", line 44, in execute     subprocess.check_call(script_command, env=self.config_environment, shell=True,   File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call     raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/sbin/pluginctl -S '' ''' died with <Signals.SIGFPE: 8>.
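
For reference, the "died with <Signals.SIGFPE: 8>" wording comes from Python's subprocess module, which reports a child killed by a signal as a negative return code. A small sketch of that mapping follows; describe_returncode is a made-up helper and "example-command" is a placeholder, not the real pluginctl call.

```python
import signal
import subprocess


def describe_returncode(returncode):
    """Explain a subprocess return code; negative values mean 'killed by signal'."""
    if returncode < 0:
        signo = -returncode
        try:
            name = signal.Signals(signo).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {signo} ({name})"
    return f"exited with status {returncode}"


# Build the same kind of exception that check_call() raised in the traceback.
err = subprocess.CalledProcessError(-8, "example-command")
print(describe_returncode(err.returncode))
print(err)
```

So the configd entry only tells us that the child process itself received signal 8; it says nothing about which instruction faulted.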

More system info:

(https://i.imgur.com/osuW78F.png)
Title: Re: 23.7.4 Multiple Processes Crashing
Post by: newsense on September 17, 2023, 06:08:50 PM
Are Zenarmor, Suricata in blocking mode, or RSS present?
Title: Re: 23.7.4 Multiple Processes Crashing
Post by: schmuessla on September 17, 2023, 07:43:22 PM
So many crashes in different processes sounds like a hardware problem or some other inconsistency.
My OPNsense setup broke a couple of days ago with random crashes as well. My file system (UFS) was inconsistent (corrupt). I reinstalled OPNsense and it worked again.
Title: Re: 23.7.4 Multiple Processes Crashing
Post by: ReK_ on September 18, 2023, 10:12:32 AM
Quote from: newsense on September 17, 2023, 06:08:50 PM
Are Zenarmor, Suricata in blocking mode, or RSS present?

No IDS/IPS is enabled. I've only set up basic routing and ACLs, DHCP, DNS (including mdns-repeater), NTP, and Monit.

Quote from: schmuessla on September 17, 2023, 07:43:22 PM
So many crashes in different processes sounds like a hardware problem or some other inconsistency.
My OPNsense setup broke a couple of days ago with random crashes as well. My file system (UFS) was inconsistent (corrupt). I reinstalled OPNsense and it worked again.

That's what I thought as well, but I have other VMs running on this host with zero issues, not to mention the host itself (TrueNAS Scale). This is the only BSD-based VM; all the others are Debian-based. As this is a fresh install, I have tried nuking the entire VM and rebuilding from scratch, but got the same result.