OPNsense Forum

English Forums => 24.7, 24.10 Legacy Series => Topic started by: ikkeT on December 18, 2024, 08:59:09 AM

Title: OPNsense dying every few days on APU2
Post by: ikkeT on December 18, 2024, 08:59:09 AM
Hi,

I've had this problem for several months, but it's now happening more often. OPNsense works just fine for several days, but then all of a sudden home traffic starts slowing down, I can no longer access the box, and the network dies. I keep it up to date, and this is nothing new: the problem has been around for several releases. I'm now running 24.7.11.

I just had to pull the plug and reboot. I thought I'd look around a bit. I disabled RRD collection just to make sure it's not that. No help. I run the following services at home, with not much traffic:
- HAProxy (mainly traffic to a Nextcloud instance)
- dnsmasq for home gadgets
- kea dhcp
- captive portal for guest VLAN, hardly ever used.

I used to have IPv6 enabled, but after moving, the new connection only has IPv4.

So not much running. Immediately I notice some problems:

1. Flowd is eating CPU:


76462 root          1 135    0    58M    44M CPU0     0  16:38 100.00% python3.11
# ps awfux|grep 76462
root   76462 100.0  1.1  59844 44944  -  Rs   09:23   16:57.09 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.11)
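
To catch a runaway process like this before the box stops responding, the `ps` output can be scanned for anything pinned near 100% CPU. The following is only a rough sketch: `find_cpu_hogs` and the 90% threshold are made up for illustration, and it assumes the `ps aux` column layout shown above (USER, PID, %CPU, %MEM, VSZ, RSS, TT, STAT, STARTED, TIME, COMMAND) with no spaces inside the first ten columns.

```python
def find_cpu_hogs(ps_output, threshold=90.0):
    """Return (pid, cpu_percent, command) for rows at or above the threshold.

    Hypothetical helper: expects `ps aux`-style text with eleven columns.
    """
    hogs = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the full command line stays intact.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or anything that is not a process row
        cpu = float(fields[2])
        if cpu >= threshold:
            hogs.append((int(fields[1]), cpu, fields[10]))
    return hogs


sample = (
    "USER   PID   %CPU %MEM   VSZ   RSS TT STAT STARTED TIME COMMAND\n"
    "root   76462 100.0  1.1  59844 44944  -  Rs   09:23   16:57.09 "
    "/usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.11)\n"
)

for pid, cpu, command in find_cpu_hogs(sample):
    print(pid, cpu, command)
```

Feeding it the real output (for example from a periodic job) would give a record of which process was busy in the minutes before a hang.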



2. configd errors in the logs

(I have never touched unbound, it's not running)

2024-12-18T09:44:55 Error configd.py [8741e584-e8e0-47d1-940e-639b0fe9a307] Script action failed with Command '/usr/local/opnsense/scripts/unbound/wrapper.py -s ' returned non-zero exit status 1. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/unbound/wrapper.py -s ' returned non-zero exit status 1.
2024-12-18T09:30:11 Error configd.py Timeout (120) executing : system diag log '20' '0' '' 'core' 'audit' 'Emergency,Alert,Critical,Error,Warning' '1734420490.461'
2024-12-18T08:55:33 Error configd.py [eb377147-ead9-4e22-b070-4066dc2a5e25] Script action failed with Command '/usr/local/opnsense/scripts/interfaces/list_macdb.py ' died with <Signals.SIGBUS: 10>. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/interfaces/list_macdb.py ' died with <Signals.SIGBUS: 10>.
2024-12-18T08:55:33 Error configd.py [47cd8873-4e90-45dd-81a7-66fa3dfee38c] Script action failed with Command '/usr/local/sbin/pluginctl -D ''' died with <Signals.SIGBUS: 10>. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/sbin/pluginctl -D ''' died with <Signals.SIGBUS: 10>.
2024-12-18T08:53:14 Warning configd.py Stopping daemon.
2024-12-18T08:53:14 Error configd.py Configd disconnected while executing : interface list macdb
2024-12-18T08:52:52 Error configd.py Configd disconnected while executing : openvpn connections client,server
2024-12-18T08:52:52 Warning configd.py Stopping daemon.
2024-12-18T08:50:06 Error api no active session, user not found
2024-12-18T08:45:08 Error configd.py Timeout (120) executing : firmware remote
2024-12-18T08:43:06 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:41:28 Error configd.py Timeout (120) executing : firmware remote
2024-12-18T08:38:06 Error configd.py Timeout (120) executing : firmware remote
2024-12-18T08:38:05 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:36:05 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:33:04 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:23:11 Error configd.py Timeout (120) executing : firmware remote
2024-12-18T08:20:03 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:16:03 Error configd.py Timeout (120) executing : firmware tiers
2024-12-18T08:12:01 Error configd.py Timeout (120) executing : firmware tiers
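
The log above mixes two kinds of failure: backend actions that time out and scripts that are killed by a signal. A small script makes the pattern easier to see. This is a minimal sketch, assuming the line format shown above; `summarize` and both regular expressions are illustrative and not part of OPNsense.

```python
import re
from collections import Counter

# Patterns are assumptions based on the log lines quoted in this post.
TIMEOUT_RE = re.compile(r"Timeout \(\d+\) executing : (.+)$")
SIGNAL_RE = re.compile(r"Command '(.*?)' died with <Signals\.(\w+): \d+>")


def summarize(lines):
    """Count timeouts per action and signal deaths per (signal, command)."""
    timeouts = Counter()
    signals = Counter()
    for line in lines:
        line = line.strip()
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
            continue
        match = SIGNAL_RE.search(line)
        if match:
            signals[(match.group(2), match.group(1).strip())] += 1
    return timeouts, signals


sample = [
    "2024-12-18T08:45:08 Error configd.py Timeout (120) executing : firmware remote",
    "2024-12-18T08:43:06 Error configd.py Timeout (120) executing : firmware tiers",
    "2024-12-18T08:41:28 Error configd.py Timeout (120) executing : firmware remote",
    "2024-12-18T08:55:33 Error configd.py [47cd8873-4e90-45dd-81a7-66fa3dfee38c] "
    "Script action failed with Command '/usr/local/sbin/pluginctl -D ''' "
    "died with <Signals.SIGBUS: 10>.",
]

timeouts, signals = summarize(sample)
for action, count in timeouts.most_common():
    print("timeout", count, action)
for (signame, command), count in signals.most_common():
    print(signame, count, command)
```

Grouping by action shows whether the timeouts are limited to the firmware checks or spread across unrelated actions, which hints at whether one script is stuck or the whole system is starved.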


3. Disk space should be OK

root@OPNsense:~ # ls -ltrh /var/crash && df -hT
total 4
-rw-r--r--  1 root wheel    5B Dec  2 21:45 minfree
Filesystem       Type     Size    Used   Avail Capacity  Mounted on
/dev/gpt/rootfs  ufs       13G    8.1G    4.3G    65%    /
devfs            devfs    1.0K      0B    1.0K     0%    /dev
tmpfs            tmpfs    2.0G    3.5M    2.0G     0%    /tmp
devfs            devfs    1.0K      0B    1.0K     0%    /var/dhcpd/dev
devfs            devfs    1.0K      0B    1.0K     0%    /var/captiveportal/zone0/dev


So the question: what the heck is this flowd doing, and how do I disable it? Perhaps that's what is overcooking the CPU. I found an old thread about removing the interfaces from it and adding them back, so I'll try that. Let's see what else is there.
Title: Re: OPNsense dying every few days on APU2
Post by: ikkeT on December 18, 2024, 09:14:24 AM
I toggled the NICs off and back on in NetFlow, and also disabled the local service and cleared the NetFlow data a few times. Now I've got the CPU usage down, at least for a while. Let's see if it stays that way.
Title: Re: OPNsense dying every few days on APU2
Post by: bobcat321 on January 09, 2025, 09:01:33 AM
Hi there,
I have a similar issue to yours. My OPNsense router stops working all of a sudden (the Internet dies and I cannot access the OPNsense GUI). It's been happening more frequently now. To get the Internet back, I need to reboot manually.
Digging around the logs in the UI, I saw a Backend error:
```
[506c11e3-fc64-4b1c-89d3-1767a6b76110] Script action failed with Command '/usr/local/opnsense/scripts/firmware/read.sh ' died with <Signals.SIGBUS: 10>. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/firmware/read.sh ' died with <Signals.SIGBUS: 10>.
```

It seems you are getting `<Signals.SIGBUS: 10>` as well, which maybe suggests corrupt memory?
Following this thread.
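
For anyone wondering where the `died with <Signals.SIGBUS: 10>` wording comes from: Python's `subprocess.check_call` raises `CalledProcessError` with a negative return code when the child is killed by a signal, and the exception text names that signal. The sketch below reproduces this on a POSIX system. The `run_and_describe` helper and the self-killing child are made up for illustration, and it passes an argument list rather than `shell=True` as in the traceback. It only shows how the message arises, not why the scripts on the firewall receive SIGBUS.

```python
import signal
import subprocess
import sys


def run_and_describe(argv):
    """Run a command with check_call and describe how it ended."""
    try:
        subprocess.check_call(argv)
    except subprocess.CalledProcessError as exc:
        if exc.returncode < 0:
            # A negative return code means the child was killed by that signal.
            return "killed by " + signal.Signals(-exc.returncode).name
        return "exit status " + str(exc.returncode)
    return "ok"


# Hypothetical child that sends SIGBUS to itself, standing in for a crashing script.
crasher = [sys.executable, "-c",
           "import os, signal; os.kill(os.getpid(), signal.SIGBUS)"]
failer = [sys.executable, "-c", "raise SystemExit(1)"]

print(run_and_describe(crasher))
print(run_and_describe(failer))
```

The signal number is platform specific: SIGBUS is 10 on FreeBSD, which matches the logs here, while the same test on Linux reports 7.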

Details:

OPNsense 24.7.11_2-amd64
FreeBSD 14.1-RELEASE-p6
OpenSSL 3.0.15

Intel(R) Core(TM) i3-N305 (8 cores, 8 threads) machine from Aliexpress

Thanks