OPNsense crashing randomly, please help me figure out why?

Started by shenaniganz, April 01, 2024, 11:50:00 PM

Hi. First time posting here. I've been struggling with this issue for a while and have tried nearly everything I've found online with no luck, including adding several tunables I don't fully understand :P.

I'm using a CWWK mini PC with 4x 2.5G Intel NICs. Everything runs great, except that seemingly at random, but usually every day or two, OPNsense completely locks up. I don't see anything in the log files to indicate what is going on. The only entry of any concern is the "Listen queue overflow", but I'm not sure that's relevant.

See below for today's occurrence. Everything was going fine until 16:57, when it appears to have initiated a shutdown. The Ethernet link lights stay on, but there is no activity. I can't SSH, can't ping, can't do anything until I hard reboot the machine, which then comes back up and runs fine for another day or so.

I'm currently running 23.7.12_5, as 24.1 seemed to be less reliable. I'd love to figure this out and try updating again.

I'm pretty green when it comes to FreeBSD. Slightly more experienced with Debian Linux, but far from a pro.

2024-04-01T17:03:01-04:00 Notice kernel ---<<BOOT>>---
2024-04-01T17:03:01-04:00 Notice syslog-ng syslog-ng starting up; version='4.4.0'
2024-04-01T16:57:49-04:00 Notice syslog-ng syslog-ng shutting down; version='4.4.0'
2024-04-01T16:57:49-04:00 Notice kernel <118>>>> Invoking stop script 'config'
2024-04-01T16:57:49-04:00 Notice kernel <118>>>> Invoking backup script 'rrd'
2024-04-01T16:57:49-04:00 Notice kernel <118>>>> Invoking backup script 'netflow'
2024-04-01T16:57:49-04:00 Notice kernel <118>>>> Invoking backup script 'duid'
2024-04-01T16:57:49-04:00 Notice kernel <118>>>> Invoking backup script 'dhcpleases'
2024-04-01T16:57:49-04:00 Notice kernel <118>>>> Invoking backup script 'captiveportal'
2024-04-01T16:57:49-04:00 Notice kernel <118>>>> Invoking stop script 'backup'
2024-04-01T16:57:49-04:00 Notice kernel <118>Waiting for PIDS: 41820.
2024-04-01T16:57:49-04:00 Notice kernel <118>Stopping osudpbroadcastrelay.
2024-04-01T16:57:49-04:00 Notice kernel <118>osudpbroadcastrelay is running as pid 41820.
2024-04-01T16:57:49-04:00 Notice kernel <118>Waiting for PIDS: 39089.
2024-04-01T16:57:49-04:00 Notice kernel <118>Stopping osudpbroadcastrelay.
2024-04-01T16:57:49-04:00 Notice kernel <118>osudpbroadcastrelay is running as pid 39089.
2024-04-01T16:57:49-04:00 Notice kernel <118>Waiting for PIDS: 35910.
2024-04-01T16:57:49-04:00 Notice kernel <118>Stopping osudpbroadcastrelay.
2024-04-01T16:57:49-04:00 Notice kernel <118>osudpbroadcastrelay is running as pid 35910.
2024-04-01T16:57:49-04:00 Notice kernel <118>Waiting for PIDS: 33009.
2024-04-01T16:57:49-04:00 Notice kernel <118>Stopping osudpbroadcastrelay.
2024-04-01T16:57:49-04:00 Notice kernel <118>osudpbroadcastrelay is running as pid 33009.
2024-04-01T16:57:49-04:00 Notice kernel <118>Waiting for PIDS: 30063.
2024-04-01T16:57:49-04:00 Notice kernel <118>Stopping osudpbroadcastrelay.
2024-04-01T16:57:49-04:00 Notice kernel <118>osudpbroadcastrelay is running as pid 30063.
2024-04-01T16:57:49-04:00 Notice kernel <118>Waiting for PIDS: 27167.
2024-04-01T16:57:49-04:00 Notice kernel <118>Stopping osudpbroadcastrelay.
2024-04-01T16:57:49-04:00 Notice kernel <118>osudpbroadcastrelay is running as pid 27167.
2024-04-01T16:57:49-04:00 Notice kernel <118>Waiting for PIDS: 41453.
2024-04-01T16:57:49-04:00 Notice kernel <118>Stopping acme_http_challenge.
2024-04-01T16:57:49-04:00 Notice kernel <118>Waiting for PIDS: 50883 51561.
2024-04-01T16:57:49-04:00 Notice kernel <118>Stopping flowd.
2024-04-01T16:57:49-04:00 Notice kernel <118>Stopping flowd_aggregate...done
2024-04-01T16:57:48-04:00 Notice kernel <118>>>> Invoking stop script 'freebsd'
2024-04-01T16:57:48-04:00 Notice kernel <118>>>> Invoking stop script 'beep'
2024-04-01T16:51:10-04:00 Notice dhclient Creating resolv.conf
2024-04-01T16:42:51-04:00 Notice kernel <7>sonewconn: pcb 0xfffff8004e4ee9b0 (0.0.0.0:2189 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (5 occurrences)
2024-04-01T16:41:41-04:00 Notice kernel <6>arp: 192.168.20.50 moved from f2:30:82:df:67:8c to 98:b7:85:01:d5:12 on vlan020
2024-04-01T16:21:41-04:00 Notice kernel <6>arp: 192.168.20.50 moved from f2:30:82:df:67:8c to 98:b7:85:01:d5:12 on vlan020
2024-04-01T16:01:43-04:00 Notice kernel <6>arp: 192.168.20.50 moved from f2:30:82:df:67:8c to 98:b7:85:01:d5:12 on vlan020
2024-04-01T15:51:09-04:00 Notice dhclient Creating resolv.conf
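Reading the log bottom-up (the viewer shows newest first), the first shutdown line is the 'beep' stop script at 16:57:48 and the next boot is at 17:03:01. The sketch below is a hypothetical helper, not an OPNsense tool, that pulls those two timestamps out of a pasted excerpt and also extracts the address, port and protocol from the "Listen queue overflow" message. It assumes the excerpt contains exactly one shutdown and the boot that followed it, and that every line has the "timestamp severity process message" layout shown above.

```python
import re
from datetime import datetime

# Excerpt of the log above, newest first as the OPNsense log viewer shows it.
SAMPLE_LOG = """\
2024-04-01T17:03:01-04:00 Notice kernel ---<<BOOT>>---
2024-04-01T16:57:49-04:00 Notice kernel <118>>>> Invoking stop script 'config'
2024-04-01T16:57:48-04:00 Notice kernel <118>>>> Invoking stop script 'beep'
2024-04-01T16:51:10-04:00 Notice dhclient Creating resolv.conf
2024-04-01T16:42:51-04:00 Notice kernel <7>sonewconn: pcb 0xfffff8004e4ee9b0 (0.0.0.0:2189 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (5 occurrences)
"""

# Matches "(ADDR:PORT (proto N)): Listen queue overflow: Q already ..."
OVERFLOW_RE = re.compile(
    r"\(([\d.]+):(\d+) \(proto (\d+)\)\): Listen queue overflow: (\d+) already"
)


def parse(line):
    """Split one log line into (timestamp, severity, process, message)."""
    stamp, severity, process, message = line.split(None, 3)
    return datetime.fromisoformat(stamp), severity, process, message


def entries(text):
    """Parse every non-empty line of a pasted log excerpt."""
    return [parse(line) for line in text.splitlines() if line.strip()]


def shutdown_summary(text):
    """Return (shutdown_start, boot_time, seconds_between).

    Assumes the excerpt holds exactly one shutdown and the boot after it."""
    rows = entries(text)
    stops = [ts for ts, _, _, msg in rows if "Invoking stop script" in msg]
    boots = [ts for ts, _, _, msg in rows if "---<<BOOT>>---" in msg]
    start, boot = min(stops), max(boots)
    return start, boot, int((boot - start).total_seconds())


def listen_overflows(text):
    """Return (address, port, ip_protocol, queued) for each overflow message."""
    found = []
    for _, _, _, msg in entries(text):
        m = OVERFLOW_RE.search(msg)
        if m:
            addr, port, proto, queued = m.groups()
            found.append((addr, int(port), int(proto), int(queued)))
    return found


if __name__ == "__main__":
    start, boot, gap = shutdown_summary(SAMPLE_LOG)
    print(f"shutdown began {start:%H:%M:%S}, booted {boot:%H:%M:%S}, gap {gap} s")
    # IP protocol 6 is TCP; port 2189 belongs to whatever service listens there.
    print(listen_overflows(SAMPLE_LOG))
```

For this excerpt it reports a gap of 313 seconds (about 5 minutes) between the start of the shutdown sequence and the next boot, and one TCP overflow on port 2189 with 8 connections queued. To see which process owns that port, something like `sockstat -l` on the firewall lists listening sockets; whether the overflow is related to the hang is not established by the log.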

Did you check if the CPU microcode is current or installed the CPU microcode packages (there is a howto in the tutorial section)?

Early versions of a CPU often have bugs (Alder Lake is known to have some), and they may not have been fixed in your firmware yet.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

Quote from: meyergru on April 02, 2024, 12:03:08 AM
Did you check if the CPU microcode is current or installed the CPU microcode packages (there is a howto in the tutorial section)?

Early versions of a CPU often have bugs (Alder Lake is known to have some), and they may not have been fixed in your firmware yet.

I have not. I will do that now. If this fixes it, I'm gonna feel real silly  ;D

Alright, that's done. Let's hope that resolves the issues! I should have specified, it is an N305 CPU, which I noticed you specifically called out in that guide as having bugs.

Thanks for your quick assistance. I'll report back if it happens again, or in a few days if it doesn't!

Appears that was not the fix :(

Just had another crash after installing microcode packages as indicated in the tutorial.

Did you order the machine as a barebone? There have been reports about incompatible RAM.

You could also try booting memtest86 from a USB stick.

Well, it seems the issue was the included power adapter. It was a Daijing 60W unit, which I'd think should be plenty (never measured more than 40W draw fully loaded).

I replaced it with a Mean Well 80W (GST90A12-P1M) I found relatively cheap on eBay and have not had any further issues. Approaching 5 days uptime now.
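On paper the stock adapter was not near its limit, which is part of why it was easy to rule out. A quick sketch of the arithmetic, using the 40 W peak measured above and the two ratings mentioned in this post (the helper names are just for illustration):

```python
MEASURED_PEAK_W = 40  # highest draw measured, fully loaded

# Adapter ratings mentioned in this thread.
ADAPTERS = {"Daijing (stock)": 60, "Mean Well GST90A12-P1M": 80}


def load_fraction(measured_w, rated_w):
    """Share of the adapter's rated output used at the measured draw."""
    return measured_w / rated_w


def headroom_w(measured_w, rated_w):
    """Watts left between the measured draw and the label rating."""
    return rated_w - measured_w


for name, rated in ADAPTERS.items():
    print(f"{name}: {load_fraction(MEASURED_PEAK_W, rated):.0%} of rating, "
          f"{headroom_w(MEASURED_PEAK_W, rated)} W spare")
```

This prints 67% of rating with 20 W spare for the stock unit and 50% with 40 W spare for the Mean Well. Since both show headroom, the label rating alone doesn't explain the crashes. One plausible but unverified explanation is that the stock adapter could not hold its output steady during brief load spikes or simply did not deliver its rated power, neither of which a wall meter reading of average draw would necessarily reveal.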

I'd read about others having PSU issues, but I guess I didn't suspect that as it wasn't fully shutting off/rebooting. Or I just didn't want to believe such a simple issue was my problem  ;D

Either way, I'm sure all the other adjustments I've made along this journey are still beneficial and I'm in a better position than when I started.

Quote from: meyergru on April 02, 2024, 12:03:08 AM
Did you check if the CPU microcode is current or installed the CPU microcode packages (there is a howto in the tutorial section)?
Is that the howto you're referring to? https://forum.opnsense.org/index.php?topic=36139.msg179435


Quote from: shenaniganz on April 12, 2024, 04:55:26 PM
Well, it seems the issue was the included power adapter. It was a Daijing 60W unit, which I'd think should be plenty (never measured more than 40W draw fully loaded).

I replaced it with a Mean Well 80W (GST90A12-P1M) I found relatively cheap on eBay and have not had any further issues. Approaching 5 days uptime now.

How has your experience been since replacing the PSU? I have a Qotom that has been running since July 2021. In the last few days it started to shut down randomly. The logs are similar to yours: I suddenly see the stop and backup scripts being invoked. I initially ruled out the PSU, since I would expect abrupt power-offs rather than something that shows signs of a graceful shutdown.
Thank you for sharing your experience, I am now going to find a replacement PSU.
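The distinction drawn above (abrupt power-off versus graceful shutdown) can be checked mechanically by looking at what was logged between the previous boot and the latest one. The sketch below is a hypothetical helper: the marker strings are taken from the logs posted in this thread, the input must be in chronological order (reverse the viewer's newest-first output), and a missing shutdown sequence does not prove a power loss, since the last messages may never have reached the disk.

```python
BOOT_MARKER = "---<<BOOT>>---"
# Messages that appear in this thread's logs during an orderly shutdown.
GRACEFUL_MARKERS = ("Invoking stop script", "syslog-ng shutting down")


def classify_last_shutdown(lines):
    """Describe how the system went down before the most recent boot.

    `lines` must be oldest first."""
    boots = [i for i, line in enumerate(lines) if BOOT_MARKER in line]
    if not boots:
        return "no boot in excerpt"
    # Only consider lines after the previous boot (if any) and before the last one.
    start = boots[-2] + 1 if len(boots) > 1 else 0
    window = lines[start:boots[-1]]
    if any(marker in line for line in window for marker in GRACEFUL_MARKERS):
        return "graceful shutdown"
    return "no shutdown sequence logged"


if __name__ == "__main__":
    newest_first = [
        "2024-04-01T17:03:01-04:00 Notice kernel ---<<BOOT>>---",
        "2024-04-01T16:57:48-04:00 Notice kernel <118>>>> Invoking stop script 'beep'",
        "2024-04-01T16:51:10-04:00 Notice dhclient Creating resolv.conf",
    ]
    print(classify_last_shutdown(list(reversed(newest_first))))
```

For the log in the first post this reports a graceful shutdown, which matches what both posters saw even though the root cause turned out to be the power supply.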

Since replacing the power supply, I've not had a single unexpected shutdown. I'm currently at 20 days uptime, and that's only because I lost power for ~12 hours and my UPS died lol.

Quote from: shenaniganz on September 11, 2024, 05:27:51 PM
Since replacing the power supply, I've not had a single unexpected shutdown. I'm currently at 20 days uptime, and that's only because I lost power for ~12 hours and my UPS died lol.

Just wanted to report that I've been having the same issue with my OPNsense instance running on this box

https://www.amazon.de/gp/product/B0BZJDPSRP/ref=ox_sc_act_title_1?smid=A389CA45WU2PPG&psc=1

resulting in a power down about once a day.

Replacing it with a better PSU

https://www.amazon.de/dp/B001W3UYLY?ref=ppx_yo2ov_dt_b_fed_asin_title

appears to have solved the issue.

Thanks for the pointer.

After an initially stable run, I'm seeing the same once-a-day crashes/shutdowns again, even with the new PSU.

And I can't figure out why.

Judging by the logs, it appears that something is triggering a shutdown signal.

Any idea where to look further?
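One starting point is whatever the system logged in the minutes just before the first stop script, since that is where the trigger, if it was logged at all, should appear. The sketch below is a hypothetical helper, assuming log lines copied from the viewer in the "timestamp severity process message" layout used earlier in this thread; it accepts newest-first or oldest-first input and returns the window oldest first.

```python
from datetime import datetime, timedelta


def parse_time(line):
    """Read the ISO 8601 timestamp at the start of a log line."""
    return datetime.fromisoformat(line.split(None, 1)[0])


def lines_before_shutdown(lines, minutes=10):
    """Return lines from the `minutes` before the first stop script, oldest first."""
    stamped = sorted(((parse_time(l), l) for l in lines if l.strip()),
                     key=lambda pair: pair[0])
    stops = [ts for ts, line in stamped if "Invoking stop script" in line]
    if not stops:
        return []
    start = stops[0]  # earliest stop script marks the start of the shutdown
    since = start - timedelta(minutes=minutes)
    return [line for ts, line in stamped if since <= ts < start]


if __name__ == "__main__":
    log = [
        "2024-04-01T16:57:49-04:00 Notice kernel <118>>>> Invoking stop script 'config'",
        "2024-04-01T16:57:48-04:00 Notice kernel <118>>>> Invoking stop script 'beep'",
        "2024-04-01T16:51:10-04:00 Notice dhclient Creating resolv.conf",
        "2024-04-01T16:42:51-04:00 Notice kernel <7>sonewconn: pcb 0xfffff8004e4ee9b0 (0.0.0.0:2189 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (5 occurrences)",
    ]
    for line in lines_before_shutdown(log, minutes=20):
        print(line)
```

With the first post's log, a 10-minute window contains only the dhclient line, so in that case nothing logged right before the shutdown points at a cause. Widening the window or comparing several occurrences may show whether the same event precedes each one.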