[update] after 22.7.9 update the gateway suddenly dies after 1 day or so

Started by manilx, December 03, 2022, 11:19:45 PM

I have been running 22.7.8 (and all before) without issues for 9 months. Did a lot of version upgrades. Even from 22.6 to 22.7

Since the last update, which I did on Friday, the internet dies after 24 hours or so and the gateway status shows "offline". At first I thought it was the ISP router or the cables, and I lost an hour checking everything. BUT the issue was OPNsense, because after a reboot everything was OK again.
Now, after another 24 hours, the same thing happened again out of the blue.

There seems to be an issue with 22.7.9, which I'm at a loss to explain or check.

I have reverted to 22.7.8 (running on Proxmox) and done a fresh upgrade again.
Let's see.


Hi

I don't think this is a problem with Unbound.

The Gateway turned "red" and my router couldn't be pinged.

I also rebooted after the upgrade, and again after the problem appeared the first time, and it still reappeared.

Must be something different!

https://www.reddit.com/r/opnsense/comments/zbt3il/after_2279_update_the_gateway_suddenly_dies_after/

Posted here too. And there seems to be an issue with the last update!

Suricata may be to blame. I didn't check if it was running when the gateway was lost....

I had the same issue here, firewall (Shuttle DH270) suddenly unreachable after update to 22.7.9 after ~1 hour.
Not pingable on either interface, but could be rebooted with power button (no monitor, keyboard attached)
After reboot, everything worked normally, but froze again after ~30min

Reverted the kernel -> no success
Reverted base, opnsense and suricata -> stable for 24+ hours

Strongly suspect suricata, but there is nothing to be found in the logs

It has to do with Suricata.

As soon as I put a lot of load on my OPNsense 22.7.9 box, the interface I am using stops responding to pings, etc. I have another interface on my OPNsense box and that one keeps working. When I restart the Suricata service, the ping replies start working again.
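For anyone who wants to automate the "restart Suricata when the pings stop" workaround described above, here is a minimal sketch. It is a stopgap, not a fix, and it cannot help when the whole box freezes. The restart command (`configctl ids restart`), the probe interval and the failure threshold are my assumptions, not something confirmed in this thread; adjust them to your setup.

```python
import subprocess
import time

# Assumption: this is how the IDS service is restarted on your box.
# Replace it with whatever command works for you.
RESTART_CMD = ("configctl", "ids", "restart")


def gateway_alive(host):
    """Return True if a single ping to host gets a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_restart(history, threshold=3):
    """True when the last `threshold` probes all failed."""
    return len(history) >= threshold and not any(history[-threshold:])


def restart_suricata():
    """Run the (assumed) restart command and ignore its exit code."""
    subprocess.run(RESTART_CMD, check=False)


def watch(host, restart=restart_suricata, interval=10, threshold=3,
          probe=gateway_alive, max_rounds=None):
    """Probe host every `interval` seconds; restart after repeated failures.

    Returns the number of restarts performed (only reachable when
    max_rounds is set, otherwise the loop runs forever).
    """
    history = []
    restarts = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        history.append(probe(host))
        if should_restart(history, threshold):
            restart()
            restarts += 1
            history.clear()  # start counting failures from scratch
        rounds += 1
        time.sleep(interval)
    return restarts
```

Calling `watch("192.0.2.1")` with your own gateway address instead of the placeholder would keep probing until interrupted. Requiring several failed probes in a row avoids restarting the service over a single lost ping.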

I have reproduced this by running a few speedtest-cli calls in parallel, which brings the relevant interface to a halt.
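In case it helps others reproduce this, the parallel load can be scripted roughly like this. It is only a sketch: the helper name `run_parallel` and the number of copies are my own choices, and it assumes `speedtest-cli` is installed and on the PATH of a client machine behind the firewall.

```python
import subprocess


def run_parallel(cmd, count):
    """Start `count` copies of cmd at the same time and return their exit codes."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(count)
    ]
    # Wait for every process so that all of them overlap for as long as possible.
    return [proc.wait() for proc in procs]
```

Something like `run_parallel(["speedtest-cli"], 4)` starts four tests at once. Keep a ping to the firewall running in another terminal to see when the interface stops answering.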

This problem is definitely related to Suricata after the OPNsense 22.7.9 upgrade.  I can freeze my OPNsense box (all interfaces drop offline, the web GUI freezes and it requires a hardware reboot with the power button) immediately after saturating the line (1Gbps lines) with nzbget (TLS/SSL).  No issues with the Suricata service stopped.  Suricata is configured in promiscuous mode, IPS enabled, monitoring the LAN interface.  This configuration has worked flawlessly for at least a year prior to this upgrade.


Our issues are the same. I'm running Suricata as well. For some reason on mine, Unbound is the first victim, and so that's what I was focused on.
I ran a speedtest (I have 250/250) and got 250 down, up 85, and it actually quit before it finished the upload test. I restarted suricata and unbound and everything is working again.

Hope @franco or someone is reading this and already fixing......

Quote from: manilx on December 04, 2022, 11:28:08 PM
Hope @franco or someone is reading this and already fixing......
Shouldn't that be up to the Suricata folks to fix? And best reported in the appropriate subforum?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

As I don't know if this is the problem, simple user here, I hope someone with more experience will take this up....

Quote from: manilx on December 04, 2022, 11:49:52 PM
As I don't know if this is the problem, simple user here, I hope someone with more experience will take this up....
Can you try digging in the logs for clues? If there is indeed a problem with Suricata, which this thread has no proof of yet, then as pmhausen wrote, it is not for the OPN devs to try to reproduce your setup by guessing.

It looks like on Reddit the problem was worked around by not updating Suricata or by restarting it.

I have upgraded again but blocked Suricata from being upgraded. Now waiting to see whether the issue appears again. If it does, I can look at the logs (which ones?).

Can confirm it's Suricata 6.0.9.  Have spent many hours the last two days testing numerous settings and scenarios.

Reverted to Suricata 6.0.8 on OPNsense 22.7.9 and the problem stopped.  The logs did not show anything other than this, each time it happened: "/usr/local/etc/rc.linkup: DEVD: Ethernet detached event for dynamic wan(em0)".  The problem was easily reproduced with nzbget, which saturates the download pipeline and drops the entire network within seconds.  It seems to be related to multiple parallel high-bandwidth connections occurring simultaneously; I saw no unusual problems during normal daily network activity, so I'm sure most users will not notice anything amiss.
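If anyone wants to check their own logs for the same message, here is a small sketch that picks out those lines. The pattern is built only from the single log line quoted above, so treat it as an assumption about the general format. The function name is mine, and where the system log lives on your install is for you to fill in.

```python
import re

# Built from the one line quoted in this thread; other variants may exist.
DETACH_RE = re.compile(
    r"DEVD: Ethernet detached event for (?:\w+ )*(\w+)\((\w+)\)"
)


def find_detach_events(lines):
    """Return (line_number, interface_name, device) for every detach message."""
    events = []
    for number, line in enumerate(lines, start=1):
        match = DETACH_RE.search(line)
        if match:
            events.append((number, match.group(1), match.group(2)))
    return events
```

For example, `find_detach_events(open(path))` with `path` pointing at your system log lists each hit, so you can compare the times against the moments the gateway went offline.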

Protectli box, intel NIC, i5, 16GB dual channel.  Suricata running IPS, Promiscuous, on LAN.  Platform and config have been rock solid until this upgrade.

And, there are no hardware problems with the NIC, cable, ISP modem or switch.