OPNsense needs periodic reboot since update to 24.7.9_1-amd64

Started by bongo, November 23, 2024, 02:40:37 PM

2 days ago, i updated to OPNsense 24.7.9_1-amd64.
since then, my internet connection stops working after about 12-24h of working fine.
so far i could not find out the reason. everything looks fine when i log in to OPNsense, but i noticed that Interface/Diagnostics/DNSLookup does not work: it answers with a socket error.
restarting the unbound service did not solve the issue.
the only thing that seems to help is to reboot the firewall.
before i updated to the latest firmware, i never had such issues.
anyone else having this problem?

the exact message i get with
Interface/Diagnostics/DNSLookup
to www.google.ch with server set to 8.8.8.8 is
Error: error sending query: Error creating socket
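
To narrow down where a lookup like this fails, a small probe can separate "the socket could not even be created" from "the query went out but no reply came back". The following is only a minimal sketch, assuming a Python 3 interpreter is available on the machine you test from. The function names `build_dns_query` and `probe_dns` are made up for this example and are not part of OPNsense.

```python
import errno
import socket
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS A-record query packet (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe_dns(hostname: str, server: str = "8.8.8.8", timeout: float = 3.0) -> str:
    """Send one UDP query and report which step failed, if any."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    except OSError as exc:
        # Failing here points at local resources, not at the uplink.
        name = errno.errorcode.get(exc.errno, "unknown")
        return f"socket creation failed: {name} ({exc.strerror})"
    try:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(hostname), (server, 53))
        reply, _ = sock.recvfrom(512)
    except socket.timeout:
        return "query sent, no reply (timeout)"
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return f"send/receive failed: {name} ({exc.strerror})"
    finally:
        sock.close()
    return f"reply received ({len(reply)} bytes)"
```

Calling `probe_dns("www.google.ch")` while the problem is present returns one of the strings above. The errno name in a failure message (if there is one) may be a better starting point for searching the logs than the generic GUI error.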

Same here. I use the same version. The firewall keeps hanging periodically but unpredictably. It was the same in the previous version, but now it hangs more often.

Any suggestions on where to look to track down the problem?

best regards

Did you check the server list under System / Settings / General?

Modify or remove these servers and see what happens. It sounds like you might have some old entries there that it's trying to reach, which could be causing the error.

I'm experiencing issues similar to this where the WebGUI begins to hang and dashboard widgets keep failing and re-appearing.

I'm not sure if this is related to very large firewall aliases for geoIP or IPLists with over 58,000 entries... These were not causing any issues prior to upgrading to 24.7.x

I'm on OPNsense 24.7.9_1-amd64

referencing back to my original post:
when i log in to OPNsense from the LAN network, everything looks fine and the GUI behaves as expected. the only problem is that there is no throughput at all on the uplink. this happens after 1-24h. the only way to get data through the uplink then is to do a reboot.
i'm currently checking the behavior with a different NIC. it might be a hardware issue and the onboard NIC is about to die.

Ok maybe I'm not losing my mind.

I've seen the same errors, but can't remember when it started. I thought at first it was my ISP dropping.

What I noticed is no more ARP, no route... just out of the blue. I have to take the interface down and bring it back up, and it's fine.

Reboot of the firewall fixes it as well, and a power cycle of the ONT fixes it. I don't think your card is going bad... I think something odd is definitely going on.

sounds reasonable. maybe something weird in the handling of this specific brand of ethernet interface?
so as a temporary solution i replaced it with a USB-connected network interface, which has now been running stable for almost 2 days.

This is an Intel NIC that's been running great for quite a few years. I didn't have this particular issue back in the summer, and folks are correct: it was around the last update that I started noticing the issue.
I'm continuing to look at logs when it happens to see if I can sort out what is going on, but so far nothing stands out.


according to the ASRock datasheet, my mainboard has a Realtek RTL8111E onboard.

THE PROBLEM IS BACK  :'(

after switching to a different interface for the uplink (connected via usb), OPNsense ran stable for about 5 days. now the issue has popped up again.
yesterday it showed up 5 or 6 times. suddenly there is no more traffic on the uplink.

when it happened again for the 1st time, i've seen that unboundDNS was down and i restarted it. after doing so, DHCPv4 server became red and i also restarted this, and everything was fine for about 2 hours.

but for the next 4 or 5 times when the uplink failed, the dashboard never showed anything special (besides that there was no traffic on the uplink).
i then tried to do some checks and diagnostics, only confirming that the uplink was down.
while doing so, it happened each time that OPNsense suddenly worked again. so i 1st thought that it automatically recovers after some time. so i did not touch anything for more than 1 hour when this happened the next time, but no recovery then  :-\

but then i came across something very odd:
when OPNsense fails and i go to <OPNsenseIP>/ui/interfaces/overview, i see that the uplink is down.
then after about 10 seconds, i reload exactly the same page, and the uplink is up and everything is working fine again.
i have no proof that this always recovers from the issue, but so far, i did this twice and it helped twice. so it seems to be somewhat reproducible.
so this makes me no longer believe that this is a hardware issue. it really looks like something's wrong with the firewall software.
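
Two manual observations are thin evidence, so it may help to log when the uplink actually goes down and comes back, and then compare those times with GUI logins and page reloads. The following is a rough sketch under stated assumptions: it is meant to run on a LAN client, not on the firewall. It treats a TCP connection to 8.8.8.8 on port 53 as "uplink reachable". All function names and the log file name `uplink.log` are made up for this example.

```python
import socket
import time
from typing import List, Tuple

# One entry per state change: (unix timestamp, uplink reachable?)
Transition = Tuple[float, bool]


def uplink_reachable(host: str = "8.8.8.8", port: int = 53,
                     timeout: float = 3.0) -> bool:
    """Return True if a TCP connection through the uplink succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def record(history: List[Transition], timestamp: float, is_up: bool) -> bool:
    """Store a sample only if the state changed; return True if stored."""
    if history and history[-1][1] == is_up:
        return False
    history.append((timestamp, is_up))
    return True


def outage_durations(history: List[Transition]) -> List[float]:
    """Return the length in seconds of every completed outage."""
    durations = []
    down_since = None
    for timestamp, is_up in history:
        if not is_up and down_since is None:
            down_since = timestamp
        elif is_up and down_since is not None:
            durations.append(timestamp - down_since)
            down_since = None
    return durations


def watch(interval: float = 30.0, logfile: str = "uplink.log") -> None:
    """Poll forever and append one line per UP/DOWN transition."""
    history: List[Transition] = []
    while True:
        now = time.time()
        if record(history, now, uplink_reachable()):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
            state = "UP" if history[-1][1] else "DOWN"
            with open(logfile, "a") as fh:
                fh.write(f"{stamp} {state}\n")
        time.sleep(interval)
```

Running `watch()` in the background for a few days would show whether every DOWN is followed by an UP only after someone touched the GUI, or whether some outages end on their own.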

is this forum read by the developers of OPNsense? can i expect that an expert takes a look at this issue?

looks like this really helps when the uplink is down:

login and go to
<OPNsenseIP>/ui/interfaces/overview
-> shows that the uplink is down

reload the page
-> shows that the uplink is up

everything is working again, until the uplink fails next time

the procedure i mentioned above, i.e. accessing the interface overview page twice, is required to recover the uplink when logged in to OPNsense as administrator.
when i log in as a normal user, it is sufficient to just log in, and as soon as i see the lobby/dashboard, the uplink works fine again.

I'm having a similar issue after the 24.7.9_1-amd64 update. The Unbound DNS resolver seems to be having issues. My wifi clients take more than 10 seconds to load a web page. Games can't connect because DNS queries time out. I switched from 1.1.1.1 back to my ISP's DNS servers, but the result is the same. Initially I thought it was my wifi AP, so I changed to another one, but again no difference. Wired Ethernet devices are better, but DNS timeouts sometimes happen there too.

I'm seeing the same issue on an OPNsense DEC740 since I updated to OPNsense 24.7.9_1-amd64, twice to be specific.

My device still responds to SNMP while this is going on, so I can somewhat see what is happening. CPU usage goes to 100%, which makes the firewall fail at tasks like DNS and, after a while, even DHCP leases. Interestingly, SNMP also reports disk I/O dropping to a flat zero while this is going on, even over a full hour.

It almost seems like OPNsense is losing its storage device and goes bananas until it's rebooted. Unfortunately I haven't been able to get logs as I was using in-memory logging, so the logs were lost on reboot. But given that SNMP reports no disk I/O at all, I suspect the logs would not have been written to disk anyway.

The load average is through the roof with 100% CPU usage, and the CPU usage is mostly "system". The later spike in "user" in the CPU graph is when my DHCP leases also stopped working. Attached screenshots from NMS.
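
Since the in-memory logs did not survive the reboot, one option is a tiny sampler that writes the load average to persistent storage and forces each line to disk. This is only a sketch. The path `/var/tmp/loadavg.log` is a placeholder and must be replaced with a location that is really on disk and not on a RAM disk. The function names are made up for this example. If the storage device really disappears, the write itself should fail, and that failure is then a useful data point too.

```python
import os
import time
from typing import Tuple


def format_sample(timestamp: float, load: Tuple[float, float, float]) -> str:
    """Render one log line with a UTC timestamp and the three load averages."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(timestamp))
    return (f"{stamp} load1={load[0]:.2f} "
            f"load5={load[1]:.2f} load15={load[2]:.2f}")


def append_durably(path: str, line: str) -> bool:
    """Append a line and fsync it; return False if the write failed."""
    try:
        with open(path, "a") as fh:
            fh.write(line + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to push the data to the device
        return True
    except OSError:
        return False


def sample_forever(path: str = "/var/tmp/loadavg.log",  # placeholder path
                   interval: float = 60.0) -> None:
    """Record the load average once per interval until interrupted."""
    while True:
        line = format_sample(time.time(), os.getloadavg())
        if not append_durably(path, line):
            # A failing write may itself indicate that storage is gone.
            print("write failed:", line)
        time.sleep(interval)
```

After the next hang and reboot, the last timestamp in the file shows roughly when writes stopped, which can be compared with the point where SNMP reported disk I/O dropping to zero.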