OPNsense Forum

English Forums => 24.7, 24.10 Legacy Series => Topic started by: bongo on November 23, 2024, 02:40:37 PM

Title: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on November 23, 2024, 02:40:37 PM
Two days ago, I updated to OPNsense 24.7.9_1-amd64.
Since then, my internet connection stops working after about 12-24 hours of working fine.
So far I have not been able to find the reason. Everything looks fine when I log in to OPNsense, but I have noticed that Interfaces/Diagnostics/DNS Lookup does not work: it answers with a socket error.
Restarting the Unbound service did not solve the issue.
The only thing that seems to help is rebooting the firewall.
Before I updated to the latest firmware, I never had such issues.
Is anyone else having this problem?
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on November 23, 2024, 05:38:17 PM
The exact message I get from
Interfaces/Diagnostics/DNS Lookup
when querying www.google.ch with the server set to 8.8.8.8 is:
Error: error sending query: Error creating socket
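To check the same thing outside the GUI, one option is to send a single raw DNS query over UDP and see whether it fails while creating or using the socket, or only times out waiting for an answer. The sketch below is a minimal, hypothetical helper (the names `build_query` and `probe` are made up here) that builds an A-record query in standard RFC 1035 wire format using only the Python standard library. It assumes Python 3 is available on the machine you run it from, and it is not how the OPNsense diagnostics page is implemented.

```python
import random
import socket
import struct


def build_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query (RFC 1035 wire format) for the A record of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(name: str, server: str, timeout: float = 3.0) -> str:
    """Send one UDP query to `server` on port 53 and describe the outcome."""
    query_id = random.randint(0, 0xFFFF)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(name, query_id), (server, 53))
            reply, _ = sock.recvfrom(512)
    except socket.timeout:
        # The socket worked and the query left, but no answer came back in time.
        return "timeout: query sent, no reply"
    except OSError as exc:
        # Creating the socket or sending on it failed locally, e.g. no route to the server.
        return f"socket error: {exc}"
    if len(reply) < 2:
        return "truncated reply"
    (reply_id,) = struct.unpack("!H", reply[:2])
    return "ok" if reply_id == query_id else "reply with unexpected ID"
```

Usage, from the firewall shell or a LAN client: `print(probe("www.google.ch", "8.8.8.8"))`. A "socket error" result points at a local problem (interface, routing, buffers), while a timeout means the packet was sent but nothing came back.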
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: fwne9 on November 25, 2024, 08:43:26 AM
Same here. I use the same version. The firewall keeps hanging periodically, but unpredictably. It was the same in the previous version, but now it hangs more often.

Any suggestions where to look to find out the problem?

Best regards
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: Amodin on November 25, 2024, 03:01:04 PM
Did you check the DNS server list under System / Settings / General?

Modify or remove these servers and see what happens. It sounds like you might have some old entries there that it's trying to reach, which causes the error.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: GuruLee on November 25, 2024, 05:02:40 PM
I'm experiencing similar issues: the WebGUI begins to hang, and dashboard widgets keep failing and reappearing.

I'm not sure whether this is related to very large firewall aliases for GeoIP or IP lists with over 58,000 entries. These did not cause any issues before upgrading to 24.7.x.

I'm on OPNsense 24.7.9_1-amd64
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on November 25, 2024, 09:33:28 PM
Referring back to my original post:
When I log in to OPNsense from the LAN, everything looks fine and the GUI behaves as expected. The only problem is that there is no throughput at all on the uplink. This happens after 1-24 hours, and the only way to get data through the uplink again is to reboot.
I'm currently checking the behavior with a different NIC. It might be a hardware issue, with the onboard NIC about to die.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: slackadelic on November 25, 2024, 10:26:29 PM
OK, maybe I'm not losing my mind.

I've seen the same errors, but can't remember when it started. At first I thought it was my ISP dropping.

What I noticed is that there is suddenly no more ARP and no route, just out of the blue. I have to bring the interface down and back up, and then it's fine.

A reboot of the firewall fixes it as well, and so does a power cycle of the ONT. I don't think your card is going bad... I think something odd is definitely going on.
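Until the underlying driver problem is resolved, the manual workaround described above (bringing the interface down and back up) could in principle be automated. The following is a minimal, hypothetical Python sketch, not an OPNsense feature. It assumes the uplink is `re0` (the name used later in this thread), that opening a TCP connection to 8.8.8.8 port 53 is a fair reachability test, and that it runs as root on the firewall with Python 3 available. It uses the standard FreeBSD `ifconfig <interface> down` / `up` commands. It only masks the symptom, and OPNsense itself does not know the interface was toggled.

```python
import socket
import subprocess
import time
from typing import Callable, List, Tuple

# Hypothetical settings; adjust to your own setup.
INTERFACE = "re0"                                # uplink NIC (name taken from this thread)
PROBE_TARGET: Tuple[str, int] = ("8.8.8.8", 53)  # any reliably reachable TCP endpoint
MAX_FAILURES = 3                                 # consecutive failed probes before bouncing


def uplink_ok(target: Tuple[str, int] = PROBE_TARGET, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to `target` can be opened through the uplink."""
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:
        return False


def bounce(interface: str, run: Callable[[List[str]], object]) -> None:
    """Take the interface down and bring it back up (FreeBSD ifconfig syntax)."""
    run(["ifconfig", interface, "down"])
    run(["ifconfig", interface, "up"])


def step(failures: int, probe: Callable[[], bool], on_down: Callable[[], None],
         max_failures: int = MAX_FAILURES) -> int:
    """Run one probe and return the new count of consecutive failures.

    When the count reaches `max_failures`, `on_down` is called and the count resets.
    """
    if probe():
        return 0
    failures += 1
    if failures >= max_failures:
        on_down()
        return 0
    return failures


def run_command(cmd: List[str]) -> None:
    """Execute a command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def main(interval: float = 30.0) -> None:
    """Probe forever; must run as root on the firewall itself."""
    failures = 0
    while True:
        failures = step(failures, uplink_ok, lambda: bounce(INTERFACE, run_command))
        time.sleep(interval)
```

`main()` is deliberately not started automatically; call it explicitly once the settings match your system. Keeping the decision logic in `step()` and the command runner injectable into `bounce()` lets you try the logic without touching a real interface.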
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on November 26, 2024, 06:28:25 AM
Sounds reasonable. Maybe something weird in the handling of this specific brand of Ethernet interface?
As a temporary workaround, I replaced it with a USB-connected network interface, and that has now been running stable for almost 2 days.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: slackadelic on November 26, 2024, 04:49:29 PM
This is an Intel NIC that's been running great for quite a few years. I didn't have this particular issue back in the summer, and folks are correct that the last update is about when I started noticing it.
I'm continuing to look at the logs when it happens to see if I can sort out what is going on, but so far nothing stands out.

Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on November 26, 2024, 04:57:31 PM
According to the ASRock datasheet, my mainboard has a Realtek RTL8111E on board.
Title: NEW FINDING on: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on November 30, 2024, 08:55:59 AM
THE PROBLEM IS BACK  :'(

After switching to a different interface for the uplink (connected via USB), OPNsense ran stable for about 5 days. Now the issue has popped up again.
Yesterday it showed up 5 or 6 times: suddenly there is no more traffic on the uplink.

When it happened for the first time, I saw that Unbound DNS was down and restarted it. After doing so, the DHCPv4 server turned red, so I restarted it as well, and everything was fine for about 2 hours.

But the next 4 or 5 times the uplink failed, the dashboard never showed anything special (apart from there being no traffic on the uplink).
I then ran some checks and diagnostics, which only confirmed that the uplink was down.
Each time, while I was doing so, OPNsense suddenly started working again, so at first I thought it recovers automatically after some time. The next time it happened, I did not touch anything for more than 1 hour, but there was no recovery :-\

But then I noticed something very peculiar:
When OPNsense fails and I go to <OPNsenseIP>/ui/interfaces/overview, I see that the uplink is down.
Then, after about 10 seconds, I reload exactly the same page, and the uplink is up and everything is working fine again.
I have no proof that this always recovers from the issue, but so far I have done this twice and it helped both times, so it seems to be somewhat reproducible.
This makes me no longer believe that this is a hardware issue. It really looks like something is wrong with the firewall software.

Is this forum read by the developers of OPNsense? Can I expect an expert to take a look at this issue?
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on November 30, 2024, 01:06:53 PM
It looks like this really helps when the uplink is down:

Log in and go to
<OPNsenseIP>/ui/interfaces/overview
-> shows that the uplink is down

Reload the page
-> shows that the uplink is up

Everything is working again, until the uplink fails the next time.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on December 01, 2024, 09:27:31 AM
The procedure I mentioned above, i.e. opening the interface overview page twice, is required to recover the uplink when logged in to OPNsense as administrator.
When I log in as a normal user, it is sufficient to just log in: as soon as I see the lobby/dashboard, the uplink works fine again.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: Huike on December 02, 2024, 04:46:22 AM
I'm having a similar issue after the 24.7.9_1-amd64 update. The Unbound DNS resolver seems to be having issues. My Wi-Fi clients take more than 10 seconds to load a web page, and games can't connect because DNS queries time out. I switched back from 1.1.1.1 to my ISP's DNS servers, but the result is the same. Initially I thought it was my Wi-Fi AP, so I changed to another one, but again the same. Wired Ethernet devices are better, but DNS timeouts sometimes happen there too.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: quad on December 02, 2024, 08:49:40 PM
I'm seeing the same issue on an OPNsense DEC740 since I updated to OPNsense 24.7.9_1-amd64; twice, to be specific.

My device still responds to SNMP while this is going on, so I can see somewhat what is happening. CPU usage goes to 100%, which makes the firewall fail at tasks like DNS and, after a while, even DHCP leases. Interestingly, SNMP also reports disk I/O dropping to a flat zero while this is going on, even over a full hour.

It almost seems like OPNsense is losing its storage device and goes bananas until it's rebooted. Unfortunately, I haven't been able to get logs because I was using in-memory logging, so the logs were lost on reboot. But since SNMP reports that no disk I/O is happening, I suspect the logs would not have been written to disk anyway.

The load average is through the roof with 100% CPU usage, and the CPU usage is mostly "system". The later spike in "user" in the CPU graph is when my DHCP leases also stopped working. Attached are screenshots from my NMS.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: Patrick M. Hausen on December 02, 2024, 09:01:51 PM
Log in via SSH and run "top" ...
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: quad on December 02, 2024, 09:22:01 PM
Quote from: Patrick M. Hausen on December 02, 2024, 09:01:51 PM
Log in via SSH and run "top" ...

At least in my case, my firewall does not respond to SSH or its web UI.

Rather, it seems to try but eventually fails with a timeout. Whatever my firewall is stuck doing seemingly clogs it up so badly that I can't SSH into it.

If it happens to me again I will try to connect to it using serial.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: franco on December 02, 2024, 09:35:17 PM
Can we stop piling "I have the same issue" on a reporter that said he uses re0/ue0 and observes WAN link failures? How I know? Because we have a bug tracker.

https://github.com/opnsense/core/issues/8098


Cheers,
Franco
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: quad on December 02, 2024, 09:46:11 PM
Quote from: franco on December 02, 2024, 09:35:17 PM
Can we stop piling "I have the same issue" on a reporter that said he uses re0/ue0 and observes WAN link failures? How I know? Because we have a bug tracker.

https://github.com/opnsense/core/issues/8098


Cheers,
Franco

I apologize. I will try to capture logs and go to GitHub instead if my issue reappears.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on December 03, 2024, 07:34:21 AM
I think I found the reason why my setup ran stable for 5 days and then the issue popped up again, with the uplink failing every few hours:
During those 5 days, I had my uplink connected through a switch that only supports 100M. After that time, I was confident that everything was working fine again, so I removed all the unneeded stuff, and the uplink was running at 1G again.
Then I had the issue again.
I tried to force my uplink to 100M in the OPNsense settings, but this did not help. Now I have added the switch again to bring the link down to 100M, and it has been stable for almost 2 days.
The only strange thing is that I did not have this issue before updating to the latest version of OPNsense.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: franco on December 03, 2024, 11:09:25 AM
I do believe the burst speed kills the NIC driver, causing it to drop out and lose the link. For some, this has been the case for as long as I can remember. It circles back to the point that this particular hardware is not a good fit in our case.


Cheers,
Franco
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on December 03, 2024, 09:21:15 PM
I plan to replace my uplink with either an Intel 82571- or i350-based NIC. Can I expect this to solve the issue?
Thanks!
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: franco on December 03, 2024, 09:25:27 PM
em(4) driver should cover both devices and should be fine. Just for reference, what hardware does this run on?


Cheers,
Franco
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: GuruLee on December 03, 2024, 10:29:02 PM
I continue to experience issues where the dashboard items do not load and the WebGUI as a whole hangs every few days or so, until I reboot.
My version is OPNsense 24.7.9_1-amd64.

I cannot even get the firmware status or check for updates when it is in this state, nor can I get past the SSH login prompt, as though it does not accept my password...

But after a forced reboot, everything is back to normal for a few days.

Is this the proper thread, or should I start a new one?
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: GuruLee on December 03, 2024, 11:01:51 PM
I went ahead and forced a shutdown of OPNsense (running on a Protectli J3710) and then updated to 24.7.10_1.

However, the dashboard widget content failures are already starting again, and this is usually the prelude to the WebGUI hangs and the inability to log in via SSH.
(https://cloud.lcsconsulting.biz/s/ZkLwrsmcqjkXkGb)

For the record, I do not have re0/ue0 as Franco noted.

Any suggestions?
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: gac on December 03, 2024, 11:49:19 PM
I occasionally have similar symptoms.

If you can, log in via SSH and run `pftop`. I have something running on my LAN (it's a Docker container, but I still need to spend the time to narrow down which one) that seems to hold connections open, but only sometimes.

Earlier this evening, after the 24.7.10_1 update, I couldn't SSH into the box anymore, but I happened to have a serial cable connected. I was able to run `pftop` and see that there were 15000+ states open. Shortly after that, the box kernel panicked, spewed a load of debug output over serial and then rebooted itself. After the reboot, I stopped the Docker daemon on my home NAS, and the number of states is currently hovering around 2000. I'm going to start the Docker daemon again and see if the problem comes back. If it does, I need to figure out which container it is, because all I can see on OPNsense is the source IP, which just shows up as the NAS because of how Docker networking works by default.

In your case, it sounds like you may also have too many states open, so your box reaches the point where, for some reason, it can't accept new connections, or the DDoS protection (syncookies) comes into play, or something like that.

Edit: I started the Docker daemon. Within a few seconds, `pftop` was showing 7500 states, so the number more than tripled.
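For logging the state count over time instead of watching `pftop` interactively, the total can be read from `pfctl -s info`, whose "State Table" section contains a `current entries` line. The sketch below assumes that usual FreeBSD output layout, root access on the firewall and a Python 3 interpreter there; the function names and the 3× "runaway" threshold are hypothetical, loosely based on the jump from about 2000 to 7500 states reported above.

```python
import re
import subprocess

# Matches the "current entries" line in the State Table section of `pfctl -s info`.
_CURRENT_ENTRIES = re.compile(r"^\s*current entries\s+(\d+)", re.MULTILINE)


def parse_state_count(pfctl_info: str) -> int:
    """Extract the number of pf states from `pfctl -s info` output."""
    match = _CURRENT_ENTRIES.search(pfctl_info)
    if match is None:
        raise ValueError("no 'current entries' line in pfctl output")
    return int(match.group(1))


def read_state_count() -> int:
    """Run pfctl on the firewall (needs root) and return the current state count."""
    result = subprocess.run(["pfctl", "-s", "info"], capture_output=True, text=True, check=True)
    return parse_state_count(result.stdout)


def looks_runaway(count: int, baseline: int, factor: float = 3.0) -> bool:
    """Hypothetical heuristic: flag a state count far above a known-good baseline."""
    return count >= baseline * factor
```

Calling `read_state_count()` from a cron job and appending the result to a file on persistent storage would show whether the count climbs before the box stops responding. The per-source breakdown still needs `pftop` or `pfctl -s state`.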
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: newsense on December 03, 2024, 11:59:41 PM
Please start a new thread for your issue; it has nothing to do with this one.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: slackadelic on December 04, 2024, 07:17:08 AM
Quote from: slackadelic on November 26, 2024, 04:49:29 PM
This is an Intel NIC that's been running great for quite a few years. I didn't have this particular issue back in the summer, and folks are correct that the last update is about when I started noticing it.
I'm continuing to look at logs when it happens to see if I can sort out what is going on, but so far nothing stands out.

After some more observation and testing, the issue discussed here does not seem to apply to my Intel setup. I'm pretty sure my ISP did something; I'm not sure what, but I will keep an eye out in case the issue persists.

So far, I'm stable.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on December 04, 2024, 07:59:50 AM
Quote from: franco on December 03, 2024, 09:25:27 PM
em(4) driver should cover both devices and should be fine. Just for reference, what hardware does this run on?


It's an ASRock J3455M PC mainboard (with a Realtek onboard NIC, re0, which I have used for the uplink so far).
In each of the 3 PCIe slots, I have a NIC used for one of the LANs.
When I built the machine a few years ago, I chose a different LAN card for each slot, so that I would be prepared for tests if I ran into issues with one of the cards.
Unfortunately, all 3 cards are now used for LANs, which is why I attached a USB NIC for my current tests.

I now plan to replace one of the cards with an Intel dual NIC, so that I have a spare NIC again.
Title: Re: OPNsense needs periodic reboot since updated to 24.7.9_1-amd64
Post by: bongo on February 11, 2025, 07:59:15 AM
I built new hardware for OPNsense, based on the same mobo but with Intel NICs.

It now seems to have been running stable for several weeks.

In parallel, I also ran the old hardware with Realtek NICs for a few days, and surprisingly, it also kept its uplink up the whole time.

When updating to 24.7.11, I noticed that the update also touched the re driver package, which was not mentioned in the release notes. So maybe the original issue with Realtek NICs was fixed there, and it would not have been necessary to replace my hardware.