Troubleshooting guidance

Started by mrThirsty, May 06, 2026, 11:56:30 AM

Hi All,

I could use some guidance with troubleshooting a problem I am currently having with OpnSense. Randomly during the day, my entire network seems to freeze up for anywhere from about 30 seconds to a couple of minutes. I am not sure if it's just LAN or WAN, but all devices both wired and wireless can't seem to do anything during this. I am leaning toward it being a WAN-related issue as on the odd occasion when the freeze up has been long enough, I am still able to log into the Admin portal on my router. It started after about 6 months of OpnSense running as my network router.

My network looks like the following:

Virgin Media Fibre modem/router (modem-only mode) -> OpnSense (Protectli Vault FW4C (Intel J3710) with 2 x 2.5 Gb NICs) -> 1 Gb switch and Ubiquiti Amplify-HD wireless router (bridge mode)

I have determined the issue is my OpnSense router as I have removed it from my network and then ran each of the ISP modem and Amplify-HD as the router for a day each and during those two days I did not have any of the freezes. I have also taken the extreme move of completely wiping my router and just having it run as it comes out of the box, just as a DHCP server, no ZenArmour or OpenVPN etc. and I still get the freezing. No matter what configuration I run my network in, as soon as OpnSense is the router, the freezing happens.

My OpnSense is a pretty basic setup: it runs Dnsmasq for DHCP, ZenArmour, and an OpenVPN connection with some routing rules to push specific network clients' traffic over that connection.

I am at a loss as to what to look at to try to determine what is causing this freezing when OpnSense is the active router on my network. When looking at the logs I am not really sure what I am looking at, so I probably wouldn't spot the issue if it was staring at me. I do appreciate that this could also be a hardware issue.

I could use some guidance on what to look for, or potentially even a fix to try. I don't want to get rid of OpnSense as I love the control it gives me, especially along with ZenArmour, over what my kids can access and when, and filtering out the dodgy stuff as best as possible. But I am getting to the point that the Wife Acceptance Factor is getting very low, so I need to resolve it, otherwise I will be gaining an expensive paperweight in the Protectli.

Appreciate any help, pointers, etc.

I would start with hardware tests: Memtest first, then a stress test with traffic, say iperf running through the box to a public iperf server if you don't have the means to test across LAN - WAN in a lab.
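If it helps, here is a rough Python sketch of how a long iperf run could be checked for stalls afterwards. It assumes iperf3 is installed on the client and uses its JSON output (`-J`); the 1 Mbit/s threshold, the function names and the server name in the comment are placeholders of mine, not anything from this thread.

```python
import json
import subprocess


def find_stalls(report, min_mbps=1.0):
    """Return (start_s, end_s, mbps) for every interval slower than min_mbps."""
    stalls = []
    for interval in report.get("intervals", []):
        summary = interval["sum"]
        mbps = summary["bits_per_second"] / 1e6
        if mbps < min_mbps:
            stalls.append((summary["start"], summary["end"], round(mbps, 3)))
    return stalls


def run_iperf3(server, seconds=600):
    """Run `iperf3 -c SERVER -t SECONDS -J` and return the parsed JSON report."""
    result = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(result.stdout)


def main(server, seconds=600):
    """Run one test and print every interval that dropped below the threshold."""
    report = run_iperf3(server, seconds)
    if "error" in report:
        print("iperf3 reported an error:", report["error"])
        return
    for start, end, mbps in find_stalls(report):
        print(f"stall from {start:.0f}s to {end:.0f}s: {mbps} Mbit/s")


# Example call (hypothetical server name, replace with a real iperf3 server):
# main("iperf.example.net", seconds=600)
```

A stall that shows up here only tells you that traffic through the box stopped, not why, so it is best combined with the other checks in this thread.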
Looking at logs is about looking for clues. Nobody can tell you exactly what to search for, beyond generic hints like grepping for errors.
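As a starting point for that kind of generic search, a small Python sketch like the one below could be run against a copy of a log file. The keyword list is purely my guess at what might be interesting, and the file path is whatever log you exported; neither comes from OPNsense itself.

```python
import re
from collections import Counter

# Assumed keywords: adjust to taste, this list is only a guess.
PATTERN = re.compile(
    r"\b(?:error|fail\w*|timeout|timed out|watchdog|link state)\b",
    re.IGNORECASE,
)


def scan(lines, pattern=PATTERN):
    """Return (hits, counts): matching (line_number, text) pairs and keyword totals."""
    hits = []
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        match = pattern.search(line)
        if match:
            hits.append((number, line.rstrip("\n")))
            counts[match.group(0).lower()] += 1
    return hits, counts


def scan_file(path):
    """Scan a log file on disk, tolerating odd bytes in the file."""
    with open(path, errors="replace") as handle:
        return scan(handle)


# Example call (the path is a placeholder for a log file you copied off the box):
# hits, counts = scan_file("system.log")
# print(counts.most_common())
```

The useful part is usually the timestamps of the hits: if they line up with the moments the network froze, you have your clue.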

Maybe IPTV-related multicast storms, because the ISP's router has IGMPv3 on its built-in switch and your OPNsense and AmpliFi hardware doesn't?!



#JustGuessing...

Today at 01:46:04 AM #3 Last Edit: Today at 02:14:41 AM by drosophila
Quote from: mrThirsty on May 06, 2026, 11:56:30 AM
all devices both wired and wireless can't seem to do anything during this
First thing to check: what do the lights do? Do the switch / NIC LEDs indicate heavy traffic? No traffic at all? Normal traffic?
Second thing to check: can they communicate among themselves? IOW, on their segment / switch?
Quote from: mrThirsty on May 06, 2026, 11:56:30 AM
I am leaning toward it being a WAN-related issue as on the odd occasion when the freeze up has been long enough, I am still able to log into the Admin portal on my router.
So the LAN part is perfectly fine (unless you're using a dedicated admin port?), and the devices on your LAN can probably communicate normally among themselves. And the Protectli isn't frozen either, nor is the network stack. Keep the dashboard open and observe it for CPU / RAM / whatever spikes that shouldn't be there when the lockup happens. You probably need to add a couple of useful widgets first, like "CPU", "Traffic Graphs" and "Thermal Sensors".
Quote from: mrThirsty on May 06, 2026, 11:56:30 AM
I have determined the issue is my OpnSense router as I have removed it from my network and then ran each of the ISP modem and Amplify-HD as the router for a day each and during those two days I did not have any of the freezes. I have also taken the extreme move of completely wiping my router and just having it run as it comes out of the box, just as a DHCP server, no ZenArmour or OpenVPN etc. and I still get the freezing. No matter what configuration I run my network in, as soon as OpnSense is the router, the freezing happens.
That rules out updates for ZA or other blocklists clogging up the machine. I'd look for WAN-related events like IP address changes, possibly interface-related if we assume that the WAN interface might simply have a defect. Would it be possible to reassign the WAN and one of the other interfaces (the box has 4 AFAICS) to see if the issue persists unchanged (so, on WAN) or sticks to the interface?

Also, you could observe the "Live view" on the Firewall. You don't need to interpret the individual lines, just look for changes in pattern. Note that to do so, you'd need to watch those logs for a while when everything is normal to learn what the normal patterns look like: not every "wall of red lines" indicates something unusual. You might need to enable logging for all rules first. Familiarize yourself with the settings and buttons on that page so you can hit the "stop" button in time, and possibly increase the "Table size" to 100 or even more for that.

Since that is an Intel, you should install the "os-cpu-microcode-intel" plugin, even though this doesn't seem to be CPU related.

I would set up a machine that is normally on anyway (like your desktop / work machine) to do a continuous ping (not floodping, just a friendly once-per-second endless ping (/t in Windows, I believe)) to one of your other "always on" LAN devices (printer, TV, smartbulb, home automation, toaster, ...) in one terminal, and another such ping to something on the internet that won't be going anywhere, say, www.microsoft.com. Possibly a third terminal pinging the LAN interface of your Sensebox for good measure.
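If juggling three terminals is a nuisance, the same idea can be sketched in Python: ping each target once per second and print a timestamped line only when a target changes state. This is just a sketch under assumptions: it shells out to the system `ping` (using `-n` on Windows and `-c` elsewhere), the addresses in the example are placeholders for your own devices, and on some systems `ping` can exit with 0 even for "unreachable" replies, so treat the result as a hint.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2.0):
    """Send one echo request via the system ping; True if it exited with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def record(state, name, ok):
    """Store the latest result; return "UP" or "DOWN" if it differs from the last one."""
    previous = state.get(name)
    state[name] = ok
    if previous == ok:
        return None
    return "UP" if ok else "DOWN"


def monitor(targets, interval_s=1.0, probe=ping_once, rounds=None):
    """Probe every target each round; rounds=None means run until interrupted."""
    state = {}
    done = 0
    while rounds is None or done < rounds:
        for name, host in targets.items():
            change = record(state, name, probe(host))
            if change:
                stamp = f"{datetime.now():%Y-%m-%d %H:%M:%S}"
                print(f"{stamp}  {name} ({host}) is {change}", flush=True)
        time.sleep(interval_s)
        done += 1
    return state


# Example call (the two IP addresses are placeholders for your own network):
# monitor({
#     "lan-device": "192.168.1.50",
#     "opnsense-lan": "192.168.1.1",
#     "internet": "www.microsoft.com",
# })
```

Printing only the state changes keeps the output short enough to read after a freeze, and the timestamps can be compared against the firewall logs.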

Just keep them running until the event and see which one starts failing / changes behavior, and how, once it occurs. Or even if at all: if it is a DNS issue, running pings won't be affected.
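To cover the DNS case as well, a repeated name lookup can run alongside the pings. The sketch below times each lookup with Python's `socket.getaddrinfo`; the host names and the 5-second interval are placeholders, and be aware that the operating system may answer from its own cache, which can hide a resolver problem.

```python
import socket
import time
from datetime import datetime


def resolve(name, resolver=socket.getaddrinfo):
    """Look up a name once; return (ok, seconds_taken, addresses_or_error_text)."""
    start = time.monotonic()
    try:
        infos = resolver(name, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, time.monotonic() - start, str(exc)
    addresses = sorted({info[4][0] for info in infos})
    return True, time.monotonic() - start, ", ".join(addresses)


def watch(names, interval_s=5.0, rounds=None):
    """Resolve each name every interval_s seconds and print one line per lookup."""
    done = 0
    while rounds is None or done < rounds:
        for name in names:
            ok, elapsed, detail = resolve(name)
            status = "ok" if ok else "FAILED"
            stamp = f"{datetime.now():%H:%M:%S}"
            print(f"{stamp}  {name}: {status} in {elapsed:.2f}s ({detail})", flush=True)
        time.sleep(interval_s)
        done += 1


# Example call (names are placeholders, pick sites you actually use):
# watch(["www.microsoft.com", "example.org"])
```

If the pings keep running during a freeze while these lookups fail or take several seconds, that points at DNS rather than the link itself.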