23.1 Legacy Series / Re: OPNSense freezing 3 times an hour
« on: May 11, 2023, 11:17:52 pm »
It doesn't look like the CPU is overtaxed. I would check dmesg at the console when it happens.
Even if top doesn't report a spike, we're looking for clues in that log buffer, such as errors.
Your diagnosis so far suggests the problem could be downstream of the firewall. After restarting the switch just in case, I would diagnose at both ends in parallel: a wired client and the firewall. We want to eliminate wireless from the equation for now.
Start with dmesg and top at the firewall. Network diagnostics from the client: ping, nslookup, etc.
And I would reconfigure it without AdGuard too, to eliminate name resolution blocks. That wouldn't explain a network freeze at the client, though, as you know.
That said, when you say OPN freezes, can you describe where (a particular settings page), or is it something else? From the diagnostics so far, I'm thinking that if the network stutters as seen from the client (let's say the switch drops packets), it would look like OPN is frozen when it's really just the link to it that is. Thinking aloud here.
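For anyone wanting to run the client-side half of that parallel check, here is a rough sketch in Python. It pings the firewall and a public IP every few seconds and prints a timestamped verdict, so the output can be lined up against dmesg and top on the firewall. The gateway address 192.168.1.1 and the helper names ping_once and classify are assumptions made for illustration; substitute your own LAN address. Note that ping exit codes are not fully consistent across platforms, so treat the result as a hint rather than proof.

```python
import platform
import subprocess
import time
from datetime import datetime

# Assumptions for illustration only: adjust to your own network.
GATEWAY = "192.168.1.1"   # hypothetical firewall LAN address
WAN_TARGET = "8.8.8.8"    # public IP, so DNS is not involved


def ping_once(host, timeout_s=3):
    """Return True if a single ping to host exits successfully in time."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def classify(gateway_ok, wan_ok):
    """Turn the two ping results into a rough guess at where the fault is."""
    if gateway_ok and wan_ok:
        return "ok"
    if gateway_ok:
        return "firewall reachable, internet not: look at the firewall or WAN"
    if wan_ok:
        return "internet reachable, firewall not answering ping"
    return "firewall unreachable: suspect the LAN path (switch, cable, client)"


if __name__ == "__main__":
    # Run on a wired client; stop with Ctrl+C.
    while True:
        status = classify(ping_once(GATEWAY), ping_once(WAN_TARGET))
        print(f"{datetime.now():%H:%M:%S} {status}", flush=True)
        time.sleep(5)
```

If the log shows the firewall itself becoming unreachable from a wired client during an outage, that points at the LAN path between the client and the firewall rather than at the firewall's upstream side.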
I think you're spot on in narrowing down the issue to downstream on the LAN. All my devices, including those hardwired to the switch just below OPNsense, were losing all internet access, including ping to 8.8.8.8 (to rule out DNS). I tried replacing the main switch downstream from OPNsense, going from a 1G switch to a 2.5G switch (a planned upgrade anyway as part of this project), but the results didn't change. One thing I did notice involves an old AiMesh node I was using as a temporary switch while waiting for a 5-port switch (it is not configured as part of the new AiMesh AP configuration): its lights started turning on and off and blinking rapidly whenever my clients lost internet connectivity during the minute of downtime. I replaced that old AiMesh node and the connectivity issues have not recurred in the hour since I did that. I think that may have been the issue after all.
I've attached a better diagram of the network topology at play here BEFORE I replaced that old AiMesh "switch". The current topology replaces that OLD AiMesh "switch" with an actual unmanaged switch. Everything seems to be working now. I will of course update if the issue re-emerges, but I think we're in the clear now. Thank you all for your time and help looking into this.