A Hiccup of Unknown Origin

Started by luxgalactic, January 22, 2025, 12:46:57 AM

Hello everyone. A couple of weeks back, I started having this issue where my connection to work over a VPN and then through RDP would pause for about 10 to 30 seconds. I originally believed it was just that connection (interface), but as I have been dealing with this, I found that my entire internet connection pauses / hiccups.

It seems like I may have done an update to OPNSense the night before I started having the issues. Overall, nothing new about my network except that I blocked internet access to some IoT devices on my ASUS Router that same weekend.

One Sunday, I was on the couch scrolling YouTube and the feed would stop. I'd go to Reddit and same thing, nothing would come up. Then after a few seconds it would be back. On Monday, when the hiccup would happen while working, I'd grab my phone and try to check YouTube to see if it was working and it wasn't. It's like my entire OPNSense box just freezes.

The OPNSense box is a Neosmay N100 firewall appliance. One of those $250ish Amazon buys. It has 32GB of RAM and something like 100GB of disk space.

OPNSense is running as a Transparent Filtering Bridge. It sits behind my ISP Modem and in front of my ASUS Router.

My work notebook is connected to one of the OPNSense ports and is on a different subnet. Its traffic is passed through the MGMNT interface (it has been this way for about 2 months).
MGMNT interface connected to ASUS.
IN interface connected to ASUS.
OUT interface connected to ISP Modem.

BRIDGE interface consists of IN and OUT.

Versions:
OPNSense 24.7.12-amd64
FreeBSD 14.1-RELEASE-p6
OpenSSL 3.0.15

Plugins:
os-acme-client v4.7 (not enabled)
os-clamav v1.8_2
os-crowdsec v1.0.8_1
os-sensei v1.18.4 (Zenarmor not currently enabled)
os-sensei-agent v1.18.4 (Zenarmor not currently enabled)
os-sensei-updater v1.17 (Zenarmor not currently enabled)
os-sunnyvalley v1.4_3 (Zenarmor not currently enabled)
os-theme-rebellion v1.9.2

To start, I am really just trying to figure out where to look for issues to better understand what might be going on. A recent assumption was that Zenarmor was causing the issue. I don't think that is the case because, as noted above, it is not enabled and I am still having the issues ... although maybe less frequently?! One reason I suspected Zenarmor was that the Cloud Nodes would show DOWN when the problem presented.

Another thought I had was that it was somehow related to CrowdSec. I'd see a lot of Block Activity in the firewall log around the same time that things would pause, although that didn't always seem to be the case. I also had occasions where CrowdSec would go down and could not be restarted. I'd get a 'jwt' error and would need to restart OPNSense completely in order for things to come back up.

Sometimes, things would not come back up completely. I'd need to unplug the ethernet from OPNSense to the ISP Modem and plug it into the ASUS router in order for OPNSense to respond. Almost like a DNS issue.

Speaking of DNS, I do run PiHole and Unbound in an LXC. Originally, I pointed OPNSense (System > Settings > General) at the ASUS router IP for DNS, with the PiHole IP as a second entry. Last night I put Quad9 in the first position, and I have had issues today as well.

I'm sure there is likely some detail that I missed that might help, but again, what I am trying to get help with is ideas for where I should be looking to see what is going on. I've stared at the CPU graph looking for spikes (didn't see any) and watched for memory issues (didn't see any), but I don't know what else to look for or where.
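
One way to pin down when the pauses happen, so they can be lined up against the firewall and service logs, is to probe connectivity on a fixed interval and record the gaps. The following is only a minimal sketch: the function names are hypothetical, and the probe target (Quad9 at 9.9.9.9, TCP port 53) is an assumption that could be swapped for any host that reliably accepts connections.

```python
import socket


def probe(host="9.9.9.9", port=53, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default target is an assumption; any reliably reachable host works.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into a list of (start, end) outages.

    'start' is the timestamp of the first failed probe in a run, and 'end'
    is the timestamp of the first successful probe after it. If the samples
    end while probes are still failing, 'end' is the last timestamp seen.
    """
    windows = []
    start = None
    last = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts            # a run of failures begins here
        elif ok and start is not None:
            windows.append((start, ts))
            start = None          # connectivity is back
        last = ts
    if start is not None:
        windows.append((start, last))
    return windows
```

Calling `probe()` once per second from a wired client, storing `(time.time(), result)` pairs, and passing them to `outage_windows` would give a list of pause start and end times to compare against the CrowdSec and firewall log timestamps.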

Yesterday, I did turn on Suricata since I had turned Zenarmor off. It was on the OUT interface as opposed to the bridge. Probably did that wrong!

I had hoped the OPNSense update over the weekend would have solved things, but it didn't.

Obviously, the ASUS could be the issue, but it's been solid for the time I have had it. Recently, though, I have been having Wi-Fi issues. Trying to use FaceTime lately has not been fun. There are no firmware updates for it at this time. I don't have any alerts about issues, but that doesn't mean there aren't any.

Additional details from my first post here: https://forum.opnsense.org/index.php?topic=44382.0

Appreciate the help in advance. Let me know if you need any additional details to better assist.



Quote from: luxgalactic on January 22, 2025, 12:46:57 AM
[...]
BRIDGE interface consists of IN and OUT.
[...]

The first thing I'd check is your ARP table - an unfriendly ARP proxy can cause intermittent outages when it answers ARP requests (so they tend to recur periodically). A crude fix (what I ended up with) is static ARP entries.

Linux seems to be largely immune to ARP conflicts. One of these days I may dig into ARP behavior and/or code.

January 22, 2025, 01:16:15 AM #2 Last Edit: January 22, 2025, 01:35:11 AM by luxgalactic
Quote from: pfry on January 22, 2025, 01:07:34 AM
The first thing I'd check is your ARP table

There are only 9 entries. I'm not sure what I expected to see, but since the ASUS is the router, maybe that makes sense. Nothing about the entries seems off. Appreciate the suggestion.

*EDIT*
Maybe I spoke too soon. I was watching the ARP Table during an issue just now, and I noticed that the MAC address shown for my ASUS router (gateway IP) changed. I have a second ASUS router in the garage, and its MAC address was showing against the gateway IP. I am going to go disconnect it and see what comes of that.
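
Rather than watching the table by eye, the same check can be scripted: take snapshots of the ARP table and flag any IP whose MAC address changes between them. This is a rough sketch that assumes FreeBSD-style `arp -an` output (lines such as `? (192.168.1.1) at aa:bb:cc:dd:ee:ff on igc0 ...`); the function names and the sample addresses are made up for illustration.

```python
import re
import subprocess
import time

# Assumed line shape from FreeBSD's `arp -an`:
#   ? (192.168.1.1) at aa:bb:cc:dd:ee:ff on igc0 expires in 1200 seconds [ethernet]
# Entries shown as "(incomplete)" have no MAC and are skipped.
ARP_LINE = re.compile(
    r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at "
    r"(?P<mac>(?:[0-9a-f]{1,2}:){5}[0-9a-f]{1,2})",
    re.IGNORECASE,
)


def parse_arp(text):
    """Return a dict mapping IP -> lowercase MAC for each resolved entry."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group("ip")] = match.group("mac").lower()
    return table


def mac_changes(previous, current):
    """Return {ip: (old_mac, new_mac)} for IPs whose MAC differs between snapshots."""
    return {
        ip: (previous[ip], mac)
        for ip, mac in current.items()
        if ip in previous and previous[ip] != mac
    }


def watch(interval=2.0):
    """Poll `arp -an` forever and print every MAC change with a timestamp."""
    previous = {}
    while True:
        output = subprocess.run(
            ["arp", "-an"], capture_output=True, text=True
        ).stdout
        current = parse_arp(output)
        for ip, (old, new) in mac_changes(previous, current).items():
            print(f"{time.strftime('%H:%M:%S')} {ip}: {old} -> {new}")
        previous = current
        time.sleep(interval)
```

Running `watch()` on the firewall during a pause should print a line whenever the gateway IP flips between two MAC addresses, which is the symptom described above.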

@pfry Thank you. That appears to have fixed my issues, as far as I have been able to tell so far. I'm not seeing the things I was seeing previously.

Can you enlighten me on how you thought to look at the ARP Table? Just curious about how you came to reason I might find an issue there.

Quote from: luxgalactic on January 22, 2025, 05:59:33 PM
[...]
Can you enlighten me on how you thought to look at the ARP Table? Just curious about how you came to reason I might find an issue there.

My FiOS Internet service is delivered via Ethernet from the ONT, effectively as a bridged interface, and it's an unlimited ARP proxy. Bloody annoying. So it's the first thing I suspect when dealing with periodic connectivity loss on a bridge.

In your case, if it's equipment you control, you may be able to disable that behavior. It depends on the device.