primary WAN connection randomly dropping

Started by opn69a, February 05, 2023, 01:41:51 AM

February 05, 2023, 01:41:51 AM Last Edit: February 05, 2023, 06:53:37 AM by opn69a
Hi,

I recently updated to the latest version, and since then I've been getting random drops in my network connectivity. I host a few servers, and some people connected to them have mentioned being disconnected at random times.

When looking at the logs, I noticed this popped up every time it happened:


Feb  4 14:53:08 192.168.0.1 devd[564]: Processing event '!system=IFNET subsystem=igb0 type=LINK_DOWN'
Feb  4 14:53:08 192.168.0.1 devd[564]: Executing '/usr/local/sbin/configctl interface linkup stop $'igb0''
Feb  4 14:53:08 192.168.0.1 configd.py[231]: [f7b48385-6596-427d-96d3-1b7fe985f0fd] Linkup stopping igb0
Feb  4 14:53:08 192.168.0.1 opnsense[58884]: /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for wan(igb0)


I also noticed dpinger failing:
2023-02-03T19:16:59-06:00 Warning dpinger WAN_GWv4 [[FILTERED IP]]: Alarm latency 15256us stddev 18079us loss 25%
2023-02-03T19:16:54-06:00 Warning dpinger send_interval 1000ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr [[FILTERED IP]] bind_addr [[FILTERED IP]] identifier "WAN_GWv4 "
2023-02-03T19:16:54-06:00 Warning dpinger exiting on signal 15
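
For anyone reading along, here is a minimal Python sketch (not part of OPNsense or dpinger) that compares the numbers in the Alarm line against the latency_alarm and loss_alarm values from the config line above. It assumes an alarm fires when a measured value exceeds its threshold; parse_alarm and exceeded_thresholds are made-up helper names for illustration.

```python
import re

# Matches the measurement part of a dpinger "Alarm" log line.
ALARM_RE = re.compile(r"latency (\d+)us stddev (\d+)us loss (\d+)%")


def parse_alarm(line):
    """Return latency/stddev in milliseconds and loss in percent, or None."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    latency_us, stddev_us, loss_pct = (int(g) for g in match.groups())
    return {
        "latency_ms": latency_us / 1000,
        "stddev_ms": stddev_us / 1000,
        "loss_pct": loss_pct,
    }


def exceeded_thresholds(alarm, latency_alarm_ms=500, loss_alarm_pct=20):
    """List which thresholds were exceeded (assumed 'greater than' semantics).

    The defaults are the latency_alarm and loss_alarm values from the log.
    """
    reasons = []
    if alarm["latency_ms"] > latency_alarm_ms:
        reasons.append("latency")
    if alarm["loss_pct"] > loss_alarm_pct:
        reasons.append("loss")
    return reasons


# Measurement portion of the Alarm line from the post.
alarm_line = "Alarm latency 15256us stddev 18079us loss 25%"
alarm = parse_alarm(alarm_line)
print(alarm)
print(exceeded_thresholds(alarm))
```

Under that assumption, the alarm here comes from packet loss (25% against a 20% threshold), while the roughly 15 ms latency is far below the 500 ms threshold.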


I see no other logs before or after this indicating an issue with anything. Is this a known issue w/ the latest OPNsense? Is there a way I can make these logs more verbose (or at least the ones for devd and configd in particular)?

Or, given the second set of logs above, could this be an issue with the upstream ISP? Really confused about what to troubleshoot here... Any guidance would be appreciated!

EDIT:
I attempted to go from 23.1_6 back to 23.1 (someone else mentioned in a thread that it fixed their problem) - that didn't stop the error from showing up in my logs. I did upgrade from 22.7 to 23.1 just a day or two before the 23.1_6 update went through. Looking at the log history in OPNsense, I see my upgrade was at this time:

2023-01-29T13:22:38-06:00 Notice pkg-static opnsense upgraded: 22.7.11_1 -> 23.1

I see the errors in the gateway logs a few times after the upgrade, but then nothing for the next couple of days. I then see:

2023-01-31T03:04:06-06:00 Notice pkg-static opnsense upgraded: 23.1 -> 23.1_6

But there are no such logs for the rest of 01/31. The drops then started on the morning of 02/02 (about 16 separate instances), followed by about 4 instances on 02/03 and 8 so far today, 02/04.
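
A tally like this can be automated. Below is a minimal Python sketch, assuming the remote syslog format shown in the devd lines earlier in this post; count_link_down is a hypothetical helper name, and the sample input is just the four log lines quoted above.

```python
import re
from collections import Counter

# Matches devd LINK_DOWN events in lines shaped like:
# "Feb  4 14:53:08 192.168.0.1 devd[564]: Processing event '... type=LINK_DOWN'"
LINK_DOWN_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+[\d:]{8}\s+\S+\s+devd\[\d+\]: "
    r".*subsystem=(\w+) type=LINK_DOWN"
)


def count_link_down(lines, interface="igb0"):
    """Count LINK_DOWN events per day (e.g. 'Feb 4') for one interface."""
    counts = Counter()
    for line in lines:
        match = LINK_DOWN_RE.match(line)
        if match and match.group(3) == interface:
            month, day = match.group(1), int(match.group(2))
            counts[f"{month} {day}"] += 1
    return counts


# The four log lines quoted earlier in the post.
sample_lines = [
    "Feb  4 14:53:08 192.168.0.1 devd[564]: Processing event '!system=IFNET subsystem=igb0 type=LINK_DOWN'",
    "Feb  4 14:53:08 192.168.0.1 devd[564]: Executing '/usr/local/sbin/configctl interface linkup stop $'igb0''",
    "Feb  4 14:53:08 192.168.0.1 configd.py[231]: [f7b48385-6596-427d-96d3-1b7fe985f0fd] Linkup stopping igb0",
    "Feb  4 14:53:08 192.168.0.1 opnsense[58884]: /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for wan(igb0)",
]

link_down_per_day = count_link_down(sample_lines)
for day, count in link_down_per_day.items():
    print(day, count)
```

Only the devd "Processing event" line is counted, so each link drop is tallied once even though it produces several log lines. To run it over a full log, pass the lines of the exported log file instead of sample_lines.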

I'm going to check whether other logs on my network line up with the timestamps I've collected, to see if something's related. I'm also going to set "Disable Gateway Monitoring" just so it _assumes_ the gateway is up (I know this won't do anything if there's a real internet problem, but if it's just the gateway having issues with ICMP, thought maybe it'd help idk).

If anyone has advice before I report back an edit (or double post) please chime in :)

February 06, 2023, 05:54:13 AM #1 Last Edit: February 08, 2023, 12:13:56 AM by opn69a
Latest updates:
- Found nothing on the network that showed a pattern or a common event around each outage
- I moved the WAN cable from igb0 to igb3 and created a new interface called WAN_TEST. Copied over all my rules, updated settings to include that, etc. and got internet working through there. Although the UI claimed it was 'down', I had Internet access and all that just fine. At first, I thought this resolved it, but then I had about 20 seconds of ping failure (with no logs, interestingly).

So with all the above, I've kind of come down to 4 possibilities:
- 22.7 -> 23.1 broke something (least likely, since there were no logs until the 23.1_6 version and reverting didn't fix my issue)
- WAN cat5e cable the ISP has going to my firewall is shorted somewhere (power outage?)
- ONT outside the house for the ISP is shorted somewhere (power outage?)
- ISP has some noise or issues going on in the entire area and it's not just me, but no one's complained or noticed yet

Guess my next step is to call the ISP to see if I can get someone to come check the lines for noise, perhaps replace the ONT, and then go from there. I'll report back with whether I get a resolution via the ISP or not. Otherwise, if anyone else is having the same issues since going from 22.7 to 23.1, please respond here before I pull out my hair lol :)

EDIT:
rather than triple post...

Turns out it was the ISP. I didn't even have to call them. My internet 'magically' started working all day yesterday and (so far, knock on wood) all day today. Guessing they had some kind of noise or problem w/ a line down somewhere. Talk about a waste of time on my end LOL