PLEASE HELP! WAN FLAPPING!

Started by SolarCzar, January 07, 2023, 04:45:27 PM

January 13, 2023, 09:16:23 AM #15 Last Edit: January 13, 2023, 09:22:16 AM by scrensen
Quote from: SolarCzar on January 07, 2023, 04:45:27 PM
OK, I'm getting desperate.  Just when I think my changes based on recommendations are stable, my WAN port starts flapping again. 

Any errors/collisions on Interfaces -> Overview --> WAN ?

And did you add any parameters to Interfaces -> WAN --> DHCP client configuration (one of the tabs)? Is 'override MTU' ticked?

I had similar issues when I started with opnsense some years back and had to add this to the 'Option modifiers':

supersede interface-mtu 0

Which was for a specific ISP here in my country, but it might help in your case.

Update...[Solved? we'll see...]

So I've swapped cables, ports, power supplies, turned down services, disconnected the Cisco C220 & RPi servers, disconnected the LAN, disconnected the Ruckus ZD1200 & APs, etc...until there was nothing but the OPNsense N5105 and the AT&T BGW210-700 modem (and AT&T fiber ONT)...no joy.  The WAN was still flapping.

So here's the crazy part: I was about to pull the N5105 device out and reflash it when, in a Google search, I found an obscure link to others having similar problems over on the pfSense forum, with much of the dialogue centered around the older i225 ports.  I have 4x Intel i226-V 2.5G ports on my Topton.  Some had indicated that the problem went away when they downgraded their link speed from 1000baseT to 100baseT.  Continuing the testing, they had "solved" it by putting a small unmanaged switch between the ISP's modem (AT&T Arris BGW210-700 in my case) and the Topton N5105 WAN port.  What the hell, I've got a small 6-port gigabit switch, let's try it...and what do you know...the WAN port stopped flapping.  I gave it a few hours and slowly reconnected the rest of the LAN elements, watching for the telltale Critical Errors along the way, and it has not flapped in over 48hrs.  My calls on Friday (yesterday) on Microsoft Teams went smoothly with no clipping.

So, later that Friday when I was finished in the work day (and when no one was home), I upgraded back to 22.7.10_2 for giggles.  No problems so far. As of now, all my LAN elements are back connected and my network appears stable.  If that changes, I'll report back.

I can't say I understand it and maybe one of you smart guys can, but all I can assume is there is something sensitive in the electrical aspects of the 1GbE & 2.5GbE ports.  Oh and the unmanaged switch is 1GbE ports for reference. 
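For anyone who wants to put a number on "has not flapped in over 48hrs", here is a minimal Python sketch that counts up-to-down transitions in a list of link events.  The event list, the function names and the 48-hour window are all made up for illustration; this does not parse any real OPNsense log format, so the events would have to be collected some other way.

```python
from datetime import datetime, timedelta


def count_flaps(events):
    """Count up->down transitions in a chronological list of (timestamp, state).

    `state` is the string "up" or "down". Returns (flap_count, last_flap_time),
    where last_flap_time is None if the link never went down.
    """
    flaps = 0
    last_flap = None
    previous = None
    for ts, state in events:
        if previous == "up" and state == "down":
            flaps += 1
            last_flap = ts
        previous = state
    return flaps, last_flap


def stable_for(events, now, window=timedelta(hours=48)):
    """True if no flap happened within `window` before `now`."""
    _, last_flap = count_flaps(events)
    return last_flap is None or now - last_flap >= window


# Hypothetical sample data: two short drops on the WAN link.
t0 = datetime(2023, 1, 12, 8, 0, 0)
events = [
    (t0, "up"),
    (t0 + timedelta(minutes=5), "down"),
    (t0 + timedelta(minutes=5, seconds=4), "up"),
    (t0 + timedelta(minutes=20), "down"),
    (t0 + timedelta(minutes=20, seconds=3), "up"),
]

flaps, last_flap = count_flaps(events)
print(flaps)                                              # 2
print(stable_for(events, now=t0 + timedelta(hours=49)))   # True
```

Only an "up" followed directly by a "down" counts as a flap, so repeated "down" entries are not counted twice.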


Quote
And did you add any parameters to Interfaces -> WAN --> DHCP client configuration (one of the tabs)? Is 'override MTU' ticked?

I had similar issues when I started with opnsense some years back and had to add this to the 'Option modifiers':

supersede interface-mtu 0

Which was for a specific ISP here in my country, but it might help in your case.

So I posted before reading scrensen's suggestion.  My MTU override is checked.  What does the "supersede interface-mtu 0" do exactly?  I'll read up.

Quote from: SolarCzar on January 14, 2023, 12:43:56 PM

So, later that Friday when I was finished in the work day (and when no one was home), I upgraded back to 22.7.10_2 for giggles.  No problems so far. As of now, all my LAN elements are back connected and my network appears stable.  If that changes, I'll report back.

I can't say I understand it and maybe one of you smart guys can, but all I can assume is there is something sensitive in the electrical aspects of the 1GbE & 2.5GbE ports.  Oh and the unmanaged switch is 1GbE ports for reference.

By adding the switch in between you isolated the problem - which still occurs imo - from the Topton. Setting the port speed to 1Gbps full duplex should have had the same result. So your FW was never the cause of the issue but at the receiving end of it - and reacting as expected whenever a link changes state.

The next step would be to call your ISP, explain the problem and ask them to fix it - or, if you have access to that ISP-managed device and can set the port manually to 1Gbps full duplex, that should take care of the faulty auto-negotiation, which appears to be the root cause here.

So I have access to the BGW210 modem and had previously set the port to 1Gb full duplex.  The MTU setting is still in flight, as I'm also finding issues with other parts of my network.  The modem is set for MTU 1500, which is standard on many of the laptops, but my 10Gb fiber connection from my C220 to my Aruba switch was set at MTU 9000 and it was wreaking havoc on Cloudflared Zero Trust tunnels.  I just set it to MTU 1400 and my tunnels are connecting.  It's just a lot of SysAdmin work that I did not want to learn and had hoped was more or less standardized.  But you are right, there is still a problem.
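As a rough illustration of the MTU numbers above, here is a small Python sketch of the arithmetic, assuming plain IPv4 with no IP options.  The function names are made up for this example.  It shows the largest ping payload that fits in one unfragmented packet for each MTU mentioned, and that the smallest MTU along a path is the one that matters.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def path_mtu(link_mtus):
    """The usable MTU end to end is the smallest MTU of any link on the path."""
    return min(link_mtus)


for mtu in (1500, 9000, 1400):
    print(mtu, max_ping_payload(mtu))
# 1500 1472
# 9000 8972
# 1400 1372

# A host sending 9000-byte frames through a 1500-byte modem link
# is limited to 1500 unless fragmentation or path MTU discovery steps in.
print(path_mtu([9000, 1500]))  # 1500
```

These payload sizes are what you would feed to a don't-fragment ping to test a path by hand, e.g. `ping -M do -s 1472 <host>` on Linux or `ping -D -s 1472 <host>` on FreeBSD.  If 1472 fails but a smaller size works, something on the path has an MTU below 1500.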