WAN interface flapping with 22.1.2

Started by foxmanb, March 03, 2022, 01:45:18 PM

June 11, 2022, 03:01:20 AM #120 Last Edit: June 12, 2022, 09:30:50 PM by Davesworld
Quote from: atxx on May 13, 2022, 03:47:07 PM
I registered in order to note that I'm experiencing this issue on Broadcom BCM5720 (Microserver Gen8) therefore this is probably not Intel specific as speculated.

I do feel obliged to say that Opnsense has pretty much been smooth sailing up until now. Thank you for your good work!

This speaks volumes. It would suggest that it is not an Intel issue at all. Notice that LAN, WLAN, etc. don't cycle even when using Intel NICs? My WAN2 interface, which I had to create from an OPT interface and rename, does not do it either.

Edit: I'm back to using the built-in WAN. It was DNS overlap between two WANs using the same DNS servers (Google), with the two gateways each using one of the two Google DNS addresses. I've decided I don't want any of my traffic going through Google. I'm considering running my own DNS servers and synchronizing them often.

June 11, 2022, 03:40:23 AM #121 Last Edit: June 12, 2022, 09:34:05 PM by Davesworld
I am curious: rc.newwanip was inherited from pfSense, but it was changed in May 2022, just last month. As far as I know, it is only used on the WAN.

Edit: This wasn't related to the problem at all, it was DNS overlaps between two WANS using the same DNS servers (Google) and two Gateways using one of each of the two Google DNS. I've decided I don't want any of my traffic going through Google.

@Davesworld it's possible it's not an Intel-only issue. However, installing the newer Intel igb drivers solved the issue for me and others.

June 11, 2022, 03:56:34 AM #123 Last Edit: June 11, 2022, 04:00:25 AM by tracerrx
@Davesworld It looks like there have been a lot of changes to rc.newwanip since pfSense; see: https://github.com/opnsense/core/commits/master/src/etc/rc.newwanip

Edit: Disregard the above. Apparently I'm sleepy or can't read.

June 11, 2022, 04:16:59 AM #124 Last Edit: June 12, 2022, 09:32:59 PM by Davesworld
Quote from: tracerrx on June 11, 2022, 03:52:49 AM
@Davesworld it's possible it's not an Intel-only issue. However, installing the newer Intel igb drivers solved the issue for me and others.

How long has it been stable? The reason I ask is that I have more than one WAN, and the other WANs do not cycle with the stock in-kernel Intel drivers. Only the built-in WAN, which is preselected the way LAN is, uses rc.newwanip. It's the only interface that uses it. Why is it even needed? When you add new WAN interfaces, they are OPT interfaces, and one can name them WAN2, WAN3, etc. Since I have several static IPs, I added a third WAN and used one of my unused static IPs. My default WAN still goes up and down, while none of the other interfaces, including LAN, which also uses the stock Intel driver, ever go down. Someone is also having the same problem with a Broadcom NIC. I'm very surprised an out-of-kernel driver really fixed it for you and others, if the others are as lucky as you.

When creating another WAN, one has to enable block private networks and block bogon networks. ALL traffic is blocked by the firewall by default unless you add rules to allow something in, so apart from rc.newwanip it behaves exactly like the built-in WAN, and it doesn't cycle.

Edit: No new WAN creation needed, I'm back to the built in wan, it was DNS overlaps between the two WANS using the same DNS servers (Google) and two Gateways using one of each of the two Google DNS. I've decided I don't want any of my traffic going through Google.

This is interesting... I will try moving my primary comcast (static IP) on my protectli WAN to one of the opt interfaces, revert the drivers and see if I get flapping... Most of the protectli devices use Intel IGB drivers...

June 11, 2022, 05:24:38 AM #126 Last Edit: June 12, 2022, 09:36:39 PM by Davesworld
Quote from: tracerrx on June 11, 2022, 04:26:02 AM
This is interesting... I will try moving my primary comcast (static IP) on my protectli WAN to one of the opt interfaces, revert the drivers and see if I get flapping... Most of the protectli devices use Intel IGB drivers...

I created an interface called WAN_ALT, changed my gateway to that interface, and disabled the WAN interface. Since WAN_ALT is also on my fiber link, and the gateway with priority 1 is now on WAN_ALT, it started routing over WAN_ALT as soon as I disabled WAN. Just remember to block bogons and the non-routable IPs reserved for LANs. I'm watching the dynamic logs.

Edit: No new WAN creation needed. I'm back to the built-in WAN; it was DNS overlap between the two WANs using the same DNS servers (Google), with the two gateways each using one of the two Google DNS addresses. I am no longer using Google's DNS servers at the moment.

June 11, 2022, 11:26:19 AM #127 Last Edit: June 12, 2022, 09:37:34 PM by Davesworld
The interface that is assigned to WAN is still going up and down even though WAN is disabled. It's on autopilot.

em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP

But it can't hurt me thanks to my WAN_ALT. Yep, there's a problem in a recent update.

Edit: No new WAN creation needed, I'm back to the built in wan, it was DNS overlaps between the two WANS using the same DNS servers (Google) and two Gateways using one of each of the two Google DNS as I have stated in other edits here.
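
For anyone who wants to put a number on flapping like the em0 lines above, here is a minimal Python sketch that counts DOWN transitions per interface. It only assumes the "link state changed to" wording shown in this post; the function name `count_flaps` is made up for illustration, and real log lines may carry timestamps or other prefixes, which the pattern tolerates.

```python
import re
from collections import Counter

# Matches e.g. "em0: link state changed to DOWN", with or without a log prefix.
LINK_RE = re.compile(r"(\w+): link state changed to (UP|DOWN)")

def count_flaps(lines):
    """Count DOWN transitions per interface (hypothetical helper)."""
    flaps = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match and match.group(2) == "DOWN":
            flaps[match.group(1)] += 1
    return flaps

sample = [
    "em0: link state changed to DOWN",
    "em0: link state changed to UP",
    "em0: link state changed to DOWN",
    "em0: link state changed to UP",
]

for ifname, count in count_flaps(sample).items():
    print(f"{ifname}: {count} DOWN transitions")
```

Feeding it the saved system log instead of `sample` would show whether only one interface is cycling or several.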

When it flaps, is it down for ~2 minutes?

June 11, 2022, 06:51:11 PM #129 Last Edit: June 12, 2022, 09:40:41 PM by Davesworld
Quote from: tracerrx on June 11, 2022, 03:27:32 PM
When it flaps, is it down for ~2 minutes?

Just about that, sometimes longer.

root@thor:~ # sysctl -a | grep -E 'dev.(igb|ix|em).*.iflib.driver_version:'
dev.em.0.iflib.driver_version: 7.6.1-k
dev.igb.6.iflib.driver_version: 7.6.1-k
dev.igb.5.iflib.driver_version: 7.6.1-k
dev.igb.4.iflib.driver_version: 7.6.1-k
dev.igb.3.iflib.driver_version: 7.6.1-k
dev.igb.2.iflib.driver_version: 7.6.1-k
dev.igb.1.iflib.driver_version: 7.6.1-k
dev.igb.0.iflib.driver_version: 7.6.1-k

This driver hadn't been changed recently. This is the stock kernel driver.
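
If you want to check at a glance that every interface is on the same driver version, something like the following Python sketch could group the sysctl lines above by version. It assumes the output format is exactly as pasted in this post; `group_by_version` is a made-up helper name, not part of any tool.

```python
import re
from collections import defaultdict

# Matches lines like "dev.igb.3.iflib.driver_version: 7.6.1-k"
LINE_RE = re.compile(r"^dev\.(\w+)\.(\d+)\.iflib\.driver_version:\s*(\S+)$")

def group_by_version(output):
    """Map each driver version to the interfaces reporting it (hypothetical helper)."""
    versions = defaultdict(list)
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            driver, unit, version = match.groups()
            versions[version].append(f"{driver}{unit}")
    return dict(versions)

sample = """\
dev.em.0.iflib.driver_version: 7.6.1-k
dev.igb.6.iflib.driver_version: 7.6.1-k
dev.igb.0.iflib.driver_version: 7.6.1-k
"""

for version, interfaces in group_by_version(sample).items():
    print(version, "->", ", ".join(interfaces))
```

A single key in the result means all listed interfaces report the same driver version, which is the situation in the output above.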

Edit: No new WAN creation needed and definitely not the driver, I'm back to the built in wan, it was DNS overlaps between the two WANS using the same DNS servers (Google) and two Gateways using one of each of the two Google DNS as I have stated in other edits here. This was bad but I got a wakeup call for doing that. I'm avoiding Google's DNS servers for now and maybe forever.


Quote from: tracerrx on June 11, 2022, 04:26:02 AM
This is interesting... I will try moving my primary comcast (static IP) on my protectli WAN to one of the opt interfaces, revert the drivers and see if I get flapping... Most of the protectli devices use Intel IGB drivers...

Most devices that are purpose-built as a firewall/router use Intel. I have never seen Realtek.

June 11, 2022, 11:31:52 PM #131 Last Edit: June 12, 2022, 06:00:50 AM by Davesworld
My method of creating another WAN is just a band-aid. I took Franco's advice about overlapping DNS entries. I had been using Google DNS both for gateway monitoring and as the DNS for each gateway, as I have two WANs. I made them all different. We'll see.

June 12, 2022, 05:55:47 AM #132 Last Edit: June 12, 2022, 09:24:34 PM by Davesworld
Quote from: franco on March 24, 2022, 07:11:38 PM
Well in any case you seem to have overlapping DNS servers for the different interfaces, either set manually, by ISP or gateway monitor. In some cases ISPs push Google servers which is pretty mean since it also pins a route for it through their interface.


Cheers,
Franco

Mine was flapping too, on my primary gateway. Since I have two WANs, the instructions you gave someone else about setting DNS to none don't sound right for multiple WANs with their own gateways. Each of my gateways has two DNS addresses. Before, I was using both Google DNS addresses on each gateway, then using one Google DNS address to monitor the first gateway and the other to monitor the second. I am now back to using the real WAN, after having added a third one and moved my cable to it temporarily. I also have completely different primary and secondary DNS settings for each gateway. No two match now. If I understood you correctly, we can't have the same DNS servers on two gateways and then use one of each to monitor the gateways? For now I have set each gateway to monitor just its respective ISP gateway. On another note, if I understand you correctly: avoid Google DNS. How am I doing so far?
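
To make the overlap described above concrete, here is a rough Python sketch that flags any address used as a DNS server or monitor IP on more than one gateway. This is only an illustration of the idea, not how OPNsense checks anything: the gateway names, the dictionary layout, and the `find_overlaps` helper are all made up, and the "after" addresses are placeholder documentation-range IPs rather than real DNS servers. Only 8.8.8.8 and 8.8.4.4 (Google DNS) are real.

```python
from collections import defaultdict

def find_overlaps(gateways):
    """Return addresses referenced by more than one gateway (hypothetical helper)."""
    users = defaultdict(set)
    for name, cfg in gateways.items():
        for addr in cfg.get("dns", []):
            users[addr].add(name)
        monitor = cfg.get("monitor")
        if monitor:  # None means "monitor the ISP gateway itself"
            users[monitor].add(name)
    return {addr: sorted(names) for addr, names in users.items() if len(names) > 1}

# Roughly the setup described above: both Google DNS addresses on each
# gateway, plus one Google address per gateway as the monitor IP.
before = {
    "WAN_GW": {"dns": ["8.8.8.8", "8.8.4.4"], "monitor": "8.8.8.8"},
    "WAN2_GW": {"dns": ["8.8.8.8", "8.8.4.4"], "monitor": "8.8.4.4"},
}

# After the fix: no shared addresses (placeholder IPs), ISP gateway as monitor.
after = {
    "WAN_GW": {"dns": ["192.0.2.1", "192.0.2.2"], "monitor": None},
    "WAN2_GW": {"dns": ["198.51.100.1", "198.51.100.2"], "monitor": None},
}

print("before:", find_overlaps(before))
print("after:", find_overlaps(after))
```

With the "before" layout both Google addresses show up under both gateways; with the "after" layout the result is empty, which matches the "no two match now" state.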

I did make a third WAN temporarily, as I indicated above, and moved my fiber (main internet connection) to it, and it was rock solid. I am using the proper WAN again, the one that is already in the distro, with no DNS overlaps, and so far it's not flapping. I believe this is all documented somewhere, but it's been a while. Am I correct in asserting that DNS entries should only be assigned gateways when there is more than one WAN? I never suspected the driver as others in this thread have: it's at least 4 years old, it didn't cause LAN, WLAN, etc. to flap even with all Intel NICs, and it would not have just now started causing trouble. Sorry if that's too many questions at once.

Update: a day later and it's rock solid after removing the overlaps. I may have even gone overboard, but there are a lot of high quality DNS servers out there.

Quote from: tracerrx on June 11, 2022, 03:52:49 AM
@Davesworld it's possible it's not an Intel-only issue. However, installing the newer Intel igb drivers solved the issue for me and others.

I have no way of knowing what other changes were made, so I do not know if the driver could possibly have fixed their issue. If it were a driver issue, you would also lose other igb interfaces, which is clearly not the case. I have difficulty blaming a driver that we have used with rock-solid stability since at least 2018. Out-of-kernel hardware drivers should be avoided if possible, and they steal attention away from other issues.

If only one interface is cycling while other interfaces on the same hardware are not, there is most certainly another issue besides the driver. I discovered mine: it was DNS overlap, plain and simple, and it was right in front of me the whole time. It should be noted that some pfSense users have had the same issues from time to time, even years back, and it usually ends up being the same misconfiguration. If you use Google DNS for both gateway monitors and DNS, it's a recipe for more flapping than the 1920s flapper craze. The possibility that some ISPs are routing traffic through Google without telling us certainly doesn't help.

The reason adding an extra WAN also solved my problem is that the gateways and the DNS entries were no longer pointed at a specific gateway, which I discovered later while applying Franco's advice about DNS overlap.

It is certainly possible that a recent update caused the system to react to the misconfiguration that was ignored before.

I'm sticking with the in kernel drivers as much as possible.

I had this DNS overlap on one device originally, and fixing it definitely made the problem better; however, I still got flapping on the WAN every 2-3 days until I replaced the driver. All of my primary WANs are Comcast (some residential DHCP, others business static), so it's possible that Comcast has made a change to their systems that's sending something funny.

Either way, shouldn't the OPNsense GUI stop you from configuring overlapping DNS servers and gateway monitors, to prevent this? And why did this work on 21.x and not on 22.x if it's always been the case?