WAN interface flapping with 22.1.2

Started by foxmanb, March 03, 2022, 01:45:18 PM

My contribution to a game of "let's spot the common factor".

Current version: OPNsense 22.1.1_3-amd64 (reverted as far back as I could)
Interfaces: Intel I211 (x6)
Possible influencing configuration(s) when first encountered:
    - MAC Spoofing: No
    - IDS (+IPS): Yes (W/LAN only)

Other potential commonalities:
    - wireguard-kmod (4 in use)
    - Dual WAN w/ cellular hotspot (this was set up after the WAN flapping arose; the usage bill resulting from WAN going down frequently absolutely sucks.)

Attempted resolution:
    - Disable IDS (+IPS): down/up continues.
    - Revert version to earliest 22.1 Production: down/up continues.
        https://wiki.opnsense.org/manual/opnsense_tools.html#example
    - Compile / install latest IGB drivers from Intel: down/up continues.
        https://forum.opnsense.org/index.php?topic=27299.msg137350#msg137350
    - Confirming no "DNS overlap": none exists.
        https://forum.opnsense.org/index.php?topic=27299.msg139635#msg139635
     
Next: Revert to 21.7.8  development version (22.1 w/o FBSD 13 kernel)
        https://forum.opnsense.org/index.php?topic=27299.msg139876#msg139876
       

Site 1
Current version: OPNsense 22.1.8_1-amd64 / FreeBSD 13
Interfaces: Intel I210 (x6)
Device Type: FW6D - 6 Port Intel i5 (8250U)
MAC Spoofing: NO
IDS: LAN
IPS: NO
WireGuard: GO on primary WAN
OpenVPN: NO
IPSEC: Yes, multiple site-site tunnels on primary wan
Multi WAN: Yes (Comcast-static/Starlink-dhcp/Hughesnet-dhcp)
Plugins: os-api-backup, os-ddclient, os-mdns-repeater, os-speedtest-community, os-wireguard, os-wol
Verified No DNS/Gateway Overlaps: YES
Updated Drivers Resolves Issue?:  Mostly

-----------------------------------------

Site 2
Current version: OPNsense 22.1.8_1-amd64 / FreeBSD 13
Interfaces: Intel I210 (x4)
Device Type: FW4B - 4 Port Intel J3160
MAC Spoofing: NO
IDS: LAN
IPS: NO
WireGuard: GO on primary WAN
OpenVPN: NO
IPSEC: Yes, single site-site tunnel on primary wan
Multi WAN: Yes (Comcast-dhcp/Starlink-dhcp)
Plugins: os-api-backup, os-ddclient, os-mdns-repeater, os-speedtest-community, os-wireguard, os-wol
Verified No DNS/Gateway Overlaps: YES
Updated Drivers Resolves Issue?:  Mostly

Add me to the list. My router has 6x Intel I211-AT NICs with the latest updates.

I rebuilt it from scratch and did the bare minimum setup. Everything was good with the first few devices added to the switch, but then it went berserk when I added everything else. It especially didn't like my Intel NUC.

My 5 port Intel I225-V box with the latest opnsense updates does not have this issue.

July 09, 2022, 10:15:18 AM #153 Last Edit: July 10, 2022, 10:53:37 PM by stefan21
Running on hardware:

# sysctl -a | grep -E 'dev.(igb|ix|em|bg).*.%desc:'
dev.em.0.%desc: Intel(R) Legacy PRO/1000 MT 82540EM
dev.bge.0.%desc: Broadcom NetLink Gigabit Ethernet Controller, ASIC rev. 0x5784100

Had to revert back to

OPNsense 21.7.8-amd64
FreeBSD 12.1-RELEASE-p22-HBSD
OpenSSL 1.1.1m 14 Dec 2021

Changed the intel nic to LAN (before it was on the WAN)
Seems to run stable as before.

In the 21.7.8 system log:

kernel   em0: link state changed to DOWN
opnsense[81347]   /usr/local/etc/rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (192.x.x.x ::)
kernel      em0: link state changed to UP
opnsense[34693]   /usr/local/etc/rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (192.x.x.x ::)
opnsense[50625]   /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'em0'
opnsense[50625]   /usr/local/etc/rc.newwanip: On (IP address: 192.x.x.x) (interface: LAN[lan]) (real interface: em0).
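
For anyone who wants to quantify how often a link drops, log lines like the ones above can be tallied with a short script. This is only a sketch in Python: the function name `count_link_transitions` is made up for illustration, and it assumes the kernel lines keep the `<ifname>: link state changed to UP|DOWN` wording shown in this post.

```python
import re
from collections import Counter

# Matches kernel lines such as "kernel   em0: link state changed to DOWN".
# The pattern is an assumption based on the log excerpt in this post.
LINK_RE = re.compile(r"\b(\w+): link state changed to (UP|DOWN)\b")


def count_link_transitions(lines):
    """Return a Counter keyed by (interface, state) for every link state change."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "kernel   em0: link state changed to DOWN",
    "opnsense[81347]   /usr/local/etc/rc.linkup: Hotplug event detected for LAN(lan)",
    "kernel      em0: link state changed to UP",
]

for (ifname, state), n in sorted(count_link_transitions(sample).items()):
    print(f"{ifname} {state}: {n}")
```

Feeding it a full system log export instead of `sample` would give a per-interface count of DOWN and UP events, which makes it easier to compare before and after a change.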

The sense is sitting behind a Vodafone cable modem, configured as exposed host. The cable modem should still run in bridged mode. There have been updates from Vodafone, so I'm not quite sure whether something is messed up in the config. I'll check this on Monday.

I'm running another sense (also hardware) with the latest version. No Intel NICs, mainly the same configuration. No problem with this machine: it loses neither WAN nor WireGuard.

regards,
stefan

Edit: both machines are configured with IDS and IPS on WAN and ZENARMOR on LAN
Edit: NO mac spoofing

July 10, 2022, 01:44:24 PM #154 Last Edit: July 25, 2022, 09:52:40 AM by crissi
Hi,

Current Version: OPNsense 22.1.10-amd64 (on BareMetal)
Interface: Intel I211
Suricata / Sensei: deactivated
Gateway Monitoring: disabled
MAC Spoofing: Yes

As soon as I add the spoofed MAC to the WAN interface, the flapping starts (up/down every 2-3 seconds).


2022-07-10T13:32:54 Error opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for dynamic wan(igb0)
2022-07-10T13:32:53 Error opnsense /usr/local/etc/rc.linkup: Clearing states for stale wan route on igb0
2022-07-10T13:32:53 Critical dhclient exiting.
2022-07-10T13:32:53 Error dhclient connection closed


When I remove the spoofed MAC Address, all is fine and connection is stable.
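
To put a number on "every 2-3 seconds", the gaps between rc.linkup events can be computed from the timestamps. The following is a rough Python sketch, not anything OPNsense ships: the function name `linkup_event_gaps` is hypothetical, it assumes each line starts with an ISO 8601 timestamp as in the excerpt above, and the second and third sample lines are invented purely to have something to measure.

```python
from datetime import datetime


def linkup_event_gaps(lines, marker="DEVD: Ethernet"):
    """Return the seconds between consecutive log lines that contain `marker`."""
    stamps = []
    for line in lines:
        if marker in line:
            # The first whitespace-separated field is assumed to be the timestamp.
            stamps.append(datetime.fromisoformat(line.split()[0]))
    # The GUI log view lists the newest entry first, so sort before differencing.
    stamps.sort()
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


sample = [
    # First line is taken from the post; the other two are invented examples.
    "2022-07-10T13:32:54 Error opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for dynamic wan(igb0)",
    "2022-07-10T13:32:57 Error opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for dynamic wan(igb0)",
    "2022-07-10T13:33:00 Error opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for dynamic wan(igb0)",
]

print(linkup_event_gaps(sample))
```

Lines without the marker (for example the dhclient messages) are ignored, so the whole log can be passed in unfiltered.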



UPDATE:

For further testing I have now installed my 3-year-old Realtek USB GbE Ethernet adapter, configured it as the new WAN port,
and added my spoofed MAC address there.

Result: No flapping, all fine, connection is stable!

Next I enabled Intrusion Detection (IDS/IPS) and promiscuous mode.

Result: No flapping, all fine, connection still stable!


Since the old Realtek works without any issue, this seems to be driver related. Why does the Intel I211 driver not work, given that it should be a standard driver? Is FreeBSD using old drivers?

What to do?
Cheers,
Crissi

Quote from: stefan21 on July 09, 2022, 10:15:18 AM
The sense is sitting behind a Vodafone cable modem, configured as exposed host. The cable modem should still run in bridged mode. There have been updates from Vodafone, so I'm not quite sure whether something is messed up in the config. I'll check this on Monday.

On Monday morning the interface with the Intel NIC no longer worked. I removed the NIC and put a Realtek in the machine. Interestingly, no error shows up in any log that would point to a defective NIC.

The last Vodafone update messed up my configuration. It's a Fritz!Box 6591 Cable BH. Exposed host was disabled, realtime prio was disabled and also self port opening. I changed the settings back.

I'll report if the errors are gone.

regards,
stefan


July 12, 2022, 08:15:55 AM #156 Last Edit: July 12, 2022, 09:00:25 AM by iMx
Been seeing the same on 22.1.10, somehow it seems worse since I upgraded to .10 - although I've certainly been fighting this for a while - but might be a coincidence.

WAN interface: Intel(R) Ethernet Controller X710 for 10GbE SFP+

- Disabled MAC spoofing, did not fix things
- Disabled gateway monitoring, seemed to improve things but did not resolve
- Upgraded the 710 firmware, using stock drivers, did not fix things
- This morning, I have now loaded the Intel updated drivers (1.12.35)

I am also using RSS, so this is maybe the next thing for me to rule out.

I do also note, that the latest Intel drivers are not iflib, so the various tunings have now changed (tx/rx ring buffer, queues).  There are details in the readme.txt.

I did also try the updated IGB driver (out of curiosity) as I also have the below, although these ports are NOT on the WAN and did not see the problem.

Intel(R) I211 (Copper)

But this led to 'weird' things. For example, the HAProxy instance running on opnsense could not health check the TCP port of my Home Assistant server (which it fronts to provide SSL externally). The Home Assistant physical port on opnsense is on the I211. Although traffic could pass from LAN -> Home Assistant through the firewall, the firewall itself could not reach the Home Assistant TCP port.

I could see the SYN from opnsense HAProxy -> Home Assistant on the server port, and the SYN,ACK reply reach the firewall, but for some reason it was being dropped.  I did not have time to look into this further, so rolled it back leaving just the WAN X710 using the Intel drivers.  This instantly resolved the HAproxy/Home Assistant issue.

July 14, 2022, 03:49:51 AM #157 Last Edit: July 14, 2022, 05:48:01 AM by Scuro
Quote from: buecker on June 20, 2022, 09:56:24 PM
My 5 port Intel I225-V box with the latest opnsense updates does not have this issue.

My 4 port I225-V b3 box is having this issue.

Details:
No spoofing, no IDS. Just IPv4, VLANS, and a weighted upload.
WAN will randomly disconnect and show the following repeated:


2022-07-13T18:03:33-07:00 Critical dhclient exiting.
2022-07-13T18:03:33-07:00 Error dhclient connection closed
2022-07-13T18:03:33-07:00 Error opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for dynamic wan(igc0)


Most times it will come back up on its own, other times it will refuse to pick up DHCP from WAN.
Turning interface on/off doesn't fix. Requires reboot or physically unplugging port.

Quote from: iMx on July 12, 2022, 08:15:55 AM
Been seeing the same on 22.1.10, somehow it seems worse since I upgraded to .10 - although I've certainly been fighting this for a while - but might be a coincidence.

....

- This morning, I have now loaded the Intel updated drivers (1.12.35)

Whilst I'm still running the updated drivers, I'm pretty sure my issue was 4 dodgy wall ports where my cable modem connects - 2 ports per 1 gang box - so when I swapped the port, even to a different box, I still saw the problem...on all 4 ports.  In the end, I ran a 15m cable direct from the opnsense box to the cable modem (and got moaned at with the cable going through the house/hallway) which fixed the problem.

Long story short...I replaced the wall modules and re-terminated the cabling and I think this has resolved my problem.  Over the next week, I'll start rolling back the various changes such as the updated drivers.

I've re-enabled gateway monitoring, this was in some cases causing the problem when the port flapped - although interestingly often the flaps weren't shown in the switch logs, so whilst in duration they were short they were long/frequent enough to cause disruption when gateway monitoring was enabled.

July 16, 2022, 04:27:17 PM #159 Last Edit: July 16, 2022, 04:29:49 PM by subivoodoo
Hi all,

I did a new test with a fresh install of the current RC 22.7 (22.7.r1_8) with my Intel igc + MAC spoofing + IPS => still flapping  :(

But then I changed from DHCP to static IP => NO flapping!!!

Retested this with 22.1.10 (same Intel igc + MAC spoofing + IPS) but with static IP instead of DHCP => also no flapping! Sadly a static IP is not possible on my internet connection... only in my test VMs.

Another strange behavior:
As soon as I remove the spoofed MAC and hit "Save" (before even pressing "Apply changes"), the flapping stops immediately.

So I think the "Intel + MAC spoof + IPS" issue must be something around DHCP within "a script"... but on Intel only??????

Regards

Since I recently experienced the same issue, I just wanted to throw in my data points.  My situation is a bit different in that I haven't really changed my config since 17.x.  However, recently I decided to turn off Suricata (for no reason other than it was causing my system to run about 7C higher on average).  For two hours it worked fine.  Then the WAN link started to go up and down.  I had come across this thread before, but I didn't put the two together right away.  Instead, I called my ISP and swapped out my cable modem, which solved the issue for about 18 hours, then it started to flap again.  I eventually had to enable Suricata again and it has been fine ever since.  So for me, if I disable Suricata, the WAN starts to act up.

Current version: OPNsense 22.1.10
Interfaces: Intel pro 1000
MAC Spoofing: YES
IDS: LAN
IPS: YES


The common denominator in this thread seems to be MAC spoofing. Why is this necessary?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Some people need to use MAC spoofing because of the way their internet provider (cable) locks the IP to the device's MAC.

The common denominator seems to be a mix of MAC spoofing OR Intel I210/I211 devices OR IPS/IDS with promiscuous mode.
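
One way to check the "common denominator" idea is to tabulate the reports and count how often each factor appears. The Python sketch below is only an illustration: the records are transcribed by hand from four posts earlier in this thread that state NIC, MAC spoofing and IDS/IPS together, so the list is not exhaustive, and `factor_share` is a made-up helper name.

```python
# Hand-transcribed from posts in this thread that reported WAN flapping.
# Only posts stating all three fields are included; this is not a full survey.
reports = [
    {"user": "foxmanb", "nic": "I211", "mac_spoofing": False, "ids_ips": True},
    {"user": "crissi", "nic": "I211", "mac_spoofing": True, "ids_ips": False},
    {"user": "Scuro", "nic": "I225-V", "mac_spoofing": False, "ids_ips": False},
    {"user": "subivoodoo", "nic": "igc", "mac_spoofing": True, "ids_ips": True},
]


def factor_share(records, key):
    """Return (number of records where `key` is true, total number of records)."""
    return sum(1 for record in records if record[key]), len(records)


for key in ("mac_spoofing", "ids_ips"):
    hits, total = factor_share(reports, key)
    print(f"{key}: {hits} of {total} flapping reports")
```

With these four records neither factor is present in every report, which fits the "mix of ... OR ..." wording rather than a single cause.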


As a heads up, usually most providers allow you to switch devices if you call them. You simply need to give them your MAC address and it works.
OPNsense 24.7.7 running on:
Dell Optiplex 3050
Intel I5-7600 @ 3.5Ghz (4 Cores)
Intel I350-T4 Nic
8G DDR4
256G SSD

July 25, 2022, 09:46:05 AM #164 Last Edit: July 25, 2022, 11:31:35 AM by crissi
For further testing I have now installed pfSense 2.6.0. I added the spoofed MAC address to the WAN interface (Intel I211 NIC), and voilà, no interface flapping at all!

The driver version is identical on OPNsense and pfSense:


[2.6.0-RELEASE][root@pf.home.arpa]/root: sysctl -a | grep -E 'dev.(igb|ix|em).*.iflib.driver_version:'
dev.igb.5.iflib.driver_version: 7.6.1-k
dev.igb.4.iflib.driver_version: 7.6.1-k
dev.igb.3.iflib.driver_version: 7.6.1-k
dev.igb.2.iflib.driver_version: 7.6.1-k
dev.igb.1.iflib.driver_version: 7.6.1-k
dev.igb.0.iflib.driver_version: 7.6.1-k




root@opn:~ # sysctl -a | grep -E 'dev.(igb|ix|em).*.iflib.driver_version:'
dev.igb.5.iflib.driver_version: 7.6.1-k
dev.igb.4.iflib.driver_version: 7.6.1-k
dev.igb.3.iflib.driver_version: 7.6.1-k
dev.igb.2.iflib.driver_version: 7.6.1-k
dev.igb.1.iflib.driver_version: 7.6.1-k
dev.igb.0.iflib.driver_version: 7.6.1-k
Cheers,
Crissi
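
Comparing the two sysctl listings by eye works for six ports, but the same check can be scripted. This is a minimal Python sketch with hypothetical helper names (`parse_driver_versions`, `diff_versions`); it assumes the `dev.<driver>.<unit>.iflib.driver_version: <version>` line shape shown in the output above and uses two of those lines per system as sample input.

```python
def parse_driver_versions(sysctl_output):
    """Map device names such as 'igb0' to the driver version reported by sysctl."""
    versions = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        parts = name.strip().split(".")
        # Expect names shaped like dev.<driver>.<unit>.iflib.driver_version
        if (sep and len(parts) == 5 and parts[0] == "dev"
                and parts[3:] == ["iflib", "driver_version"]):
            versions[parts[1] + parts[2]] = value.strip()
    return versions


def diff_versions(a, b):
    """Return {device: (version_a, version_b)} for devices whose versions differ."""
    return {dev: (a.get(dev), b.get(dev))
            for dev in sorted(set(a) | set(b))
            if a.get(dev) != b.get(dev)}


pfsense_out = (
    "dev.igb.1.iflib.driver_version: 7.6.1-k\n"
    "dev.igb.0.iflib.driver_version: 7.6.1-k"
)
opnsense_out = (
    "dev.igb.1.iflib.driver_version: 7.6.1-k\n"
    "dev.igb.0.iflib.driver_version: 7.6.1-k"
)

print(diff_versions(parse_driver_versions(pfsense_out),
                    parse_driver_versions(opnsense_out)))
```

An empty result only says the reported version strings match; it does not by itself show that the rest of the network stack behaves the same on both systems.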