WAN interface flapping with 22.1.2

Started by foxmanb, March 03, 2022, 01:45:18 PM

A quick Google search turns up a past issue between the Intel igb driver on FreeBSD and people using a coreboot BIOS... I do use coreboot; does anyone else? We need to find some commonality that @franco and team can use to replicate this. Nothing has changed in my setups except the upgrade from 21 to 22.

My setup (multiple locations that have this problem)
- Dual WAN (Comcast/Xfinity [primary] + satellite [Starlink or HughesNet]); some are static IP, others are DHCP
- WAN set up for failover
- Suricata on the LAN
- Intel i210 & i211 (igb driver)
- No MAC spoofing
- Multiple VLANs on LAN
- Multiple OpenVPN site-to-site tunnels
- WireGuard (go) on primary WAN
- Protectli FW4 + FW6 boxes running coreboot BIOS
- Gateway monitoring ON
- mDNS Repeater on LAN/VLAN
- Wake on LAN on LAN
- No IPv6
- PowerD enabled w/ hiadaptive
- Monit gateway alerting enabled

Yes, it is an assumption that this affects Intel NICs only... but the issue reports I can see here are for Intel only (drivers em, igb, igc, ix and ixl) together with MAC spoofing, and an earlier post here fixed the issue with updated Intel em drivers:

Quote from: kropotkin on March 21, 2022, 04:33:50 PM
I had the WAN flapping issue with MAC spoofing and have resolved it by installing the updated FreeBSD 13 em driver, though I don't know if you have Intel NICs as well?
To get the driver file I spun up a FreeBSD VM, ran pkg search intel-em-kmod and installed the package. I then copied the if_em_updated.ko driver to /boot/modules/ as per Franco's reply in this post https://forum.opnsense.org/index.php?topic=20905.0.
I also disabled Suricata on the WAN interface and turned off flow control on all NICs.
Now running a non-flapping OPNsense 22.1.3 with WAN DHCP and MAC spoofing.
@Franco - can this driver be added to OPNsense, as it seems to resolve a number of stability issues?

I see 2 main issues (at the moment with Intel drivers only, since 22.1.2):
- VLANs + MAC spoofing
- IDP with IPS mode + MAC spoofing

By the way, this is easily reproducible (https://forum.opnsense.org/index.php?topic=26672.0):
- new clean OPNsense 22.1.2 or higher (on Intel NICs only?)
- enter a spoofed MAC on an interface
- enable IDP with IPS and wait 2 minutes until Suricata is fully loaded
=> link flaps, with DOWN/UP messages on the monitor console
- delete the spoofed MAC from this interface and hit "save"
- the DOWN/UP messages on the monitor console disappear immediately ("Apply" not yet pressed!)

@subivoodoo Except I am having this issue and I don't use any MAC Spoofing.

@tracerrx Is yours a different or additional issue compared to all the others here?

April 29, 2022, 05:52:31 PM #64 Last Edit: April 29, 2022, 06:17:03 PM by Grossartig
It was pointed out to me that my issue (https://forum.opnsense.org/index.php?topic=28158.0) which I had reported separately a few hours ago may be the same as the one discussed in this thread.

To which I want to add that my box is not using coreboot but AMI, and it's using Realtek Ethernet controllers. Also, disabling IPS seems to allow my box to obtain a WAN IP again (but I'm unsure for how long -- currently testing here). Also, no MAC spoofing, Suricata was configured on WAN (not LAN), no VLANs, no IPv6. Also, only a single LAN, no multi-WAN here.

More system details here, for completeness (network controller details at bottom):


# pciconf -lv
hostb0@pci0:0:0:0: class=0x060000 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5af0 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series Host Bridge'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:0:2:0: class=0x030000 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5a85 subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'HD Graphics 500'
    class      = display
    subclass   = VGA
hdac0@pci0:0:14:0: class=0x040300 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5a98 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series Audio Cluster'
    class      = multimedia
    subclass   = HDA
none0@pci0:0:15:0: class=0x078000 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5a9a subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series Trusted Execution Engine'
    class      = simple comms
ahci0@pci0:0:18:0: class=0x010601 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5ae3 subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA
pcib1@pci0:0:19:0: class=0x060400 rev=0xfb hdr=0x01 vendor=0x8086 device=0x5ad8 subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:19:2: class=0x060400 rev=0xfb hdr=0x01 vendor=0x8086 device=0x5ada subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:19:3: class=0x060400 rev=0xfb hdr=0x01 vendor=0x8086 device=0x5adb subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:20:0: class=0x060400 rev=0xfb hdr=0x01 vendor=0x8086 device=0x5ad7 subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B'
    class      = bridge
    subclass   = PCI-PCI
xhci0@pci0:0:21:0: class=0x0c0330 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5aa8 subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series USB xHCI'
    class      = serial bus
    subclass   = USB
sdhci_pci0@pci0:0:28:0: class=0x080501 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5acc subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series eMMC Controller'
    class      = base peripheral
    subclass   = SD host controller
sdhci_pci1@pci0:0:30:0: class=0x080501 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5ad0 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series SDIO Controller'
    class      = base peripheral
    subclass   = SD host controller
isab0@pci0:0:31:0: class=0x060100 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5ae8 subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series Low Pin Count Interface'
    class      = bridge
    subclass   = PCI-ISA
none1@pci0:0:31:1: class=0x0c0500 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5ad4 subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series SMBus Controller'
    class      = serial bus
    subclass   = SMBus
iwm0@pci0:1:0:0: class=0x028000 rev=0x81 hdr=0x00 vendor=0x8086 device=0x3165 subvendor=0x8086 subdevice=0x4010
    vendor     = 'Intel Corporation'
    device     = 'Wireless 3165'
    class      = network
re0@pci0:2:0:0: class=0x020000 rev=0x0c hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x10ec subdevice=0x0123
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
re1@pci0:4:0:0: class=0x020000 rev=0x0c hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x10ec subdevice=0x0123
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
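
For anyone comparing hardware across the boxes in this thread, here is a minimal sketch (not an official tool) that pulls only the network controllers out of saved `pciconf -lv` output. The function name `network_devices` and the trimmed `SAMPLE` text are my own illustration; the sample lines are copied from the listing above.

```python
import re

# Header lines of `pciconf -lv` look like "re0@pci0:2:0:0: class=0x020000 ...".
HEADER = re.compile(r"^(?P<name>\S+?)@pci\S*:\s")
# Detail lines look like "    device     = 'Wireless 3165'".
DETAIL = re.compile(r"^\s+(?P<key>\w+)\s*=\s*(?P<value>.+)$")


def network_devices(pciconf_text):
    """Return (name, vendor, device) for every entry whose class is 'network'."""
    entries = []
    current = None
    for line in pciconf_text.splitlines():
        header = HEADER.match(line)
        if header:
            # A new device entry starts here.
            current = {"name": header.group("name")}
            entries.append(current)
            continue
        detail = DETAIL.match(line)
        if detail and current is not None:
            # Strip surrounding whitespace and the single quotes pciconf adds.
            current[detail.group("key")] = detail.group("value").strip().strip("'")
    return [
        (e["name"], e.get("vendor", "?"), e.get("device", "?"))
        for e in entries
        if e.get("class") == "network"
    ]


# Trimmed sample taken from the listing above (hypothetical file contents).
SAMPLE = """\
isab0@pci0:0:31:0: class=0x060100 rev=0x0b hdr=0x00 vendor=0x8086 device=0x5ae8 subvendor=0x19da subdevice=0xb325
    vendor     = 'Intel Corporation'
    device     = 'Celeron N3350/Pentium N4200/Atom E3900 Series Low Pin Count Interface'
    class      = bridge
    subclass   = PCI-ISA
iwm0@pci0:1:0:0: class=0x028000 rev=0x81 hdr=0x00 vendor=0x8086 device=0x3165 subvendor=0x8086 subdevice=0x4010
    vendor     = 'Intel Corporation'
    device     = 'Wireless 3165'
    class      = network
re0@pci0:2:0:0: class=0x020000 rev=0x0c hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x10ec subdevice=0x0123
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
"""

if __name__ == "__main__":
    for name, vendor, device in network_devices(SAMPLE):
        print(f"{name}: {vendor} / {device}")
```

Run against the full listing, this would report iwm0, re0 and re1, which makes it quick to see that this box has Realtek rather than Intel wired NICs.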

@subivoodoo Same issues as the others here... dpinger will state that the gateway is down, remove it from the fail-over group for 10 - 120 seconds, then put it back. This happens repeatedly until you go in and change the gateway monitor IP, then it stabilizes for 3-4 days. Rinse, repeat. However, when dpinger says the gateway is down, it is in fact not down. I have increased all my thresholds but the problem still occurs.

Just to keep attention on this issue, a small update from my side. Here is a check of the lost WAN connections over the last few days:
2022-04-29T02:35:36 Notice configd.py [9a0eae11-df9a-417b-b714-eb723d44fd0a] Linkup stopping igb0
2022-04-27T09:05:55 Notice configd.py [64f504d3-3e5d-4b62-9976-f2316296d9d9] Linkup stopping igb0
2022-04-25T01:19:10 Notice configd.py [e164f86e-bc77-426c-886c-d01f40b3da50] Linkup stopping igb0

So the WAN connection is still lost every couple of days, for no apparent reason. MAC spoofing is off and there is no IPS. Current OPNsense version: 22.1.6 (amd64) on Protectli FW4B with Intel NICs.
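
To put a number on "every couple of days", here is a small sketch that computes the gaps between the three log entries above. The helper name `flap_intervals` is hypothetical, and it assumes each line starts with an ISO timestamp as in the pasted log.

```python
from datetime import datetime

# The three configd.py entries quoted above.
LOG_LINES = [
    "2022-04-29T02:35:36 Notice configd.py [9a0eae11-df9a-417b-b714-eb723d44fd0a] Linkup stopping igb0",
    "2022-04-27T09:05:55 Notice configd.py [64f504d3-3e5d-4b62-9976-f2316296d9d9] Linkup stopping igb0",
    "2022-04-25T01:19:10 Notice configd.py [e164f86e-bc77-426c-886c-d01f40b3da50] Linkup stopping igb0",
]


def flap_intervals(lines, marker="Linkup stopping"):
    """Return the time gaps between consecutive log lines containing `marker`."""
    # Assumption: the first whitespace-separated field is an ISO 8601 timestamp.
    stamps = sorted(
        datetime.fromisoformat(line.split()[0])
        for line in lines
        if marker in line
    )
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap in flap_intervals(LOG_LINES):
        print(gap)
    # prints:
    # 2 days, 7:46:45
    # 1 day, 17:29:41
```

The two gaps are not equal, so at least from these three events it does not look like a fixed timer.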

Note that the configd action for "Linkup stopping igb0" is called by devd in the operating system, likely in reaction to either a hardware or a software event detaching the interface. We can't do much about hardware flapping, and for software flapping only netmap can be responsible, through either IPS-mode intrusion detection or Zenarmor. The software flap may also be introduced by an intrusion detection rules update. ;)


Cheers,
Franco

@franco It seems to be related to dpinger. 

All my sites have a primary low-latency connection and a secondary, higher-latency satellite connection (Starlink 40-100 ms or HughesNet 700-1500 ms). Even though I have accounted for the higher latency in the gateway monitor settings for each satellite connection, the high latency on the secondary link seems to be making dpinger miss responses on the low-latency primary link. This did NOT occur in the 21.x series, and was introduced in 22.x (FreeBSD 13). Disabling gateway monitoring on the secondary connections seems to resolve the flapping of the primary link.

FWIW, these are all Protectli FW4 and FW6, no MAC spoofing, Suricata on LAN, dual WAN.

But dpinger won't do a linkdown/up as far as I know.

Latency is tricky and needs to be accounted for in the advanced monitoring settings per gateway. Gateway monitoring can indeed produce false positives.


Cheers,
Franco

@franco I understand the latency, and the settings below have always worked in the past for HughesNet/Viasat connection monitoring. But prior to 22.x the high latency on wan2 never caused wan1 to be removed from the WAN group.

Latency thresholds: 1000 - 2500
Packet Loss thresholds: 30 - 60
Probe Interval: 15
Alert Interval: 60
Time Period: 60
Loss Interval: 4
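
One thing that may be worth checking with numbers like these is how coarse the loss percentage becomes. The sketch below is only back-of-the-envelope arithmetic under stated assumptions, not dpinger's actual algorithm: it assumes Probe Interval and Time Period are both in seconds, that loss is averaged over roughly Time Period / Probe Interval probes, and that the low and high loss thresholds mean "warning" and "down". The function names are hypothetical.

```python
def loss_steps(probe_interval_s, time_period_s):
    """Loss percentages reachable when 0..N of the probes in one window are lost.

    Assumption: the averaging window holds time_period / probe_interval probes.
    """
    probes = time_period_s // probe_interval_s
    return [100 * lost / probes for lost in range(probes + 1)]


def classify(loss_pct, warn_pct, down_pct):
    """Map a loss percentage to a state (assumed meaning of the two thresholds)."""
    if loss_pct > down_pct:
        return "down"
    if loss_pct > warn_pct:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Probe Interval 15, Time Period 60, Packet Loss thresholds 30 - 60.
    for loss in loss_steps(15, 60):
        print(loss, classify(loss, warn_pct=30, down_pct=60))
    # prints:
    # 0.0 ok
    # 25.0 ok
    # 50.0 warning
    # 75.0 down
    # 100.0 down
```

Under those assumptions there are only about four probes per window, so loss moves in 25% steps and three lost probes are enough to cross the 60% level.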

In my case, anyway, there is no IPS or Zenarmor.

If the OS is the problem, it is probably a driver issue that was introduced in OPNsense 22 with the move to FreeBSD 13. In the 21 series I also had no problem. This driver issue has been mentioned before.
Any news on a new release with updated drivers?

@edwin70 I believe the updated drivers are only for the Intel em driver, not Intel igb.

@tracerrx Thanks for the info. I did not know that. In that case I'll wait for other solutions to come. Hopefully soon, as I'm back on the 21.x version. :(

@edwin70 Don't tell my wife, but I was wrong... Just looked it up: the em driver does support the i211 and i210 in FreeBSD 13...



The em(4) driver supports Gigabit Ethernet adapters based on the Intel 82540, 82541ER, 82541PI, 82542, 82543, 82544, 82545, 82546, 82546EB, 82546GB, 82547, 82571, 82572, 82573, 82574, 82575, 82576, and 82580 controller chips:

    Intel Gigabit ET Dual Port Server Adapter (82576)
    Intel Gigabit VT Quad Port Server Adapter (82575)
    Intel Single, Dual and Quad Gigabit Ethernet Controller (82580)
    Intel i210 and i211 Gigabit Ethernet Controller
    Intel i350 and i354 Gigabit Ethernet Controller
    Intel PRO/1000 CT Network Connection (82547)
    Intel PRO/1000 F Server Adapter (82543)
    Intel PRO/1000 Gigabit Server Adapter (82542)
    Intel PRO/1000 GT Desktop Adapter (82541PI)
    Intel PRO/1000 MF Dual Port Server Adapter (82546)
    Intel PRO/1000 MF Server Adapter (82545)
    Intel PRO/1000 MF Server Adapter (LX) (82545)
    Intel PRO/1000 MT Desktop Adapter (82540)
    Intel PRO/1000 MT Desktop Adapter (82541)
    Intel PRO/1000 MT Dual Port Server Adapter (82546)
    Intel PRO/1000 MT Quad Port Server Adapter (82546EB)
    Intel PRO/1000 MT Server Adapter (82545)
    Intel PRO/1000 PF Dual Port Server Adapter (82571)
    Intel PRO/1000 PF Quad Port Server Adapter (82571)
    Intel PRO/1000 PF Server Adapter (82572)
    Intel PRO/1000 PT Desktop Adapter (82572)
    Intel PRO/1000 PT Dual Port Server Adapter (82571)
    Intel PRO/1000 PT Quad Port Server Adapter (82571)
    Intel PRO/1000 PT Server Adapter (82572)
    Intel PRO/1000 T Desktop Adapter (82544)
    Intel PRO/1000 T Server Adapter (82543)
    Intel PRO/1000 XF Server Adapter (82544)
    Intel PRO/1000 XT Server Adapter (82544)