igb-interface temporarily down since 23.1

Started by 8b4df00d, May 11, 2023, 08:33:51 AM

May 11, 2023, 08:33:51 AM Last Edit: May 11, 2023, 11:56:00 AM by 8b4df00d
Hi,

My OPNsense build uses a quad-port NIC with the igb driver for LAN communication, bundled with LACP. When I installed the card about 2 years ago, I had the problem that all ports went down after about 2 days and stayed down until I rebooted the firewall, set the interface down and up, or removed the network cables.

To fix that problem, I simply put the following tunables into my system:
dev.igb.0.eee_disabled = 1
dev.igb.1.eee_disabled = 1
dev.igb.2.eee_disabled = 1
dev.igb.3.eee_disabled = 1

After my upgrade to 23.1 these tunables are shown as unsupported and my NIC started to have the problems mentioned above again. After a quick Google search I found that people suggested simply removing them, after which the NIC should work again. But in my case the problem still exists: all ports, and with them my LACP bundle, go down after about 2 days in use.

Does anyone have the same problem and maybe also a fix for this?

Thanks.


Edit #1, pciconf output

igb0@pci0:1:0:0:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x1dcf subdevice=0x0309
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 max read 512
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 80615fffff02413c
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 0018[1c0] = LTR 1
    ecap 000d[1d0] = ACS 1
igb1@pci0:1:0:1:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x1dcf subdevice=0x0309
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 max read 512
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 80615fffff02413c
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1
igb2@pci0:1:0:2:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x1dcf subdevice=0x0309
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 max read 512
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 80615fffff02413c
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1
igb3@pci0:1:0:3:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x1dcf subdevice=0x0309
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 max read 512
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 80615fffff02413c
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1

Hi,

My problem mentioned above is now fixed. Here's what happened:
First of all, the problem of my LACP going down after 2 days was fixed by removing the tunables dev.igb.0.eee_disabled = 1 etc.

But after removing those tunables I had weird flapping issues on devices that are routed over that dynamic trunk. Because the issues with my NIC happened after updating to 23.1, I focused on a software/hardware problem.

So I swapped my NIC for an identical one, rebooted my switch, changed switch ports... no change, devices still flapping.

Yesterday I thought, let's give another network cable a try... boom, problem fixed  ::)


So in other words, no problem with igb-drivers on OPNsense.
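
For anyone chasing similar flapping: a rough way to see which trunk member drops most often is to count link-down events per interface. This is only a sketch, assuming the kernel messages look like "igb0: link state changed to DOWN"; the function name count_link_downs is made up for illustration, and the log source is up to you (dmesg is one option).

```sh
#!/bin/sh
# Count "link state changed to DOWN" events per interface.
# Reads log text on stdin and prints "<interface> <count>" lines, sorted.
# Assumption: each event line contains "<ifname>: link state changed to DOWN".
count_link_downs() {
    awk '/link state changed to DOWN/ {
        for (i = 2; i <= NF; i++) {
            if ($i == "link") {
                name = $(i - 1)      # the field before "link" is "<ifname>:"
                sub(/:$/, "", name)  # drop the trailing colon
                downs[name]++
                break
            }
        }
    }
    END { for (n in downs) print n, downs[n] }' | sort
}

# Example: dmesg | count_link_downs
```

A member port (or its cable) that shows up far more often than the others is a good first suspect.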

As always, I want to say thank you for providing such an amazing firewall :)

Bye.

Quote from: 8b4df00d on May 11, 2023, 08:33:51 AM

dev.igb.0.eee_disabled = 1
dev.igb.1.eee_disabled = 1
dev.igb.2.eee_disabled = 1
dev.igb.3.eee_disabled = 1

After my upgrade to 23.1 these tunables are shown as unsupported

In case anyone else stumbles across this...
The driver tunable used to be (assuming igb0 interface):
dev.igb.0.eee_disabled
You now have to use:
dev.igb.0.eee_control
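
For a multi-port card, the renamed tunable has to be set once per port. Below is a small sketch that prints the lines for ports 0 to 3. It assumes the value 1 still means "EEE disabled" under the new name, which the thread does not confirm, so check the description with sysctl -d dev.igb.0.eee_control first. The helper name igb_eee_tunables is made up for illustration.

```sh
#!/bin/sh
# Print the renamed EEE tunable for igb ports 0..(count-1).
# Assumption: the value 1 keeps the old "EEE disabled" meaning; verify with
# `sysctl -d dev.igb.0.eee_control` on your own system before relying on it.
igb_eee_tunables() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'dev.igb.%d.eee_control = 1\n' "$i"
        i=$((i + 1))
    done
}

# Quad-port card as in the first post: igb0 to igb3.
igb_eee_tunables 4
```

For the quad-port card from the first post this prints four lines, dev.igb.0.eee_control = 1 through dev.igb.3.eee_control = 1, which can then be entered as tunables the same way the old eee_disabled entries were.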