Losing WAN connection periodically

Started by jstarta, August 21, 2025, 09:38:14 PM

Hey all, I'm trying to figure out what's going wrong. Nearly every day I'm losing connection on my WAN interface, and I can't find anything in the logs (though I'm not really sure which logs I should be looking at). I'm running OPNsense on bare metal on an MSI Cubi NUC 1M (Intel Core 5, 16GB RAM, 500GB SSD). It has 2x Intel I226-V, and I have the WAN interface set to auto-negotiate the speed.

When it loses connection I either need to reboot, or just go into the interface settings and click Save, which seems to be enough to get it reconnected. As for OPNsense, I keep the version up to date (25.7.2). I'm running IDS/IPS on my LAN interface only, with the Hyperscan pattern matcher. CrowdSec and WireGuard are also running. I've disabled all the hardware offload options in the interface settings (CRC, TSO, LRO, VLAN filtering).

What logs should I be looking at to help me figure out what the issue is?

Any help would be appreciated

Just to add to this: when it loses WAN it's reporting 100% packet loss. This is the hourly data from the last couple of months:


iso_time                   loss          delay             stddev
2025-06-17T14:00:00+10:00  92.330097922  0.00052491607689  0.00018458892351
2025-06-23T05:00:00+10:00  100           0                 0
2025-06-29T06:00:00+10:00  100           0                 0
2025-06-29T07:00:00+10:00  100           0                 0
2025-06-29T08:00:00+10:00  100           0                 0
2025-06-29T09:00:00+10:00  100           0                 0
2025-06-29T10:00:00+10:00  100           0                 0
2025-06-29T11:00:00+10:00  100           0                 0
2025-06-29T12:00:00+10:00  100           0                 0
2025-06-29T13:00:00+10:00  100           0                 0
2025-06-29T14:00:00+10:00  100           0                 0
2025-06-29T15:00:00+10:00  100           0                 0
2025-06-29T16:00:00+10:00  67.538149056  0.037166575359    0.13227880102
2025-07-07T08:00:00+10:00  100           0                 0
2025-07-07T09:00:00+10:00  100           0                 0
2025-07-27T01:00:00+10:00  100           0                 0
2025-07-27T02:00:00+10:00  100           0                 0
2025-07-27T03:00:00+10:00  100           0                 0
2025-07-29T12:00:00+10:00  100           0                 0
2025-07-29T13:00:00+10:00  100           0                 0
2025-07-29T14:00:00+10:00  75.036629833  0.0015248240753   0.00042127605431
2025-08-14T01:00:00+10:00  100           0                 0
2025-08-14T02:00:00+10:00  100           0                 0
2025-08-14T03:00:00+10:00  100           0                 0
2025-08-14T04:00:00+10:00  100           0                 0
2025-08-14T05:00:00+10:00  100           0                 0
2025-08-19T03:00:00+10:00  100           0                 0
2025-08-21T01:00:00+10:00  71.897430038  0.002538875804    0.0013372099454
2025-08-21T02:00:00+10:00  100           0                 0
2025-08-21T03:00:00+10:00  100           0                 0
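
For anyone wanting to condense hourly data like this, here is a rough sketch that counts the fully lost hours per day. It assumes the export has the four columns iso_time, loss, delay and stddev, separated by commas or whitespace; the function name and the file name in the usage comment are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: count the hours with 100% loss per calendar day.
# Assumption: input on stdin has the columns iso_time, loss, delay, stddev,
# separated by commas or whitespace, with an optional header row.
outage_hours_per_day() {
    awk -F '[, \t]+' '
        $1 == "iso_time" { next }       # skip the header row
        $2 + 0 >= 100 {                 # keep fully lost hours only
            day = substr($1, 1, 10)     # YYYY-MM-DD part of the timestamp
            hours[day]++
        }
        END { for (d in hours) print d, hours[d] }
    ' | sort
}

# Usage (the file name is an assumption):
#   outage_hours_per_day < wan_loss.csv
```

Grouping by day makes it easier to see whether the outages cluster around a particular time of night or follow no pattern at all.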

Quote from: jstarta on August 21, 2025, 09:38:14 PM
[...] What logs should I be looking at to help me figure out what the issue is? [...]

I'd look at ARP. One of the logs (General, I believe) may log ARP changes, but that's usually only when ARP moves between bridge member interfaces. You'll probably have to look when you lose connectivity. It could also be the (apparent) i226 ASPM issue.
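
A minimal sketch of what to capture while the WAN is down, assuming the WAN port is igc1 as shown elsewhere in this thread. The helper only filters the kernel's standard "link state changed" messages; the function name is made up, and the log path in the usage comment is an assumption about OPNsense defaults that may differ on your install.

```sh
#!/bin/sh
# Hypothetical helper: keep only link up/down events for one interface.
# Reads log text on stdin; the interface name is the first argument.
link_events() {
    grep -E "$1: link state changed to (UP|DOWN)"
}

# Usage on the firewall during or right after an outage:
#   dmesg | link_events igc1
#   link_events igc1 < /var/log/system/latest.log   # path is an assumption
#   arp -an          # is the gateway's ARP entry still present?
#   ifconfig igc1    # does the port still report an active link?
```

If no link events show up around the outage times, the physical link probably stayed up and the problem is more likely at the ARP, DHCP or routing level.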

The most important bit in this mystery is the type of your WAN connection.

Quote from: pfry on August 23, 2025, 01:26:19 AM
Quote from: jstarta on August 21, 2025, 09:38:14 PM
[...] What logs should I be looking at to help me figure out what the issue is? [...]

I'd look at ARP. One of the logs (General, I believe) may log ARP changes, but that's usually only when ARP moves between bridge member interfaces. You'll probably have to look when you lose connectivity. It could also be the (apparent) i226 ASPM issue.

I only see two entries for the WAN interface. I'll take a look at my BIOS for the ASPM settings (thanks for the hint).

Quote from: Jyling on August 23, 2025, 04:33:48 AM
The most important bit in this mystery is the type of your WAN connection.

It's set as an IPv4 DHCP connection, though I guess technically it's static IPv4 because my ISP gives me a static IP.

August 23, 2025, 05:32:29 AM #5 Last Edit: August 23, 2025, 05:59:16 AM by jstarta
Just a quick add: I checked the BIOS for any ASPM settings but couldn't see anything. I did see an ErP Ready setting, which I've now disabled (it seemed to have something to do with limiting power).

Added the tunable "hw.pci.enable_aspm" and set it to 0. I'll give it a reboot at some point and then see how it all goes. This BIOS is definitely lacking a lot of advanced features :(

Quote from: jstarta on August 23, 2025, 05:18:46 AM
It's set as an IPv4 DHCP connection, though I guess technically it's static IPv4 because my ISP gives me a static IP.

Cable, Ethernet or fiber optics?

Curious if you ever resolved the issue, as I have the same MSI and the same issue. I tested IPFire for a week and never had a problem, so I know the hardware is solid. If you did resolve it, could you please share the fix? Thank you.

Quote from: Jyling on August 23, 2025, 05:42:39 PM
Quote from: jstarta on August 23, 2025, 05:18:46 AM
It's set as an IPv4 DHCP connection, though I guess technically it's static IPv4 because my ISP gives me a static IP.

Cable, Ethernet or fiber optics?

Ethernet. Setting that tunable didn't seem to fix things unfortunately.

I had a look at the pciconf output for both interfaces, and it looks like the tunable 'hw.pci.enable_aspm=0' didn't take effect, because the output states that ASPM is still enabled:


root@OPNsense:~ # pciconf -lbcevV igc1
igc1@pci0:89:0:0:       class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x1462 subdevice=0xb0b1
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-V'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0x6a300000, size 1048576, enabled
    bar   [1c] = type Memory, range 32, base 0x6a400000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
                 max read 512
                 link x1(x1) speed 5.0(5.0) ASPM L1(L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 d843aeffffbc6cac
    ecap 0018[1c0] = LTR 1
    ecap 001f[1f0] = Precision Time Measurement 1
    ecap 001e[1e0] = L1 PM Substates 1
root@OPNsense:~ # pciconf -lbcevV igc0
igc0@pci0:88:0:0:       class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x1462 subdevice=0xb0b1
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-V'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0x6a600000, size 1048576, enabled
    bar   [1c] = type Memory, range 32, base 0x6a700000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
                 max read 512
                 link x1(x1) speed 5.0(5.0) ASPM L1(L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 d843aeffffbc6cab
    ecap 0018[1c0] = LTR 1
    ecap 001f[1f0] = Precision Time Measurement 1
    ecap 001e[1e0] = L1 PM Substates 1
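
To compare interfaces without reading the whole dump, here is a small sketch that pulls just the ASPM field out of pciconf output. My reading of the format is that the value before the parentheses is what is currently enabled on the link and the value inside is what the device supports; treat that as an assumption. The function name is made up.

```sh
#!/bin/sh
# Hypothetical helper: print "<current> <supported>" from the ASPM field
# of `pciconf -lbcevV <dev>` output read on stdin.
# Assumption: the field looks like "ASPM L1(L1)" or "ASPM disabled(L1)".
aspm_state() {
    sed -n 's/.*ASPM \([^(]*\)(\([^)]*\)).*/\1 \2/p'
}

# Usage on the firewall:
#   for dev in igc0 igc1; do
#       printf '%s: ' "$dev"; pciconf -lbcevV "$dev" | aspm_state
#   done
```

With the output above, both ports would report L1 as the current state, which is why the tunable looks like it had no effect.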

August 27, 2025, 05:54:33 AM #10 Last Edit: August 27, 2025, 06:06:04 AM by BrandyWine
Hmmm, well, my I226-V N150 box has ASPM disabled on igc, but I can't see which setting disables it; the sysctl seems to be set to "1" here.
Are you running powerd?

sysctl -a |grep hw.pci.enable
hw.pci.enable_pcie_ei: 0
hw.pci.enable_pcie_hp: 1
hw.pci.enable_mps_tune: 1
hw.pci.enable_aspm: 1
hw.pci.enable_ari: 1
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
hw.pci.enable_io_modes: 1

pciconf -lbcevV igc1
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
                 max read 512
                 link x1(x1) speed 5.0(5.0) ASPM disabled(L1)


You have DHCP on WAN? What does the lease time look like?
Check "option dhcp-lease-time" in "/var/db/dhclient.leases.igcX", X being your WAN interface number.

@jstarta: Please show the output of "sysctl hw.pci" - I do not believe that the ASPM setting was applied correctly.
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

Quote from: BrandyWine on August 27, 2025, 05:54:33 AM
Hmmm, well, my I226-V N150 box has ASPM disabled on igc, but I can't see which setting disables it; the sysctl seems to be set to "1" here.
Are you running powerd?

sysctl -a |grep hw.pci.enable
hw.pci.enable_pcie_ei: 0
hw.pci.enable_pcie_hp: 1
hw.pci.enable_mps_tune: 1
hw.pci.enable_aspm: 1
hw.pci.enable_ari: 1
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
hw.pci.enable_io_modes: 1

pciconf -lbcevV igc1
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
                max read 512
                link x1(x1) speed 5.0(5.0) ASPM disabled(L1)


You have DHCP on WAN? What does the lease time look like?
Check "option dhcp-lease-time" in "/var/db/dhclient.leases.igcX", X being your WAN interface number.

Not sure if this is normal, but there are a lot of leases:

root@OPNsense:~ # cat /var/db/dhclient.leases.igc1
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  renew 3 2025/8/27 07:24:18;
  rebind 3 2025/8/27 07:35:33;
  expire 3 2025/8/27 07:39:18;
}
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  option dhcp-renewal-time 900;
  option dhcp-rebinding-time 1575;
  renew 3 2025/8/27 07:32:05;
  rebind 3 2025/8/27 07:43:20;
  expire 3 2025/8/27 07:47:05;
}
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  renew 3 2025/8/27 07:47:05;
  rebind 3 2025/8/27 07:58:20;
  expire 3 2025/8/27 08:02:05;
}
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  option dhcp-renewal-time 900;
  option dhcp-rebinding-time 1575;
  renew 3 2025/8/27 08:02:05;
  rebind 3 2025/8/27 08:13:20;
  expire 3 2025/8/27 08:17:05;
}
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  renew 3 2025/8/27 08:17:05;
  rebind 3 2025/8/27 08:28:20;
  expire 3 2025/8/27 08:32:05;
}
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  option dhcp-renewal-time 900;
  option dhcp-rebinding-time 1575;
  renew 3 2025/8/27 08:32:05;
  rebind 3 2025/8/27 08:43:20;
  expire 3 2025/8/27 08:47:05;
}
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  option dhcp-renewal-time 900;
  option dhcp-rebinding-time 1575;
  renew 3 2025/8/27 08:47:06;
  rebind 3 2025/8/27 08:58:21;
  expire 3 2025/8/27 09:02:06;
}
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  renew 3 2025/8/27 09:02:06;
  rebind 3 2025/8/27 09:13:21;
  expire 3 2025/8/27 09:17:06;
}
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  option dhcp-renewal-time 900;
  option dhcp-rebinding-time 1575;
  renew 3 2025/8/27 09:17:06;
  rebind 3 2025/8/27 09:28:21;
  expire 3 2025/8/27 09:32:06;
}
lease {
  interface "igc1";
  fixed-address AAA.BBB.CC1.132;
  option subnet-mask 255.255.252.0;
  option routers AAA.BBB.CC0.1;
  option domain-name-servers XXX.YYY.ZZZ.142,XXX.YYY.ZZZ.242;
  option host-name "opnsense";
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option dhcp-server-identifier AAA.BBB.CC0.1;
  renew 3 2025/8/27 09:32:06;
  rebind 3 2025/8/27 09:43:21;
  expire 3 2025/8/27 09:47:06;
}
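
As far as I know, dhclient appends an entry on every renewal, so a long file is not alarming by itself; the entries above advance roughly every 15 minutes with the same address. A quick sketch to read the lease time from the newest entry and work out the standard timers follows. The T1 = 1/2 and T2 = 7/8 fractions are the DHCP defaults, and they match the 900 and 1575 values the server sends here. Function names are made up.

```sh
#!/bin/sh
# Hypothetical helper: print dhcp-lease-time (seconds) of the last lease
# found in a dhclient lease file read on stdin.
last_lease_time() {
    awk '$1 == "option" && $2 == "dhcp-lease-time" { sub(/;/, "", $3); t = $3 }
         END { if (t != "") print t }'
}

# Hypothetical helper: print the default renew (T1) and rebind (T2)
# offsets in seconds for a given lease time: T1 = 1/2, T2 = 7/8.
lease_timers() {
    echo "$(( $1 / 2 )) $(( $1 * 7 / 8 ))"
}

# Usage on the firewall:
#   lease_timers "$(last_lease_time < /var/db/dhclient.leases.igc1)"
```

For the 1800 second lease shown above this gives 900 and 1575, so the client should try to renew every 15 minutes. A gap in the lease history around an outage would point towards DHCP rather than the NIC.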

Quote from: meyergru on August 27, 2025, 09:55:23 AM
@jstarta: Please show the output of "sysctl hw.pci" - I do not believe that the ASPM setting was applied correctly.

root@OPNsense:~ # sysctl hw.pci
hw.pci.mcfg: 1
hw.pci.host_mem_start: 2147483648
hw.pci.default_vgapci_unit: 0
hw.pci.enable_pcie_ei: 0
hw.pci.pcie_hp_detach_timeout: 5000
hw.pci.enable_pcie_hp: 1
hw.pci.clear_pcib: 0
hw.pci.iov_max_config: 1048576
hw.pci.intx_reroute: 1
hw.pci.enable_mps_tune: 1
hw.pci.clear_aer_on_attach: 0
hw.pci.enable_aspm: 0
hw.pci.enable_ari: 1
hw.pci.clear_buses: 0
hw.pci.clear_bars: 0
hw.pci.usb_early_takeover: 1
hw.pci.honor_msi_blacklist: 1
hw.pci.msix_rewrite_table: 0
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
hw.pci.do_power_suspend: 0
hw.pci.do_power_resume: 1
hw.pci.do_power_nodriver: 0
hw.pci.realloc_bars: 1
hw.pci.enable_io_modes: 1
root@OPNsense:~ # pciconf -lbcevV igc1
igc1@pci0:89:0:0:       class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x1462 subdevice=0xb0b1
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-V'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0x6a300000, size 1048576, enabled
    bar   [1c] = type Memory, range 32, base 0x6a400000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
                 max read 512
                 link x1(x1) speed 5.0(5.0) ASPM L1(L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 d843aeffffbc6cac
    ecap 0018[1c0] = LTR 1
    ecap 001f[1f0] = Precision Time Measurement 1
    ecap 001e[1e0] = L1 PM Substates 1


August 27, 2025, 12:09:54 PM #13 Last Edit: August 27, 2025, 12:12:52 PM by meyergru
That is really strange. The sysctl seems to be active, yet your ASPM is enabled? Never saw that. Mine is disabled, but I can disable it in the BIOS, too.

Maybe you could ask MSI for a BIOS where you can disable that. Also, if it is a standard BIOS, there are tools out there with which you can modify your BIOS to show more settings. Of course, you need the BIOS image first and some companies do not even offer any.

How do I confirm that the igc driver is loaded correctly? I think I read somewhere that there should be a kernel module present and loaded, but I can't find it now. I'd have thought that, because it has identified the device, the driver wouldn't be the issue.

I'm new to BSD so I don't know how to really troubleshoot this stuff unfortunately.
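
On the driver question, a rough sketch: in pciconf output the name before the "@" is the driver instance attached to the device, and a device with no driver attached shows up as "none" plus a number. The helper below prints that name for the I226-V ports, using the device ID 0x125c from the output earlier in the thread. The function name is made up.

```sh
#!/bin/sh
# Hypothetical helper: print the driver instance attached to each I226-V
# port, reading `pciconf -l` output on stdin.
# Assumption: the ports report device=0x125c, as in the output above.
attached_driver() {
    awk '/device=0x125c/ { split($1, a, "@"); print a[1] }'
}

# Usage on the firewall:
#   pciconf -l | attached_driver
#   dmesg | grep igc    # attach messages from boot, if still in the buffer
```

Seeing igc0 and igc1 here, as in the pciconf output posted above, suggests the driver did attach to both ports.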