Good morning. I upgraded from 24.7 to 25.1.2 a couple of weeks ago and did not notice any problems right away, but about 4 days after upgrading my firewall became unresponsive and was only routing traffic intermittently. Investigating shows kernel{if_io_tqg_2} seemingly hung: it uses 100% of a core, causing the load on the box to increase gradually until services stop responding.
198 threads: 7 running, 177 sleeping, 14 waiting
CPU 0: 1.2% user, 0.0% nice, 1.9% system, 0.0% interrupt, 96.9% idle
CPU 1: 0.4% user, 0.0% nice, 0.8% system, 0.0% interrupt, 98.8% idle
CPU 2: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
CPU 3: 0.4% user, 0.0% nice, 1.5% system, 0.0% interrupt, 98.1% idle
Mem: 130M Active, 1069M Inact, 1172M Wired, 343M Buf, 1607M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -60 - 0B 704K CPU2 2 5:28 99.97% kernel{if_io_tqg_2}
2 root -60 - 0B 64K WAIT 1 1:26 1.03% clock{clock (0)}
36345 unbound 20 0 278M 217M kqread 0 0:21 0.68% unbound{unbound}
0 root -60 - 0B 704K - 1 0:24 0.24% kernel{if_io_tqg_1}
I have not found a way to recover from this other than rebooting the firewall. After rebooting the firewall it became unresponsive again within 2 hours. I rebooted the firewall again and it was fine for another ~3 days and then the same problem occurred. I upgraded to 25.1.3 as soon as it came out with hopes it would resolve my problem but it did not.
I've Googled around and have not found a definitive answer, but I did find this post: https://www.reddit.com/r/PFSENSE/comments/1ags2z6/pfsense_locks_after_a_few_days_routes_traffic_but/ It describes a very similar problem, although on pfSense and with different software versions, and it offers no clear solution other than 'patched'.
I did not see this in 24.7. Does anyone have some ideas on what I could look at next to help diagnose and resolve this? Any help is greatly appreciated.
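To catch a thread like the one in the top output above as soon as it starts spinning, something along these lines could be run by hand or from cron. It is only a sketch: the function name hot_threads and the 90% threshold are made up for illustration, and it assumes the column layout shown above, where the PID is the first field, WCPU is the tenth, and the thread name follows it.

```shell
#!/bin/sh
# hot_threads THRESHOLD: read per-thread top output on stdin and print
# "WCPU COMMAND" for every thread at or above THRESHOLD percent.
# Assumes the layout shown above: numeric PID first, WCPU in field 10.
hot_threads() {
    awk -v thr="$1" '
        $1 ~ /^[0-9]+$/ && $10 ~ /%$/ && ($10 + 0) >= thr {
            cmd = $11
            # Thread names such as "clock{clock (0)}" contain spaces.
            for (i = 12; i <= NF; i++) cmd = cmd " " $i
            print $10, cmd
        }'
}

# Intended use on the firewall (not executed here):
#   top -SHb | hot_threads 90
```

Once the thread is identified, `procstat -kk 0` should print the kernel stacks for the threads of PID 0, which may show where if_io_tqg_2 is looping. Whether that output points at the igb driver on this box is something only a capture taken during a hang can tell.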
Which NIC hardware?
Quote from: meyergru on March 18, 2025, 12:01:19 PM
Which NIC hardware?
Thanks for the reply! This firewall has the Intel(R) I210 NICs.
# sysctl -a | grep -E 'dev.(igb|ix|em).*.%desc:'
dev.igb.3.%desc: Intel(R) I210 Flashless (Copper)
dev.igb.2.%desc: Intel(R) I210 Flashless (Copper)
dev.igb.1.%desc: Intel(R) I210 Flashless (Copper)
dev.igb.0.%desc: Intel(R) I210 Flashless (Copper)
I had hangs like that because my hardware could not handle ASPM correctly. After disabling that in the BIOS, the problem went away.
Quote from: meyergru on March 18, 2025, 12:27:23 PM
I had hangs like that because my hardware could not handle ASPM correctly. After disabling that in the BIOS, the problem went away.
This system runs coreboot as its BIOS and I am not sure how to disable ASPM there, so I tried to disable it via the tunables section of OPNsense:
System -> Settings -> Tunables
Tunable: hw.pci.enable_aspm
Value: 0
I hit Apply and then rebooted the firewall. I am currently trying to verify whether ASPM is disabled.
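One possible way to verify this is to pull the ASPM field out of the pciconf capability listing for every device at once. The helper below is a sketch, not an official tool: the name aspm_state is invented here, and it assumes the link line has the form "ASPM current(supported)", as in the pciconf output posted in this thread.

```shell
#!/bin/sh
# aspm_state: read `pciconf -lcv` style output on stdin and print
# "<device> <current ASPM state>" for every PCI-Express link line.
# Assumes the link line ends in "ASPM <current>(<supported>)".
aspm_state() {
    awk '
        # Device header, e.g. "igb0@pci0:1:0:0: class=0x020000 ..."
        /^[a-z]+[0-9]+@pci/ { split($1, a, "@"); dev = a[1] }
        / ASPM / {
            for (i = 1; i < NF; i++)
                if ($i == "ASPM") {
                    state = $(i + 1)
                    sub(/\(.*/, "", state)   # drop the "(supported)" part
                    print dev, state
                }
        }'
}

# Intended use on the firewall (not executed here):
#   pciconf -lcv | aspm_state
```

A port that still reports L1 or L0s/L1 as its current state has ASPM active on that link; "disabled" means it is off regardless of what the hardware supports.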
Disabling ASPM via the Tunables did not seem to disable ASPM on the interfaces:
igb0@pci0:1:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x157b subvendor=0x8086 subdevice=0x0000
vendor = 'Intel Corporation'
device = 'I210 Gigabit Network Connection'
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0x91000000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0x1000, size 32, enabled
bar [1c] = type Memory, range 32, base 0x91020000, size 16384, enabled
cap 01[40] = powerspec 3 supports D0 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 5 messages, enabled
Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR RO NS
max read 512
link x1(x1) speed 2.5(2.5) ASPM L1(L0s/L1)
ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
ecap 0003[140] = Serial 1 00e067ffff22f83c
ecap 0017[1a0] = TPH Requester 1
The system then hung again with that thread using 100% system CPU, so I have reinstalled 24.7 and restored from backup. If anyone knows how to disable ASPM via coreboot or another way in FreeBSD, I would love to try it and see whether it resolves my problem so I can upgrade to 25.
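One FreeBSD-side approach that might be worth trying is to clear the ASPM Control bits directly in the PCIe Link Control register with pciconf. This is an untested sketch with several assumptions: the helper name aspm_cleared is invented; the register address 0xb0 is derived from the "cap 10[a0]" line above (PCIe capability at 0xa0 plus the Link Control offset 0x10 from the PCIe specification); the change does not survive a reboot; and the upstream bridge has its own Link Control register that may need the same treatment. Writing PCI config space on a live firewall can also hang it, so a console should be at hand.

```shell
#!/bin/sh
# aspm_cleared HEX: given the 16-bit Link Control value as printed by
# `pciconf -r -h`, print the same value with the ASPM Control field
# (bits 1:0) cleared, in a form suitable for `pciconf -w -h`.
aspm_cleared() {
    v=${1#0x}                              # accept "0042" or "0x0042"
    printf '0x%04x\n' $(( 0x$v & ~0x3 ))
}

# Intended use on the firewall (assumption, not executed here):
#   sel=pci0:1:0:0                        # igb0, from the output above
#   old=$(pciconf -r -h "$sel" 0xb0)      # read Link Control
#   pciconf -w -h "$sel" 0xb0 "$(aspm_cleared "$old")"
```

If the link line in pciconf reports "ASPM disabled" afterwards, the write took effect. Whether that actually prevents the if_io_tqg hang is a separate question that only a few days of uptime can answer.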
Reporting -> Settings
Is RRD running? Try turning it off. Also reset the RRD and NetFlow data.
For me, the device looks like:
igc0@pci0:1:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
vendor = 'Intel Corporation'
device = 'Ethernet Controller I226-V'
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0x80a00000, size 1048576, enabled
bar [1c] = type Memory, range 32, base 0x80b00000, size 16384, enabled
cap 01[40] = powerspec 3 supports D0 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 5 messages, enabled
Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
max read 512
link x1(x1) speed 5.0(5.0) ASPM disabled(L1)
ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
ecap 0003[140] = Serial 1 60beb4ffff16a800
ecap 0018[1c0] = LTR 1
ecap 001f[1f0] = Precision Time Measurement 1
ecap 001e[1e0] = L1 PM Substates 1
I don't know how to force ASPM off, though. Did you also try dev.igb.X.eee_disabled=1 (https://forum.opnsense.org/index.php?msg=23591)?
Thanks for the suggestion. I added that and rebooted, but I still get the same results:
# sysctl -a | grep igb.2 | grep eee
dev.igb.2.eee_control: 1
# pciconf -lbcevV igb2@pci0:3:0:0
igb2@pci0:3:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x157b subvendor=0x8086 subdevice=0x0000
vendor = 'Intel Corporation'
device = 'I210 Gigabit Network Connection'
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0x91200000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0x3000, size 32, enabled
bar [1c] = type Memory, range 32, base 0x91220000, size 16384, enabled
cap 01[40] = powerspec 3 supports D0 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 5 messages, enabled
Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR RO NS
max read 512
link x1(x1) speed 2.5(2.5) ASPM L1(L0s/L1)
ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
ecap 0003[140] = Serial 1 00e067ffff22f83e
ecap 0017[1a0] = TPH Requester 1
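The dev.igb.2.eee_control: 1 line above is hard to interpret on its own, because the sysctl actually present (eee_control) is not the one that was suggested (eee_disabled). A small filter like the following could list the EEE-related knobs the driver exposes on each port. The function name eee_knobs is hypothetical, and the pattern simply assumes the knob name contains "eee".

```shell
#!/bin/sh
# eee_knobs: filter `sysctl dev.igb` style output on stdin down to the
# per-port entries whose name contains "eee".
eee_knobs() {
    grep -E '^dev\.(igb|em|igc)\.[0-9]+\.[a-z_]*eee'
}

# Intended use on the firewall (not executed here):
#   sysctl dev.igb | eee_knobs
#   sysctl -d dev.igb.2.eee_control    # print the knob's description
```

The description printed by `sysctl -d` should say whether a value of 1 means EEE is enabled or disabled for that driver version.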