On OPNsense 25.7.1_1 I'm having frequent disconnects (every few hours?) on my LAN-side NIC, an I226-LM. I'm using an ASRock Industrial NUC BOX 225-H (https://www.asrockind.com/en-gb/NUC%20BOX-225H). I can't find anything in the logs indicating why it disconnects, only the physical removal/insertion messages. Unplugging and re-inserting the cable restores network connectivity. The NIC is attached to a 2.5G unmanaged switch; I have tried different CAT-5E (1m) cables and different ports on the switch.
The WAN interface, an I226-V, has not had any failures/disconnections.
Any thoughts on which logs to examine?
Check the sysctl "hw.igc.eee_setting"? (Should be disabled, which is the default.)
Does the device it's plugged into have any useful diagnostics? (Hopefully not an unmanaged switch...)
(Aside: TAA compliant? Interesting. I wonder what the rule is for a device like that.)
hw.igc.eee_setting is 1.
Yes, it is an unmanaged switch.
I've disabled Intel Virtualization/VT-d in the BIOS (I'm sure that changes nothing). I also disabled "Intel PAT" [1] as part of the AMT configuration; the rest of the AMT configuration had already been disabled. So far, more than 24 hours without a drop.
This also "fixed" another issue I had (honestly, I don't know which change fixed it). I use Kea DHCP, and it had stopped sending OFFER packets, or at least OFFER packets had stopped leaving the interface. I was able to re-enable Kea after disabling the PAT option in the BIOS.
I had found another post that pointed to Intel AMT causing problems with OFFER packets, so I figured there might be a connection between OFFER packets no longer leaving the interface and my disconnect issue. I know not all traffic stopped: an active VPN connection on another client kept working, while that client's split-tunneled traffic stopped.
As an aside, when I disabled the serial port in the BIOS, FreeBSD would no longer boot: I got "atrtc0: Warning: Couldn't map I/O" [2] and the system halted. Not a big deal to leave it enabled; I was simply disabling everything I could :-)
[1] https://www.intel.com/content/www/us/en/developer/topic-technology/platform-analysis-technology/overview.html
[2] https://lists.freebsd.org/archives/freebsd-amd64/2021-October/000036.html
The only thing left that I want to disable is WiFi, since it is also not needed. I should probably just pop the card out.
I was wrong: this is not solved. Something in 25.7 still causes the disconnects; re-inserting the network cable restores connectivity.
There was another post about problems with the newest Proxmox kernel. Try downgrading it.
Found this report on the FreeBSD bug tracker: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279245. I opened a ticket with ASRock to ask whether they can add a BIOS option to enable/disable PCIe ASPM. It could have been a coincidence that I didn't encounter any drops with the previous OPNsense release. Hard to tell, since nothing in any of the logs indicates that anything happened.
I may also try virtualizing OPNsense on that box. For now I've moved back to my N100, which runs Proxmox with I226-V controllers and has been running without issue for some time.
ASRock built me a custom BIOS that disables ASPM for both NICs. The issue only affected the I226-LM, which the two builds confirmed: the first custom BIOS disabled ASPM only on the I226-V NIC, and a follow-up BIOS disabled it on both. The NIC has now been stable for a full week, versus dropping every few hours or days before.
Can the tunable "hw.pci.enable_aspm" be used to do this in the absence of an OEM BIOS update?
I think so (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279245#c7); at least you can give it a try.