Hi everyone,
First-time user here, so I'll try to explain the problem as best I can.
My setup is as follows: DSL line > D-Link DVA-5592 (pure bridge) > PPPoE WAN on an MS-01 running OPNsense > 5-port Netgear dumb switch > devices.
At random intervals, the LAN devices lose their internet connection, and I also cannot reach OPNsense via either the web GUI or SSH.
The problem is seemingly (or at least temporarily) fixed by unplugging and replugging the LAN cable between the switch and the OPNsense machine.
Please let me know if you need any more info.
If by MS-01 you mean a Minisforum MS-01, I have exactly the same problem, but actually on the WAN side which is connected from the I226-V to an ONT that is capable of 2.5 Gbps. The connection goes down around once per day and can only be reanimated by a port reset or replugging the cable.
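Until the driver handles this by itself, the port reset can be automated as a stopgap. The following is only a minimal sketch in Python: the function names, the ping target and the thresholds are hypothetical, it assumes FreeBSD's `ping` (where `-t` is a timeout in seconds), and it assumes that `ifconfig <iface> down` followed by `up` is enough to revive the port, which may not hold on every system.

```python
import subprocess
import time


def ping_ok(host, run=subprocess.run):
    """Return True if a single ping to `host` succeeds."""
    # FreeBSD ping: -c 1 sends one packet, -t 2 gives up after 2 seconds.
    result = run(["ping", "-c", "1", "-t", "2", host],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def reset_interface(iface, run=subprocess.run, sleep=time.sleep):
    """Bounce the interface, the software equivalent of replugging."""
    run(["ifconfig", iface, "down"], check=False)
    sleep(2)
    run(["ifconfig", iface, "up"], check=False)


def watchdog_step(failures, host, iface, max_failures=3,
                  check=ping_ok, reset=reset_interface):
    """Run one watchdog iteration and return the updated failure count."""
    if check(host):
        return 0                 # link is fine, forget earlier failures
    failures += 1
    if failures >= max_failures:
        reset(iface)             # several misses in a row: bounce the port
        return 0
    return failures


# Hypothetical usage (runs forever, so it is left commented out):
# failures = 0
# while True:
#     failures = watchdog_step(failures, "192.0.2.1", "igc0")
#     time.sleep(10)
```

Requiring several consecutive failed pings before the reset avoids bouncing the port on a single lost packet. For a PPPoE WAN port, the session would presumably have to be re-established after the reset as well.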
This seems to be a problem with I226 more than with I225 chips, and it happens more often at 2.5 Gbps than at 1 Gbps. See also my comment and Jim's answer here: https://www.youtube.com/watch?v=_wgX1sDab-M . It seems that he has no problems, but he runs OPNsense under Proxmox.
There are reports of this all over the internet, see this (https://forums.evga.com/PSA-Intel-I226V-25GbE-on-Raptor-Lake-Motherboards-Has-a-Connection-Drop-Issue-No-Fix-m3595279.aspx). However, I get the impression that for Windows and Linux, it has been (at least partly) fixed in the driver, potentially by doing a soft reset when the error is detected. I suspect that the FreeBSD drivers do not have the same fixes yet.
Do you use the I226-V or the I226-LM (the one which also supports AMT)?
Thanks meyergru for the response.
Indeed, I mean the Minisforum MS-01. I am using it with a 1 Gbps connection, so I suppose the problem is not exclusive to 2.5 Gbps.
I am not sure whether mine is the V or the LM; how would I check that?
My possible solutions at the moment would be to either use SFP-to-RJ45 modules with the X710 (if it even supports 1 Gbps connections) or to install a known-working gigabit PCIe NIC (which one do you suggest?). Is that correct?
Thanks
You can check with "pciconf -lv", which gives you this:
igc0@pci0:87:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-V'
    class      = network
    subclass   = ethernet
igc1@pci0:88:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125b subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-LM'
    class      = network
    subclass   = ethernet
As you can see, igc0 is the I226-V, igc1 is the I226-LM. I only tested igc0. Besides, the ports are strangely numbered, I think ixl0 is on the left but igc0 is on the right or vice versa...
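If you want to check this on several machines, the device IDs in the header lines are enough to tell the two variants apart. Here is a small sketch in Python that parses `pciconf -lv` output; the function name is made up, and the ID table only contains the two IDs visible in my output above (0x125c and 0x125b), so other I225/I226 variants would show up as unknown.

```python
import re

# Device IDs taken from the pciconf output in this thread (vendor 0x8086).
IGC_MODELS = {
    "125c": "I226-V",
    "125b": "I226-LM",
}

# Matches header lines such as:
# igc0@pci0:87:0:0: class=0x020000 ... vendor=0x8086 device=0x125c ...
HEADER = re.compile(
    r"^(\w+)@pci\S*\s+.*?\bvendor=0x([0-9a-fA-F]{4})\s+device=0x([0-9a-fA-F]{4})",
    re.MULTILINE,
)

SAMPLE = """\
igc0@pci0:87:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
igc1@pci0:88:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125b subvendor=0x8086 subdevice=0x0000
"""


def classify_igc(pciconf_text):
    """Map interface names to a model name based on the PCI device ID."""
    result = {}
    for name, vendor, device in HEADER.findall(pciconf_text):
        if vendor.lower() != "8086":
            continue  # not an Intel device
        result[name] = IGC_MODELS.get(device.lower(), "unknown")
    return result


print(classify_igc(SAMPLE))
```

On a live system the text would come from running `pciconf -lv` instead of the `SAMPLE` string.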
I advise against using SFP+ sticks for Ethernet; they get awfully hot. You could use a NIC in the PCIe slot. Do not use a Realtek 8125, but an Intel I225-V-based adapter (as I said, they are less problematic than their successors).
I currently use only ixl0 with a DAC connection to my switch, it has some VLANs anyway and I have my ONT attached to a port on my switch (which is capable of 2.5 Gbps as well).
I am using the I226-V for the LAN and the I226-LM for the PPPoE WAN. I noticed another behaviour of the modem: it shuts down the DSL connection (and the bridge too, as a result). I wonder if it's because it loses the connection to the LM.
Even if you switch both NICs, I bet you would have the LAN connection loss, because as I said, IDK about the I226-LM, but the I226-V definitely has these problems. And now we can tell that it is not limited to 2.5 Gbps connections.
Since Jim from Jim's Garage says he is fine, it seems that those problems may have been fixed in the Linux drivers. Interestingly enough, there are no Intel OEM drivers for FreeBSD for either I225 or I226 in their current Intel Network Driver package 29.1.
If you do not need 2.5 Gbps, you could also choose a 2-port Intel I210 adapter. For 2.5 Gbps, there are also the Aquantia and Marvell types, but IDK if they are supported under FreeBSD.
Quote from: meyergru on May 20, 2024, 02:36:48 PM
Since Jim from Jim's Garage says he is fine, it seems that those problems may have been fixed in the Linux drivers. Interestingly enough, there are no Intel OEM drivers for FreeBSD for either I225 or I226 in their current Intel Network Driver package 29.1.
Seems I was right (take it with a grain of salt as I did no deep dive):
https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/intel/igc/igc_main.c , starting at line 3150:
if (test_bit(IGC_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
    struct igc_hw *hw = &adapter->hw;

    /* Detect a transmit hang in hardware, this serializes the
     * check with the clearing of time_stamp and movement of i
     */
    clear_bit(IGC_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags);
    if (tx_buffer->next_to_watch &&
        time_after(jiffies, tx_buffer->time_stamp +
        (adapter->tx_timeout_factor * HZ)) &&
        !(rd32(IGC_STATUS) & IGC_STATUS_TXOFF) &&
        (rd32(IGC_TDH(tx_ring->reg_idx)) != readl(tx_ring->tail)) &&
        !tx_ring->oper_gate_closed) {
        /* detected Tx unit hang */
        netdev_err(tx_ring->netdev,
                   "Detected Tx Unit Hang\n"
                   " Tx Queue <%d>\n"
                   " TDH <%x>\n"
                   " TDT <%x>\n"
                   " next_to_use <%x>\n"
                   " next_to_clean <%x>\n"
                   "buffer_info[next_to_clean]\n"
                   " time_stamp <%lx>\n"
                   " next_to_watch <%p>\n"
                   " jiffies <%lx>\n"
                   " desc.status <%x>\n",
                   tx_ring->queue_index,
                   rd32(IGC_TDH(tx_ring->reg_idx)),
                   readl(tx_ring->tail),
                   tx_ring->next_to_use,
                   tx_ring->next_to_clean,
                   tx_buffer->time_stamp,
                   tx_buffer->next_to_watch,
                   jiffies,
                   tx_buffer->next_to_watch->wb.status);
        netif_stop_subqueue(tx_ring->netdev,
                            tx_ring->queue_index);

        /* we are about to reset, no point in enabling stuff */
        return true;
    }
}
This section detects a TX queue hang after a timeout, stops the affected queue and signals that the adapter is about to be reset. I found nothing to this effect in the FreeBSD igc driver. Nor is there anything comparable to this part of the Linux igc driver:
/**
 * igc_tx_timeout - Respond to a Tx Hang
 * @netdev: network interface device structure
 * @txqueue: queue number that timed out
 **/
static void igc_tx_timeout(struct net_device *netdev,
                           unsigned int __always_unused txqueue)
{
    struct igc_adapter *adapter = netdev_priv(netdev);
    struct igc_hw *hw = &adapter->hw;

    /* Do the reset outside of interrupt context */
    adapter->tx_timeout_count++;
    schedule_work(&adapter->reset_task);
    wr32(IGC_EICS,
         (adapter->eims_enable_mask & ~adapter->eims_other));
}
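To make the hang condition from the first excerpt easier to read, here it is restated as a small Python model. This is only my paraphrase of the `if` condition, not driver code: the class and field names are invented, time is in plain seconds instead of jiffies, and the comments on what each flag means are my reading of the excerpt.

```python
from dataclasses import dataclass


@dataclass
class TxRingState:
    has_pending_work: bool  # tx_buffer->next_to_watch is set
    last_activity: float    # tx_buffer->time_stamp (seconds in this model)
    tx_off: bool            # IGC_STATUS_TXOFF bit set (transmission paused)
    head: int               # hardware head pointer, TDH
    tail: int               # software tail pointer, TDT
    gate_closed: bool       # tx_ring->oper_gate_closed


def tx_hang_detected(ring, now, timeout):
    """True when all five conditions of the quoted check hold."""
    return (ring.has_pending_work
            and now > ring.last_activity + timeout  # nothing completed in time
            and not ring.tx_off                     # not merely paused
            and ring.head != ring.tail              # descriptors still queued
            and not ring.gate_closed)


ring = TxRingState(True, 100.0, False, 5, 9, False)
print(tx_hang_detected(ring, now=106.0, timeout=5.0))
```

In other words, a hang is only reported when work is queued, the timeout has passed, and there is no legitimate reason for the queue to stand still.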
Hopefully it will get fixed in FreeBSD too.
I bet that would never happen without a bug report (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279245), which I have now filed with FreeBSD.
FWIW: I talked to Intel support about this and they insisted that it is a problem "that can only be fixed by the board vendor". While I think that this is probably a red herring by some blissfully ignorant support agent, I have read that Asus has actually deployed both a new BIOS and an updated Intel Windows driver for their Z790 Kingpin Board some time ago (https://forums.evga.com/PSA-Intel-I226V-25GbE-on-Raptor-Lake-Motherboards-Has-a-Connection-Drop-Issue-No-Fix-m3595279.aspx). Potentially, if there are PCIe settings that are changed by that BIOS, it could be a fix for the underlying hardware glitch.
It just so happens that Minisforum has taken my earlier criticism about not providing BIOS updates for the MS-01 to heart and has now uploaded version AHWSA.1.22 dated 03/12/2024 (https://www.minisforum.com/new/support?lang=en#/support/page/download/108), where the old version was AHWSA.1.17 dated 12/14/2023. I have successfully flashed the new BIOS from UEFI via a USB stick (all settings stayed intact), but I cannot test whether it changes anything with regard to this problem. Maybe you want to try, @ccleccio?
One can check via "dmidecode" which BIOS version is currently active.
@meyergru thanks for filing that issue upstream.
New developments:
1. Do not bother to try the MS-01 1.22 BIOS, I did it and after 2.5 days, the problem resurfaced.
2. It is now confirmed that these adapters actually have an updateable NVM firmware. Under FreeBSD, there is not yet a built-in means to display its current version. For the I225, the NVM firmware updater is contained in release 29.1.1.1 of the Intel driver package, alas not so for the I226.
3. I have requested the NVM update (current for I226 is 2.25, the bug is likely fixed with 2.22 (https://community.intel.com/t5/Ethernet-Products/Intel-Communication-Intel-Ethernet-Controller-I226-Series-Random/m-p/1542528#M35302)) from Minisforum. We shall see what they do.
P.S.: In the NVM update utility for the Intel 700 series (https://www.intel.de/content/www/de/de/download/18190/non-volatile-memory-nvm-update-utility-for-intel-ethernet-network-adapter-700-series.html), there is a FreeBSD tool which gives the NVM firmware release (in my case, it is 2.17, i.e. not the fixed version >=2.22):
# ./nvmupdate64e -i -l
[00:087:00:00]: Intel(R) Ethernet Controller I226-V
Vendor : 8086
Device : 125C
Subvendor : 8086
Subdevice : 0000
Revision : 4
LAN MAC : 5847CA888888
Alt MAC : 000000000000
SAN MAC : 000000000000
ETrackId : 80000303
SerialNumber : 5847CAFFFF76768D
NVM Version : 2.23(2.17)
PBA : G23456-000
VPD status : Not set
VPD size : 0
NVM update : No config file entry
checksum : Valid
OROM update : No config file entry
CIVD : 0.0.0
EFI : 0.1.4, checksum None
[00:088:00:00]: Intel(R) Ethernet Controller I226-LM
Vendor : 8086
Device : 125B
Subvendor : 8086
Subdevice : 0000
Revision : 4
LAN MAC : 5847CA888889
Alt MAC : 000000000000
SAN MAC : 000000000000
ETrackId : 80000307
SerialNumber : 5847CAFFFF76768E
NVM Version : 2.23(2.17)
PBA : G23456-000
VPD status : Not set
VPD size : 0
NVM update : No config file entry
checksum : Valid
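For anyone who wants to check several boxes, the relevant line can be pulled out of that output with a few lines of Python. This is just a sketch: the function names are made up, and it follows my reading above that the number in parentheses (2.17 here) is the firmware release that matters, with 2.22 as the first fixed version.

```python
import re

# "[00:087:00:00]: Intel(R) Ethernet Controller I226-V"
HEADER = re.compile(r"^\s*\[[0-9A-Fa-f:]+\]:\s*(.+?)\s*$")
# "NVM Version : 2.23(2.17)"
NVM = re.compile(r"^\s*NVM Version\s*:\s*(\d+)\.(\d+)\((\d+)\.(\d+)\)")

SAMPLE = """\
[00:087:00:00]: Intel(R) Ethernet Controller I226-V
NVM Version : 2.23(2.17)
[00:088:00:00]: Intel(R) Ethernet Controller I226-LM
NVM Version : 2.23(2.17)
"""


def nvm_versions(text):
    """Map each adapter name to the (major, minor) value in parentheses."""
    versions = {}
    device = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            device = m.group(1)
            continue
        m = NVM.match(line)
        if m and device is not None:
            versions[device] = (int(m.group(3)), int(m.group(4)))
    return versions


def has_fix(version, fixed=(2, 22)):
    """Compare as integer tuples so that e.g. 2.9 sorts below 2.22."""
    return version >= fixed


for name, version in nvm_versions(SAMPLE).items():
    print(name, version, "fixed" if has_fix(version) else "NOT fixed")
```

Comparing the versions as tuples of integers rather than as strings or floats avoids ordering mistakes between releases with one-digit and two-digit minor numbers.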
Hello folks,
An update from my side. After about a week I started experiencing odd behaviour, websites would randomly not load. After digging it seemed as though these dropouts were hardware related as there were no indicators of blocks in the firewall logs.
I went into the BIOS and disabled ASPM on both Intel I226 NICs. Since doing this I haven't had an issue and receive full 2Gb up and down. Fingers crossed this was the problem (I assume it was engaging power states when it shouldn't have).
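For those on FreeBSD/OPNsense who want to confirm that the BIOS setting actually took effect, `pciconf -lc` lists the PCI capabilities per device. As far as I know, the PCI-Express capability includes a field of the form `ASPM disabled(L0s/L1)`, i.e. the active state followed by the supported states in parentheses, but please treat that exact format as an assumption. A rough Python sketch with a made-up function name:

```python
import re

# Header lines look like "igc0@pci0:87:0:0: class=0x020000 ..."
DEVICE = re.compile(r"^(\w+)@pci")
# Assumed field format: "ASPM <active>(<supported>)", e.g. "ASPM disabled(L0s/L1)"
ASPM = re.compile(r"\bASPM\s+([^\s(]+)\(([^)]*)\)")


def aspm_states(pciconf_lc_text):
    """Map device names to their active and supported ASPM states."""
    states = {}
    device = None
    for line in pciconf_lc_text.splitlines():
        m = DEVICE.match(line)
        if m:
            device = m.group(1)  # capability lines below belong to this device
        m = ASPM.search(line)
        if m and device is not None:
            states[device] = {"active": m.group(1), "supported": m.group(2)}
    return states
```

Feeding it the output of `pciconf -lc` should then show `"active": "disabled"` for igc0 and igc1 if the BIOS change was applied.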
Hi Jim, thanks for taking the time to respond. Those ASPM settings are indeed well hidden within the settings... I just disabled them and will try again.
It's been about four days since the last connection drop. I checked and I was already running BIOS 1.22.
Just want to report an odd behaviour. I disabled ASPM and had no dropouts since then, but right after the system rebooted from the 24.1.8 update, I could not access it anymore and had to replug the cable.
meyergru: ever hear back from Minisforum? I'm also having issues with these NICs. Dropped in an Intel X550-T2 in the meantime.
No, I have not heard back from them yet. However, now it has been a week without a hitch for me with ASPM off.
Since the problem manifests in an observed "detachment" of the NIC from the machine, as is indicated by the ever-progressing internal counters, PCI problems are likely the culprit, so this really might be it.
That is not to say that no other problems exist with these adapters that could be fixed by an NVM update.
P.S.: I have poked them again, there is still an out-of-office-reply.
FYI: this behaviour is not limited to the I226-V. I had the same PCIe disconnection error on an I210-AT (Supermicro H13SAE-MF) with Proxmox on Linux. A BIOS update reduced the behaviour, which happened mostly in "idle" moments. Restarting the NIC driver on Linux fixed it without rebooting or replugging.
I ended up putting a Broadcom NIC in the server.
The I226-V in the Toptons never gave me this behaviour.
We believe we have this behavior on a DEC3682. Is there any fix around for this appliance?