No internet on LAN, no webgui or ssh access from LAN devices

Started by ccleccio, May 19, 2024, 01:24:30 PM

Hi everyone,
first-time user here, so I'll try to explain the problem as best I can.

My setup is as follows: DSL line > D-Link DVA-5592 (pure bridge) > PPPoE WAN on MS-01 running OPNsense > 5-port Netgear dumb switch > devices

The internet connection from LAN devices drops at random, and I also cannot access OPNsense via either the web GUI or SSH.

The problem is seemingly (or at least temporarily) fixed by unplugging and replugging the LAN cable between the switch and the OPNsense machine.

Please let me know if you need any more info.

If by MS-01 you mean a Minisforum MS-01, I have exactly the same problem, but actually on the WAN side which is connected from the I226-V to an ONT that is capable of 2.5 Gbps. The connection goes down around once per day and can only be reanimated by a port reset or replugging the cable.

This seems to be more of a problem with I226 than with I225 chips, and it happens more often at 2.5 Gbps than at 1 Gbps. See also my comment and Jim's answer here: https://www.youtube.com/watch?v=_wgX1sDab-M . It seems that he has no problems, but he runs OPNsense under Proxmox.

There are reports of this all over the internet, see this. However, I get the impression that for Windows and Linux, it has been (at least partly) fixed in the driver, potentially by doing a soft reset when the error is detected. I suspect that the FreeBSD drivers do not have the same fixes yet.

Do you use the I226-V or the I226-LM (the one which also supports AMT)?
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

Thanks meyergru for the response.

Indeed, I mean the Minisforum MS-01. I am using it with a 1 Gbps connection, so I suppose the problem is not exclusive to 2.5 Gbps.

I am not sure which of the two, V or LM, is in use. How would I check that?

My possible solutions at the moment would be either to use SFP-to-RJ45 modules with the X710 (if it even supports a 1 Gbps connection) or to install a working gigabit PCIe NIC (which one do you suggest?). Is that correct?

Thanks

You can check with "pciconf -lv", which gives you this:


igc0@pci0:87:0:0:       class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-V'
    class      = network
    subclass   = ethernet
igc1@pci0:88:0:0:       class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125b subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-LM'
    class      = network
    subclass   = ethernet


As you can see, igc0 is the I226-V, igc1 is the I226-LM. I only tested igc0. Besides, the ports are strangely numbered, I think ixl0 is on the left but igc0 is on the right or vice versa...
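
On boxes with several ports, the same check can be narrowed down to the igc devices. A minimal sketch, assuming the "pciconf -lv" layout shown above; the function name list_igc_models is made up for illustration:

```sh
#!/bin/sh
# Print "<interface> <model>" for every igc device found in `pciconf -lv`
# output. The text is read from stdin, so saved output works as well.
list_igc_models() {
    awk '
        # A device header looks like "igc0@pci0:87:0:0: class=..."
        /^[a-z]+[0-9]+@pci/ { split($1, a, "@"); dev = a[1] }
        # The model is the quoted string on the indented "device = ..." line
        dev ~ /^igc/ && /^[ \t]+device[ \t]*=/ {
            n = split($0, q, "\047")   # \047 is a single quote
            if (n >= 2) print dev, q[2]
        }
    '
}
```

Usage would be "pciconf -lv | list_igc_models", which prints one line per port, such as "igc0 Ethernet Controller I226-V".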

I advise against using SFP+ sticks for Ethernet, they get awfully hot. You could use a NIC in the PCIe slot. Do not use a Realtek 8125, but an Intel I225-V-based adapter (as I said, they are less problematic than their successors).

I currently use only ixl0 with a DAC connection to my switch, it has some VLANs anyway and I have my ONT attached to a port on my switch (which is capable of 2.5 Gbps as well).

I am using the I226-V for the LAN and the I226-LM for the PPPoE WAN. I noticed another behaviour: the modem shuts down the DSL connection (and the bridge too, as a result). I wonder if it's because it loses the connection to the LM.

Even if you switch both NICs, I bet you would have the LAN connection loss, because as I said, IDK about the I226-LM, but the I226-V definitely has these problems. And now we can tell that it is not limited to 2.5 Gbps connections.

Since Jim from Jim's Garage says he is fine, it seems that those problems may have been fixed in the Linux drivers. Interestingly enough, there are no Intel OEM drivers for FreeBSD for either I225 or I226 in their current Intel Network Driver package 29.1.

If you do not need 2.5 Gbps, you could also choose a 2-port Intel I210 adapter. For 2.5 Gbps, there are also the Aquantia and Marvell types, but IDK if they are supported under FreeBSD.

Quote from: meyergru on May 20, 2024, 02:36:48 PM
Since Jim from Jim's Garage says he is fine, it seems that those problems may have been fixed in the Linux drivers. Interestingly enough, there are no Intel OEM drivers for FreeBSD for either I225 or I226 in their current Intel Network Driver package 29.1.

Seems I was right (take it with a grain of salt as I did no deep dive):

https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/intel/igc/igc_main.c , starting at line 3150:


if (test_bit(IGC_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
    struct igc_hw *hw = &adapter->hw;

    /* Detect a transmit hang in hardware, this serializes the
     * check with the clearing of time_stamp and movement of i
     */
    clear_bit(IGC_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags);
    if (tx_buffer->next_to_watch &&
        time_after(jiffies, tx_buffer->time_stamp +
        (adapter->tx_timeout_factor * HZ)) &&
        !(rd32(IGC_STATUS) & IGC_STATUS_TXOFF) &&
        (rd32(IGC_TDH(tx_ring->reg_idx)) != readl(tx_ring->tail)) &&
        !tx_ring->oper_gate_closed) {
        /* detected Tx unit hang */
        netdev_err(tx_ring->netdev,
                   "Detected Tx Unit Hang\n"
                   "  Tx Queue             <%d>\n"
                   "  TDH                  <%x>\n"
                   "  TDT                  <%x>\n"
                   "  next_to_use          <%x>\n"
                   "  next_to_clean        <%x>\n"
                   "buffer_info[next_to_clean]\n"
                   "  time_stamp           <%lx>\n"
                   "  next_to_watch        <%p>\n"
                   "  jiffies              <%lx>\n"
                   "  desc.status          <%x>\n",
                   tx_ring->queue_index,
                   rd32(IGC_TDH(tx_ring->reg_idx)),
                   readl(tx_ring->tail),
                   tx_ring->next_to_use,
                   tx_ring->next_to_clean,
                   tx_buffer->time_stamp,
                   tx_buffer->next_to_watch,
                   jiffies,
                   tx_buffer->next_to_watch->wb.status);
        netif_stop_subqueue(tx_ring->netdev,
                            tx_ring->queue_index);

        /* we are about to reset, no point in enabling stuff */
        return true;
    }
}


This section detects a TX queue hang after a timeout, after which the adapter is reset. I found nothing to this effect in the FreeBSD igc driver. Also, there is nothing comparable to this part from the Linux igc driver:


/**
 * igc_tx_timeout - Respond to a Tx Hang
 * @netdev: network interface device structure
 * @txqueue: queue number that timed out
 **/
static void igc_tx_timeout(struct net_device *netdev,
                           unsigned int __always_unused txqueue)
{
    struct igc_adapter *adapter = netdev_priv(netdev);
    struct igc_hw *hw = &adapter->hw;

    /* Do the reset outside of interrupt context */
    adapter->tx_timeout_count++;
    schedule_work(&adapter->reset_task);
    wr32(IGC_EICS,
         (adapter->eims_enable_mask & ~adapter->eims_other));
}
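
As long as the FreeBSD driver has nothing comparable, a userland watchdog could approximate the idea from outside the driver. This is only a sketch under loud assumptions: that "ifconfig <if> down" followed by "up" is enough to revive a hung port (the thread only confirms that a port reset or replugging the cable helps), and that there is a host on that link which always answers pings. The function name, the log tag and the two-second pause are hypothetical:

```sh
#!/bin/sh
# Bounce an interface when a probe address stops answering.
# ASSUMPTION: taking the interface down and up again recovers the port;
# this is not confirmed for the I226 hang discussed in this thread.
bounce_if_unreachable() {
    ifname=$1      # interface to watch, e.g. igc0
    target=$2      # address of a host that should always answer

    # Three probes; ping succeeds if at least one reply comes back.
    if ping -c 3 "$target" >/dev/null 2>&1; then
        return 0
    fi

    logger -t igc-watchdog "no reply from $target, bouncing $ifname"
    ifconfig "$ifname" down
    sleep 2
    ifconfig "$ifname" up
    return 1
}
```

It could be called periodically, for example from cron, as "bounce_if_unreachable igc0 <address>". On a PPPoE WAN the session would presumably have to be re-established afterwards, which this sketch does not attempt.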



I bet that would never happen without a bug report, which I have now filed with FreeBSD.

FWIW: I talked to Intel support about this and they insisted that it is a problem "that can only be fixed by the board vendor". While I think that this is probably a red herring by some blissfully ignorant support agent, I have read that Asus has actually deployed both a new BIOS and an updated Intel Windows driver for their Z790 Kingpin Board some time ago. Potentially, if there are PCIe settings that are changed by that BIOS, it could be a fix for the underlying hardware glitch.

It just so happens that Minisforum has taken my earlier criticism about not providing BIOS updates for the MS-01 to heart and has now uploaded version AHWSA.1.22 dated 03/12/2024; the old version was AHWSA.1.17 dated 12/14/2023. I have successfully flashed the new BIOS from a USB stick via UEFI (all settings stayed intact), but I cannot test whether it changes anything with respect to this problem. Maybe you want to try, @ccleccio?

One can check via "dmidecode" which BIOS version is currently active.
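
A small sketch for pulling just the relevant fields out of the dmidecode output. It assumes the usual "Version:" and "Release Date:" lines of the BIOS Information block; the helper name bios_version is made up:

```sh
#!/bin/sh
# Print BIOS version and release date from `dmidecode -t bios` output
# supplied on stdin.
bios_version() {
    awk -F': *' '
        /^[ \t]*Version:/      { ver = $2 }
        /^[ \t]*Release Date:/ { date = $2 }
        END { if (ver != "") print ver, "(" date ")" }
    '
}
```

Usage would be "dmidecode -t bios | bios_version" as root. Plain "dmidecode -s bios-version" prints the version string alone.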

@meyergru thanks for filing that issue upstream.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

New developments:

1. Do not bother trying the MS-01 1.22 BIOS. I did, and after 2.5 days the problem resurfaced.

2. It is now confirmed that these adapters actually have updatable NVM firmware. Under FreeBSD, there is not yet a built-in means to display its current version. For the I225, the NVM firmware updater is contained in release 29.1.1.1 of the Intel driver package, but alas not for the I226.

3. I have requested the NVM update (current for I226 is 2.25, the bug is likely fixed with 2.22) from Minisforum. We shall see what they do.

P.S.: In the NVM update utility for the Intel 700 series, there is a FreeBSD tool which gives the NVM firmware release (in my case, it is 2.17, i.e. not the fixed version >=2.22):

# ./nvmupdate64e -i -l
[00:087:00:00]: Intel(R) Ethernet Controller I226-V
        Vendor                 : 8086
        Device                 : 125C
        Subvendor              : 8086
        Subdevice              : 0000
        Revision               : 4
        LAN MAC                : 5847CA888888
        Alt MAC                : 000000000000
        SAN MAC                : 000000000000
        ETrackId               : 80000303
        SerialNumber           : 5847CAFFFF76768D
        NVM Version            : 2.23(2.17)
        PBA                    : G23456-000
        VPD status             : Not set
        VPD size               : 0
        NVM update             : No config file entry
          checksum             : Valid
        OROM update            : No config file entry
          CIVD                 : 0.0.0
          EFI                  : 0.1.4, checksum None
[00:088:00:00]: Intel(R) Ethernet Controller I226-LM
        Vendor                 : 8086
        Device                 : 125B
        Subvendor              : 8086
        Subdevice              : 0000
        Revision               : 4
        LAN MAC                : 5847CA888889
        Alt MAC                : 000000000000
        SAN MAC                : 000000000000
        ETrackId               : 80000307
        SerialNumber           : 5847CAFFFF76768E
        NVM Version            : 2.23(2.17)
        PBA                    : G23456-000
        VPD status             : Not set
        VPD size               : 0
        NVM update             : No config file entry
          checksum             : Valid
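
To spot outdated adapters at a glance, the listing can be filtered for the NVM version. A sketch, assuming the output format shown above and, following the reading given above, that the value in parentheses is the one that counts. The name nvm_report and its default threshold of 2.22 are illustrative:

```sh
#!/bin/sh
# Summarise `nvmupdate64e -i -l` output (read from stdin): one line per
# adapter with its NVM version and whether it is older than a minimum.
# ASSUMPTION: in "2.23(2.17)" the parenthesised value is the relevant one.
nvm_report() {
    min=${1:-2.22}
    awk -v min="$min" '
        # Adapter header, e.g. "[00:087:00:00]: Intel(R) Ethernet Controller I226-V"
        /^\[[0-9:]+\]:/ { name = $0; sub(/^\[[0-9:]+\]: */, "", name) }
        /NVM Version/ {
            v = $0; sub(/^.*: */, "", v)
            # Prefer the value in parentheses if there is one
            if (match(v, /\([0-9.]+\)/)) v = substr(v, RSTART + 1, RLENGTH - 2)
            split(v, have, "."); split(min, want, ".")
            old = (have[1] + 0 < want[1] + 0) ||
                  (have[1] + 0 == want[1] + 0 && have[2] + 0 < want[2] + 0)
            print name ": NVM " v (old ? " (older than " min ")" : " (ok)")
        }
    '
}
```

Usage would be "./nvmupdate64e -i -l | nvm_report 2.22".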

Hello folks,

An update from my side. After about a week I started experiencing odd behaviour, websites would randomly not load. After digging it seemed as though these dropouts were hardware related as there were no indicators of blocks in the firewall logs.

I went into the BIOS and disabled ASPM on both Intel I226 NICs. Since doing this I haven't had an issue and I get the full 2 Gb up and down. Fingers crossed this was the problem (I assume it was entering power states when it shouldn't have).
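
Whether the BIOS setting took effect can probably also be checked from the running system. A sketch, assuming that "pciconf -lc igc0" prints the PCI Express link line with a fragment of the form "ASPM disabled(L0s/L1)" (active state first, supported states in parentheses). This is not verified on the MS-01, and aspm_state is a made-up helper:

```sh
#!/bin/sh
# Extract the active ASPM state from `pciconf -lc <device>` output on stdin.
# ASSUMPTION: the link line contains "ASPM <active>(<supported>)".
# Prints e.g. "disabled" or "L1"; prints nothing if no ASPM field is found.
aspm_state() {
    sed -n 's/.*ASPM \([^ (]*\)(.*/\1/p'
}
```

Usage would be "pciconf -lc igc0 | aspm_state", where "disabled" would indicate that the BIOS change is active for that port.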

Hi Jim, thanks for taking the time to respond. Those ASPM settings are indeed well hidden within the settings... I just disabled them and will try again.

It's been about four days since the last connection drop. I checked and I was already running BIOS 1.22.

Just want to report an odd behaviour. I disabled ASPM and had no dropouts since then, but right after the system rebooted from the 24.1.8 update, I could not access it anymore and had to replug the cable.