DEC3862: various Ethernet problems

Started by szako82, May 08, 2024, 10:05:27 PM

Hi!

I'm very new to OPNsense.
We recently bought a DEC3862.

We are experiencing a variety of network errors.
The configuration is as follows:
igc0: WAN side with PPPoE (symmetric 200 Mbit/s)
igc1, igc2, igc3: internal networks with 2 VLANs on each card
OpenVPN, IDS, Postfix (ClamAV, Redis, Rspamd) and Squid are configured.

Problem 1:
PPPoE randomly disconnects and cannot reconnect.
I tried:
- Taking the igc0 interface down and up: did not help
- Reloading all services: did not help
- Manually unplugging and replugging the LAN cable: did not help
- Rebooting the firewall: helps
Here is the PPPoE log:

2024-05-08T19:33:29   Informational   ppp   [wan_link0] Link: reconnection attempt 3   
2024-05-08T19:33:27   Informational   ppp   [wan_link0] Link: reconnection attempt 3 in 2 seconds   
2024-05-08T19:33:27   Informational   ppp   [wan_link0] LCP: Down event   
2024-05-08T19:33:27   Informational   ppp   [wan_link0] Link: DOWN event   
2024-05-08T19:33:27   Informational   ppp   [wan_link0] PPPoE connection timeout after 9 seconds   
2024-05-08T19:33:18   Informational   ppp   [wan_link0] PPPoE: Connecting to ''   
2024-05-08T19:33:18   Informational   ppp   [wan_link0] Link: reconnection attempt 2   
2024-05-08T19:33:16   Informational   ppp   [wan_link0] Link: reconnection attempt 2 in 2 seconds   
2024-05-08T19:33:16   Informational   ppp   [wan_link0] LCP: Down event   
2024-05-08T19:33:16   Informational   ppp   [wan_link0] Link: DOWN event   
2024-05-08T19:33:16   Informational   ppp   [wan_link0] PPPoE: can't connect "[14]:"->"mpd21932-0" and "[8]:"->"left": No such file or directory   
2024-05-08T19:33:16   Informational   ppp   [wan_link0] Link: reconnection attempt 1   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] Link: reconnection attempt 1 in 3 seconds   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] LCP: state change Stopping --> Starting   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] LCP: Down event   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] Link: DOWN event   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] can't remove hook mpd21932-0 from node "[14]:": No such file or directory   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] PPPoE: connection closed   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] LCP: LayerDown   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] LCP: SendTerminateReq #4   
2024-05-08T19:33:13   Informational   ppp   [wan] IPCP: state change Closing --> Initial   
2024-05-08T19:33:13   Informational   ppp   [wan] Bundle: No NCPs left. Closing links...   
2024-05-08T19:33:13   Informational   ppp   [wan] IPCP: LayerFinish   
2024-05-08T19:33:13   Informational   ppp   [wan] IPCP: Down event   
2024-05-08T19:33:13   Informational   ppp   [wan] IFACE: Rename interface pppoe0 to pppoe0   
2024-05-08T19:33:13   Informational   ppp   [wan] IFACE: Down event   
2024-05-08T19:33:13   Informational   ppp   [wan] IPCP: LayerDown   
2024-05-08T19:33:13   Informational   ppp   [wan] IPCP: SendTerminateReq #4   
2024-05-08T19:33:13   Informational   ppp   [wan] IPCP: state change Opened --> Closing   
2024-05-08T19:33:13   Informational   ppp   [wan] IPCP: Close event   
2024-05-08T19:33:13   Informational   ppp   [wan] Bundle: Status update: up 0 links, total bandwidth 9600 bps   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] Link: Leave bundle "wan"   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] LCP: state change Opened --> Stopping   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] LCP: peer not responding to echo requests   
2024-05-08T19:33:13   Informational   ppp   [wan_link0] LCP: no reply to 5 echo request(s)   
2024-05-08T19:33:03   Informational   ppp   [wan_link0] LCP: no reply to 4 echo request(s)   
2024-05-08T19:32:53   Informational   ppp   [wan_link0] LCP: no reply to 3 echo request(s)   
2024-05-08T19:32:43   Informational   ppp   [wan_link0] LCP: no reply to 2 echo request(s)   
2024-05-08T19:32:33   Informational   ppp   [wan_link0] LCP: no reply to 1 echo request(s)   
2024-05-08T09:56:36   Informational   ppp   [wan] IFACE: Rename interface ng0 to pppoe0   
2024-05-08T09:56:36   Informational   ppp   [wan] IFACE: Up event   
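
In the excerpt above, the drop at 19:33:13 follows five unanswered LCP echo requests, so the session was torn down because the peer stopped answering. To see how often that happens, a small filter like the following could be run over an exported log. This is only a sketch: the function name is made up for illustration, and it assumes the log keeps the layout shown above with the timestamp in the first column.

```shell
# Print the timestamp of each PPPoE drop that the log attributes to missed LCP echoes.
# Reads ppp log lines (same layout as the excerpt above) on stdin.
lcp_echo_drops() {
    awk '/LCP: peer not responding to echo requests/ { print $1 }'
}
```

It would be used as `lcp_echo_drops < ppp.log`, where `ppp.log` is a hypothetical file holding the exported log. Comparing the printed timestamps with the link down/up events in the switch log may show whether both problems occur at the same moments.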

Problem 2:
The NICs randomly drop traffic.
- In the log of the switch the firewall is connected to, I saw link down and up events before this happened.
- The NIC reports state UP and link active even when the cable is disconnected.

I tried:
- ifconfig igc down and up: did not help
- Setting hw.pci.enable_msix to 0: did not help
- Rebooting the firewall: helps

I have now set hw.igc.eee_setting to 0 and am waiting for the results.

Are there any suggestions on what I should check or change?

Best regards,
  Laszlo Szakovics


May 10, 2024, 09:42:04 AM #1 Last Edit: May 10, 2024, 09:53:07 AM by tom.goes.open
Hi,

Just wondering which NICs are used in the DEC3862, but as you refer to igc, I assume Intel i225/i226?
I am facing the same problems, but on a Shuttle DL30N with 2x i226-LM. Searching the internet turns up a lot of reported problems with those NICs, I'm afraid. It looks like not much can be done at the moment.

Some settings you might try:
dev.igc.X.fc = 0
Set "Speed and duplex" to whatever your modem uses instead of "default"

Oh, and one note: hw.igc.eee_setting = 0 will enable EEE. Have a look at https://man.freebsd.org/cgi/man.cgi?query=igc:
Quote: hw.igc.eee_setting  Disable or enable Energy Efficient Ethernet. Default 1 (disabled).
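
To make the semantics concrete, the two settings mentioned above might be written like this. It is only a sketch: the values follow the igc(4) quote (1 = EEE disabled) and the dev.igc.X.fc suggestion, igc0 is used as an example port, and where exactly the values are entered (for example System > Settings > Tunables in the OPNsense GUI) is an assumption to verify on your version.

```
# Keep Energy Efficient Ethernet off: per igc(4), 1 means disabled (the default)
hw.igc.eee_setting = 1
# Turn flow control off on one port (igc0 chosen as an example; repeat per port)
dev.igc.0.fc = 0
```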

Have you seen https://forum.opnsense.org/index.php?topic=38055.0?

Just for information my log entries when the link goes down:
2024-05-09T20:32:12 Error opnsense /usr/local/etc/rc.newwanip: The command '/sbin/route add -host -'inet6' '***' '***%pppoe0'' returned exit code '1', the output was 'route: writing to routing socket: Network is unreachable add host ***: gateway ***8%pppoe0 fib 0: Network is unreachable'


Hi!

There are 4 of
<Intel(R) Ethernet Controller I225-V> mem 0x80b00000-0x80bfffff,0x80c00000-0x80c03fff at device 0.0 on pci2
Using 1024 TX descriptors and 1024 RX descriptors
Using 4 RX queues 4 TX queues
Using MSI-X interrupts with 5 vectors
Ethernet address: f4:90:ea:00:ec:4c
netmap queues/slots: TX 4/1024, RX 4/1024


Unfortunately I cannot check the chip revision because there is a warranty void label on the screws.

Thanks, I missed that "hw.igc.eee_setting = 0" enables Energy Efficient Ethernet.
But at least I tried it, and it got even worse than before: a lot of packets were dropped on all interfaces.

All links are just 1G, and as far as I know there is no half-duplex mode.

You can check the chip revision via pciconf -lv, giving something like:


igc2@pci0:4:0:0:        class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f3 subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I225-V'
    class      = network
    subclass   = ethernet
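
With several ports, the revision can be pulled out of that output with a short filter. A minimal sketch, assuming the output has the layout shown above (device name before the `@`, revision in a `rev=` field); the function name is made up for illustration:

```shell
# Reads `pciconf -lv` output on stdin and prints "<device> <revision>" for each igc NIC.
igc_revisions() {
    awk '/^igc[0-9]+@/ {
        split($1, name, "@")              # "igc2@pci0:4:0:0:" -> name[1] = "igc2"
        for (i = 2; i <= NF; i++)
            if ($i ~ /^rev=/)
                print name[1], substr($i, 5)   # drop the leading "rev="
    }'
}
```

It would be used as `pciconf -lv | igc_revisions`. Lines for other devices are ignored.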


I found the I225-V revision 3 to be fine, although it reportedly still has problems with some counterparts. I now have a machine with an I226-V revision 4 that counts media errors. In my case these do not directly degrade the speed, but after a few hours of operation the link stalls and can only be revived via a port reset.

This is reported also on pfSense forums and here. So it seems that the very chip that Intel created as a stopgap for the problematic I225 is even more problematic itself.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

Hi!

I managed to get the hardware info with pciconf -lv.

All network-related devices are listed here:
igc0@pci0:2:0:0:        class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f3 subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I225-V'
    class      = network
    subclass   = ethernet
igc1@pci0:3:0:0:        class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f3 subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I225-V'
    class      = network
    subclass   = ethernet
igc2@pci0:4:0:0:        class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f3 subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I225-V'
    class      = network
    subclass   = ethernet
igc3@pci0:5:0:0:        class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f3 subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I225-V'
    class      = network
    subclass   = ethernet
ax0@pci0:7:0:4: class=0x020000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1458 subvendor=0x1022 subdevice=0x1458
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    class      = network
    subclass   = ethernet
ax1@pci0:7:0:5: class=0x020000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1458 subvendor=0x1022 subdevice=0x1458
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    class      = network
    subclass   = ethernet

The problems are still there. They did not occur for the last week, until today.
But today they happened twice :(

So it seems that the I225 is also riddled with this problem. I thought it was just the I226 - at least I never saw it on the I225-V rev.3.

See this for a potential explanation.

I would expect Deciso to be aware of the issues with the Intel NICs in the DEC3862. Shouldn't they provide a fix or a replacement?

At least for my hardware, the Minisforum MS-01, it turned out to be a powersave setting that had to be disabled (ASPM). The unit uses I226 type adapters.

Maybe this is something that could be fixed in the BIOS or via specific settings.

Well, it looks like you are dealing with the same problem. Try the following:
https://forum.opnsense.org/index.php?topic=43968.0

Has this ever been solved?

Weirdly enough, I seem to be experiencing the same.

I own a Shuttle DL30N with those Intel i226-x Ethernet controllers.
Luckily, in August Shuttle published a new BIOS with the upstream fix from Intel, which solved all my issues. But for roughly the past month I have been seeing the same behaviour again, without changing anything except installing OPNsense updates.