DEC3862 network interface outages

Started by netuser, November 12, 2024, 11:44:06 AM

November 12, 2024, 11:44:06 AM Last Edit: November 12, 2024, 11:45:40 AM by netuser
I have owned a DEC3862 for about half a year. However, since deploying it as the new firewall in the production environment, I have been experiencing random network interface outages. The affected interface appears to be up, but the gateway or the network behind it is unreachable. The ifdown and ifup commands do not help, and neither does the "Reload all services" function. The only thing that helps is restarting the operating system. The outages occur after a few days of operation. My configuration is dual-WAN with dual-stack IPv4/IPv6. All network cards are Intel(R) Ethernet Controller I225-V. The firewall came pre-installed with OPNsense 24.4 and the manufacturer's default configuration.

I would also like to add that after an outage, from the point of view of the other end (the switch), the firewall links at full duplex but at a reduced speed of 10 Mbit/s, with no traffic passing. Alternatively, the port starts flapping.
The previous firewall, running Debian GNU/Linux on Lenovo hardware, did not show any of these symptoms. I have also been running a small OPNsense firewall on PC Engines APU2 hardware for several years without a problem.
I'm disappointed that they install common Intel(R) Ethernet Controller I225-V network cards, which are known for their problems, in devices with "business and enterprise" class ambitions. >:(
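
The 10 Mbit/s symptom described above is something a script could watch for. Below is a minimal sketch, assuming the degraded negotiation is also visible in `ifconfig` on the firewall side (this thread only confirms it from the switch side) and assuming the usual FreeBSD `media:` and `status:` lines. The function name `link_state` is made up for illustration. It will not catch the case where the interface looks healthy but passes no traffic.

```shell
#!/bin/sh
# Classify one interface from `ifconfig <if>` output read on stdin.
# Prints "degraded" if the negotiated media is 10baseT or the status line
# is anything other than "active"; otherwise prints "ok".
# Assumption: the ifconfig output contains "media:" and "status:" lines.
link_state() {
    awk '
        /media:/ && /\(10baseT/          { bad = 1 }
        /status:/ && !/status: active/   { bad = 1 }
        END { print (bad ? "degraded" : "ok") }
    '
}
```

Possible usage, with `igc0` as a placeholder interface name: `ifconfig igc0 | link_state`. Run from cron, the result could at least provide a timestamp for when the link dropped.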

I had hoped that after the upgrade to 24.10, with a potentially newer igc driver, the interface outages would stop, but they have become more frequent instead.
These are the default settings from the vendor:
# sysctl -a | grep hw.igc
hw.igc.max_interrupt_rate: 20000
hw.igc.eee_setting: 1
hw.igc.sbp: 1
hw.igc.smart_pwr_down: 0
hw.igc.rx_abs_int_delay: 66
hw.igc.tx_abs_int_delay: 66
hw.igc.rx_int_delay: 0
hw.igc.tx_int_delay: 66
hw.igc.disable_crc_stripping: 0


But according to https://man.freebsd.org/cgi/man.cgi?query=igc the default values are different:
       hw.igc.igc_disable_crc_stripping
          Disable or enable hardware stripping of CRC field.  This is
          mostly useful on BMC/IPMI shared interfaces where stripping the
          CRC causes remote access over IPMI to fail.  Default 0
          (enabled).

       hw.igc.rx_int_delay
          This value delays the generation of receive interrupts in units
          of 1.024 microseconds.  The default value is 0, since adapters
          may hang with this feature being enabled.

       hw.igc.rx_abs_int_delay
          If hw.igc.rx_int_delay is non-zero, this tunable limits the
          maximum delay in which a receive interrupt is generated.

       hw.igc.tx_int_delay
          This value delays the generation of transmit interrupts in
          units of 1.024 microseconds.  The default value is 64.

       hw.igc.tx_abs_int_delay
          If hw.igc.tx_int_delay is non-zero, this tunable limits the
          maximum delay in which a transmit interrupt is generated.

       hw.igc.sbp
          Show bad packets when in promiscuous mode.  Default is false.

       hw.igc.eee_setting
          Disable or enable Energy Efficient Ethernet.  Default 1
          (disabled).

       hw.igc.max_interrupt_rate
          Maximum device interrupts per second.  The default is 8000.
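
To make the comparison less error-prone than reading two lists side by side, something like the following could be used. It is only a sketch: the function name `compare_igc_defaults` is made up, and the table contains only the defaults that the man page excerpt above actually states (the excerpt gives no default for the two `*_abs_int_delay` values or for `smart_pwr_down`, so those are not checked).

```shell
#!/bin/sh
# Read "name: value" lines (as printed by sysctl) on stdin and report every
# hw.igc tunable whose value differs from the default documented in igc(4).
# Only defaults quoted in the man page excerpt are listed here.
compare_igc_defaults() {
    awk -F': *' '
        BEGIN {
            def["hw.igc.disable_crc_stripping"] = 0
            def["hw.igc.rx_int_delay"]          = 0
            def["hw.igc.tx_int_delay"]          = 64
            def["hw.igc.sbp"]                   = 0   # "false" in the man page
            def["hw.igc.eee_setting"]           = 1
            def["hw.igc.max_interrupt_rate"]    = 8000
        }
        # Numeric comparison; names not in the table are ignored.
        ($1 in def) && ($2 + 0 != def[$1]) {
            printf "%s: current %s, documented default %s\n", $1, $2, def[$1]
        }
    '
}
```

Possible usage: `sysctl -a | grep hw.igc | compare_igc_defaults`. Fed the vendor values listed earlier in this post, it would flag `hw.igc.max_interrupt_rate` (20000 vs 8000), `hw.igc.sbp` (1 vs 0) and `hw.igc.tx_int_delay` (66 vs 64).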

Did you ever fix this? We also have issues with a DEC3862 here, with NICs getting "stalled"...

First I had to work out what was cause and what was effect. The effect was obvious: the network card ended up in a non-functional state, either flapping or stuck at 10 Mbit half duplex. At the very beginning of the problems, I upgraded the motherboard BIOS to the release from June this year; unfortunately there is no description of the changes, which is a big minus for Decisio. After the BIOS upgrade, the system ran without problems for about 3 weeks. Then the outages came back and settled into the earlier pattern of one every few days. After upgrading the system to version 24.10, outages occurred every working day. I had to dig deeper into FreeBSD to solve the problem. First, I checked and adjusted the configuration of the Intel I225-V NIC driver: I turned off flow control (FC) and verified that EEE was off. Then I started tuning the performance of the network cards.
I followed these instructions:
https://forum.opnsense.org/index.php?topic=6590.0

My current tunables:
sysctl dev.igc.X.fc
dev.igc.X.fc: 0
sysctl hw.igc.eee_setting
hw.igc.eee_setting: 1
sysctl hw.igc.max_interrupt_rate
hw.igc.max_interrupt_rate: 32000
sysctl hw.igc.smart_pwr_down
hw.igc.smart_pwr_down: 0
sysctl net.link.ifqmaxlen
net.link.ifqmaxlen: 2048
sysctl net.inet.tcp.soreceive_stream
net.inet.tcp.soreceive_stream: 1
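
For anyone who wants to carry these values over, the listing above can be turned into `name="value"` entries, which is the format `/boot/loader.conf.local` uses. This is only a sketch with a made-up function name (`to_tunables`). On OPNsense the values would normally be entered under System > Settings > Tunables instead of editing files by hand, and the thread does not say which of these need a reboot to take effect. The `X` in `dev.igc.X.fc` is a placeholder for the interface number and has to be replaced per NIC.

```shell
#!/bin/sh
# Convert "name: value" lines (sysctl output) on stdin into name="value"
# entries. Lines without a "name: value" shape, such as the bare
# "sysctl <name>" command lines in the listing, are dropped.
to_tunables() {
    sed -n 's/^\([A-Za-z0-9_.]*\): *\(.*\)$/\1="\2"/p'
}
```

Possible usage: `sysctl hw.igc.max_interrupt_rate net.link.ifqmaxlen | to_tunables`.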

Quote from: netuser on December 06, 2024, 01:21:58 PM
My current tunables:
sysctl dev.igc.X.fc
dev.igc.X.fc: 0
sysctl hw.igc.eee_setting
hw.igc.eee_setting: 1
sysctl hw.igc.max_interrupt_rate
hw.igc.max_interrupt_rate: 32000
sysctl hw.igc.smart_pwr_down
hw.igc.smart_pwr_down: 0
sysctl net.link.ifqmaxlen
net.link.ifqmaxlen: 2048
sysctl net.inet.tcp.soreceive_stream
net.inet.tcp.soreceive_stream: 1

Thank you very much for sharing this. We will try these settings, as our issues are still ongoing.