High Packet Loss/CRC errors for Tx on axgbe (DEC2750/OPNsense 22.1)
« on: April 07, 2022, 07:10:50 am »
I am experiencing high packet loss when transmitting from ax0 (LAN) to another LAN device on my DEC2750 running OPNsense 22.1. Throughput is poor in both directions (~1.8Gbps as sender, ~1.7Gbps as receiver); however, I only observe packet loss/retransmits when ax0 is the transmitter.
On my DEC2750 the LAN is ax0 and it is connected to port 8 of a USW-Aggregation 10Gbps switch via a Mellanox MCP2100-X003B DAC. Looking at the switch port I can see the input errors and CRC counts increasing when I run iperf.
Code: [Select]
root@fw:~ # iperf3 -c 172.16.5.14
Connecting to host 172.16.5.14, port 5201
[ 5] local 172.16.5.1 port 29519 connected to 172.16.5.14 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 180 MBytes 1.51 Gbits/sec 390 29.8 KBytes
[ 5] 1.00-2.00 sec 122 MBytes 1.03 Gbits/sec 252 49.8 KBytes
[ 5] 2.00-3.00 sec 259 MBytes 2.17 Gbits/sec 508 54.1 KBytes
[ 5] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec 529 25.5 KBytes
[ 5] 4.00-5.01 sec 134 MBytes 1.12 Gbits/sec 298 334 KBytes
[ 5] 5.01-6.01 sec 192 MBytes 1.61 Gbits/sec 397 781 KBytes
[ 5] 6.01-7.00 sec 218 MBytes 1.84 Gbits/sec 434 48.3 KBytes
[ 5] 7.00-8.00 sec 117 MBytes 983 Mbits/sec 242 19.9 KBytes
[ 5] 8.00-9.00 sec 176 MBytes 1.48 Gbits/sec 326 22.7 KBytes
[ 5] 9.00-10.00 sec 215 MBytes 1.81 Gbits/sec 435 44.0 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.82 GBytes 1.57 Gbits/sec 3811 sender
[ 5] 0.00-10.00 sec 1.82 GBytes 1.57 Gbits/sec receiver
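As a cross-check on the summary line, the per-interval Retr column can be totalled with a few lines of Python. This is only a sketch: it assumes the plain-text iperf3 sender layout pasted above, and the names `IPERF_OUTPUT`, `ROW` and `sum_retransmits` are made up for illustration.

```python
import re

# Per-interval rows copied from the iperf3 run above (sender side).
IPERF_OUTPUT = """\
[ 5] 0.00-1.00 sec 180 MBytes 1.51 Gbits/sec 390 29.8 KBytes
[ 5] 1.00-2.00 sec 122 MBytes 1.03 Gbits/sec 252 49.8 KBytes
[ 5] 2.00-3.00 sec 259 MBytes 2.17 Gbits/sec 508 54.1 KBytes
[ 5] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec 529 25.5 KBytes
[ 5] 4.00-5.01 sec 134 MBytes 1.12 Gbits/sec 298 334 KBytes
[ 5] 5.01-6.01 sec 192 MBytes 1.61 Gbits/sec 397 781 KBytes
[ 5] 6.01-7.00 sec 218 MBytes 1.84 Gbits/sec 434 48.3 KBytes
[ 5] 7.00-8.00 sec 117 MBytes 983 Mbits/sec 242 19.9 KBytes
[ 5] 8.00-9.00 sec 176 MBytes 1.48 Gbits/sec 326 22.7 KBytes
[ 5] 9.00-10.00 sec 215 MBytes 1.81 Gbits/sec 435 44.0 KBytes
"""

# An interval row is: [id] start-end sec, transfer, bitrate, Retr, Cwnd.
# Only the Retr column is captured. Summary rows end in "sender"/"receiver"
# instead of a Cwnd value, so they do not match and are not double counted.
ROW = re.compile(
    r"^\[\s*\d+\]\s+[\d.]+-[\d.]+\s+sec"
    r"\s+[\d.]+\s+\w+"        # transfer, e.g. "180 MBytes"
    r"\s+[\d.]+\s+\w+/sec"    # bitrate, e.g. "1.51 Gbits/sec"
    r"\s+(\d+)"               # Retr
    r"\s+[\d.]+\s+\w+$"       # Cwnd, e.g. "29.8 KBytes"
)

def sum_retransmits(text):
    """Add up the Retr column over all per-interval rows in `text`."""
    total = 0
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            total += int(match.group(1))
    return total

print(sum_retransmits(IPERF_OUTPUT))  # 3811, same as the sender summary line
```

The total agrees with the 3811 retransmits in the sender summary, so the loss is spread across every one-second interval and is not a single burst.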
Code: [Select]
SW-Aggregation# show interfaces TenGigabitEthernet 8
TenGigabitEthernet8 is up
Hardware is Ten Gigabit Ethernet
Full-duplex, 10Gb/s, media type is Fiber
flow-control is off
back-pressure is enabled
262840538 packets input, 865223445 bytes, 0 throttles
Received 2488 broadcasts (0 multicasts)
0 runts, 477 giants, 0 throttles
510220 input errors, 509743 CRC, 0 frame
0 multicast, 0 pause input
0 input packets with dribble condition detected
156613060 packets output, 1602945509 bytes, 0 underrun
644 output errors, 0 collisions
644 babbles, 0 late collision, 0 deferred
0 PAUSE output
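To put the switch counters in proportion, here is a rough calculation of the CRC error rate on port 8. It is a sketch under assumptions: I am treating the counters as cumulative since they were last cleared, and I do not know whether "packets input" includes the errored frames, so the percentage is only indicative. The helper name `error_ratio` is made up.

```python
# Counters copied from the "show interfaces TenGigabitEthernet 8" output above.
packets_input = 262_840_538
input_errors = 510_220
crc_errors = 509_743
giants = 477

def error_ratio(errors, packets):
    """Return errors as a fraction of packets (0.0 if no packets were counted)."""
    return errors / packets if packets else 0.0

# In this output the input-error total is exactly CRC errors plus giants.
assert crc_errors + giants == input_errors

crc_pct = 100 * error_ratio(crc_errors, packets_input)
print(f"CRC errors: {crc_pct:.3f}% of input packets")  # CRC errors: 0.194% of input packets
```

Because the totals also include traffic from before the problem appeared (if the counters were not cleared), the error rate while iperf is actually running is probably higher than this long-run average.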
I don't see any errors/discards at the fw LAN interface (DEC2750 ax0). MTU is 1500 all around.
Code: [Select]
root@fw:~ # ifconfig ax0
ax0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: LAN
options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
ether f4:90:ea:00:73:4a
inet 172.16.5.1 netmask 0xffffff00 broadcast 172.16.5.255
media: Ethernet autoselect (10GBase-SFI <full-duplex,rxpause,txpause>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
root@fw:~ # netstat -i log | grep -iE "Name|ax0"
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
ax0 1500 <Link#4> f4:90:ea:00:73:4a 9077437 0 0 12499343 0 0
ax0 - 172.16.5.0/24 fw 6778 - - 12065 - -
I have verified:
- I can send 9.4Gbps bi-directionally between all other devices connected to the USW-Aggregation switch
- Switch CPU utilization is low (~3-5%)
- iperf3 -u -b 9000M (UDP) shows the same bandwidth and packet loss behavior
Additionally, I verified back in January on OPNsense 21.7 that I could push 9.4Gbps bidirectionally on the LAN interface to other 10GbE devices (and well in excess of 5Gbps across the FW and out ax1).
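For reference, the 9.4Gbps figure is about what TCP can deliver on a 10Gbps link at MTU 1500 once framing overhead is subtracted. The arithmetic below is a sketch that assumes untagged IPv4 frames and the 12-byte TCP timestamp option; the function name `tcp_goodput_bps` is illustrative only.

```python
LINK_BPS = 10e9                       # 10 Gbit/s line rate
MTU = 1500                            # IP MTU, as stated in the post
WIRE_OVERHEAD = 7 + 1 + 14 + 4 + 12   # preamble, SFD, Ethernet header, FCS, inter-frame gap
IP_HEADER = 20                        # IPv4 without options
TCP_HEADER = 20                       # TCP without options
TCP_TIMESTAMPS = 12                   # assumption: TCP timestamp option is in use

def tcp_goodput_bps(link_bps=LINK_BPS, mtu=MTU, tcp_options=TCP_TIMESTAMPS):
    """Best-case TCP payload rate for full-size frames on an otherwise idle link."""
    payload = mtu - IP_HEADER - TCP_HEADER - tcp_options   # 1448 bytes
    wire_bytes = mtu + WIRE_OVERHEAD                       # 1538 bytes per frame on the wire
    return link_bps * payload / wire_bytes

ceiling = tcp_goodput_bps()
print(f"Theoretical TCP ceiling: {ceiling / 1e9:.2f} Gbit/s")   # 9.41 Gbit/s
print(f"Observed 1.57 Gbit/s is {1.57e9 / ceiling:.0%} of it")  # 17%
```

So the other hosts on the switch are running at essentially line rate, while the firewall as sender reaches roughly a sixth of it.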
I have tried:
- Rebooting DEC2750 (no change)
- Rebooting the switch (no change)
- Switching to a known good DAC (no change)
- Putting the original DAC used by the FW on another known-good host (no change - the known-good host can hit 9.4Gbps without issue)
- Change port on the switch (no change)
- Switch to ax1 on the DEC2750 (no change)
- enabling hardware checksum offloading on fw (no change)
- enabling hardware tcp segmentation offloading on fw (no change)
- enabling large receive offload on fw (no change)
- enabling flow control on the switch (no change in throughput but it does completely eliminate the iperf3 TCP ReTxs)
- enabling flow control on ax0 (add tunables for dev.ax.0.rx_pause 1 and dev.ax.0.tx_pause 1, then reboot) (no change in throughput but eliminates iperf3 TCP ReTxs)
I have not yet tried:
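- Direct-connecting the FW to another device with a 10Gbps port (Update: see post below on this)
- Downgrading to OPNsense 21.x
- Using a verified, tested and working DAC module (e.g. [DAC] UBIQUITI 10G 1M DAC) (Update: arrived and installed, no change)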
The only thing that I know has changed is the update to OPNsense 22.1 (which is based on FreeBSD 13, whereas 21.x was based on FreeBSD 12). Could this be an issue with OPNsense 22.1/FreeBSD 13 and axgbe?
Hardware: DEC2750
Software: OPNsense 22.1.4_1-amd64
Code: [Select]
$ uname -a
FreeBSD fw 13.0-STABLE FreeBSD 13.0-STABLE stable/22.1-n248063-ac40e064d3c SMP amd64
$ dmesg | grep -i ax0
ax0: <AMD 10 Gigabit Ethernet Driver> mem 0xd0060000-0xd007ffff,0xd0040000-0xd005ffff,0xd0082000-0xd0083fff at device 0.1 on pci6
ax0: Using 2048 TX descriptors and 2048 RX descriptors
ax0: Using 3 RX queues 3 TX queues
ax0: Using MSI-X interrupts with 7 vectors
ax0: Ethernet address: f4:90:ea:00:73:4a
ax0: xgbe_config_sph_mode: SPH disabled in channel 0
ax0: xgbe_config_sph_mode: SPH disabled in channel 1
ax0: xgbe_config_sph_mode: SPH disabled in channel 2
ax0: RSS Enabled
ax0: Receive checksum offload Enabled
ax0: VLAN filtering Enabled
ax0: VLAN Stripping Enabled
ax0: Checking GPIO expander validity
ax0: SFP detected:
ax0: vendor: Mellanox
ax0: part number: MCP2100-X003B
ax0: revision level: A1
ax0: serial number: MT1403VS18803
ax0: netmap queues/slots: TX 3/2048, RX 3/2048
These are the potentially relevant modified tunables I received "out-of-the-box" when delivered from Deciso:
Code: [Select]
dev.ax.0.iflib.override_nrxds 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048
dev.ax.0.iflib.override_ntxds 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048
dev.ax.0.rss_enabled 1
dev.ax.1.iflib.override_nrxds 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048
dev.ax.1.iflib.override_ntxds 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048
dev.ax.1.rss_enabled 1