DEC4280 - low speed on multi-gig interfaces

Started by l1lz, July 17, 2025, 05:30:31 PM

Hi,

I'm using DEC4280 appliances from Deciso with OPNsense preinstalled. The goal is to use them in HA with two link aggregations: two SFP28 interfaces for the internal VLANs and two SFP+ interfaces for the uplink.

Unfortunately, during the tests it was difficult to obtain multi-gigabit speeds, so I'm back to using a single appliance with a very basic configuration to run some tests:

  • igc0 (assigned to LAN): just for management purposes
  • ice0 (assigned to opt1): plugged into a switch port in access mode on one VLAN
  • ice1 (assigned to opt2): plugged into a switch port in access mode on another VLAN

On interfaces opt1 and opt2 there is only a single rule that allows all traffic to pass. There is only one machine in each VLAN. With iperf3, I got this result:

# iperf3 -c 10.2.2.12 -p 5201
Connecting to host 10.2.2.12, port 5201
[  5] local 10.1.1.11 port 38024 connected to 10.2.2.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   129 MBytes  1.08 Gbits/sec   52   1.17 MBytes
[  5]   1.00-2.00   sec   125 MBytes  1.05 Gbits/sec    0   1.28 MBytes
[  5]   2.00-3.00   sec   126 MBytes  1.06 Gbits/sec    0   1.36 MBytes
[  5]   3.00-4.00   sec   126 MBytes  1.06 Gbits/sec    0   1.43 MBytes
[  5]   4.00-5.00   sec   126 MBytes  1.06 Gbits/sec    1   1.08 MBytes
[  5]   5.00-6.00   sec   125 MBytes  1.05 Gbits/sec    0   1.16 MBytes
[  5]   6.00-7.00   sec   126 MBytes  1.06 Gbits/sec    0   1.24 MBytes
[  5]   7.00-8.00   sec   126 MBytes  1.06 Gbits/sec    0   1.31 MBytes
[  5]   8.00-9.00   sec   126 MBytes  1.06 Gbits/sec    0   1.38 MBytes
[  5]   9.00-10.00  sec   125 MBytes  1.05 Gbits/sec    3   1.02 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.23 GBytes  1.06 Gbits/sec   56             sender
[  5]   0.00-10.00  sec  1.23 GBytes  1.06 Gbits/sec                  receiver

iperf Done.

But if I plug the two machines directly into the same VLAN, bypassing the firewall, I get this result:

# iperf3 -c 10.1.1.12 -p 5201
Connecting to host 10.1.1.12, port 5201
[  5] local 10.1.1.11 port 40454 connected to 10.1.1.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.63 GBytes  22.6 Gbits/sec   61   2.53 MBytes
[  5]   1.00-2.00   sec  2.66 GBytes  22.8 Gbits/sec    0   2.91 MBytes
[  5]   2.00-3.00   sec  2.66 GBytes  22.9 Gbits/sec    0   2.96 MBytes
[  5]   3.00-4.00   sec  2.68 GBytes  23.0 Gbits/sec    0   3.14 MBytes
[  5]   4.00-5.00   sec  2.52 GBytes  21.6 Gbits/sec   47   2.43 MBytes
[  5]   5.00-6.00   sec  2.66 GBytes  22.8 Gbits/sec    0   2.48 MBytes
[  5]   6.00-7.00   sec  2.66 GBytes  22.8 Gbits/sec    0   2.55 MBytes
[  5]   7.00-8.00   sec  2.69 GBytes  23.1 Gbits/sec    0   2.55 MBytes
[  5]   8.00-9.00   sec  2.67 GBytes  22.9 Gbits/sec    0   2.66 MBytes
[  5]   9.00-10.00  sec  2.66 GBytes  22.9 Gbits/sec    0   2.66 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  26.5 GBytes  22.7 Gbits/sec  108             sender
[  5]   0.00-10.00  sec  26.5 GBytes  22.7 Gbits/sec                  receiver

Increasing the number of parallel streams increased the bandwidth a bit, but we are far from 10 Gb/s (and these are 25 Gb/s interfaces). The only time I got close to 10 Gb/s was with the firewall disabled and a lot of parallel streams.

I tried multiple tunable settings found on this forum, enabled/disabled HW offloading, and also tried the SFP+ interfaces, but got the same results.
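
(For reference, the offload flags that are actually active on a port show up in the options=... line of ifconfig - e.g. RXCSUM, TXCSUM, TSO4/TSO6, LRO:)

# ifconfig ice0 | grep options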

The interfaces look healthy and the error and collision counters are at 0.

I'm running out of ideas to troubleshoot this. Anyone got similar issues?

Thanks.

The speed is so ridiculously close to exactly 1 Gbit/s that I suspect the negotiated link speed is 1 Gbit/s on one or both interfaces. Did you inspect the settings? With these types of interfaces it is often necessary to set the link speed manually.
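
You can verify what was actually negotiated from a shell, e.g. (ice0 just as an example; "ifconfig -m" lists the media types the driver accepts, and a forced speed has to use a keyword from that list - 25Gbase-SR below is only an example):

# ifconfig ice0 | grep -E 'media|status'
# ifconfig -m ice0
# ifconfig ice0 media 25Gbase-SR
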
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

Quote from: l1lz on July 17, 2025, 05:30:31 PM[...]
I tried multiple tunable settings found on this forum [...]

So long as you're not terminating iperf sessions on the firewall, about the only tunables/sysctls that'll help you are the RSS ones: net.isr.bindthreads, net.isr.maxthreads, net.inet.rss.bits, net.inet.rss.enabled. Have a look at "netstat -Q" (IIRC) to check. I believe OPNsense sets most other necessary sysctls reasonably. But keep in mind RSS mainly helps aggregate (multi-flow) throughput. How does CPU utilization look while running the tests?
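
If you want to try those, they go in as boot-time tunables (System > Settings > Tunables in the GUI, or /boot/loader.conf.local from a shell) and need a reboot; something like this as a rough starting point, with the bucket count scaled to your core count:

net.isr.bindthreads="1"
net.isr.maxthreads="-1"
net.inet.rss.enabled="1"
net.inet.rss.bits="4"

After that, "netstat -Q" should show the dispatch policy and the per-CPU workstreams.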

If you weren't using a Deciso device, I'd recommend looking at the ice devices, e.g. "dmesg | grep ice" and "pciconf -lvV ice0" for any oddities (mine really wanted to set up as x4 v3; you want x4 v4 or x8 v3 at a minimum).
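
For completeness, the negotiated width/speed shows up in the PCI-Express capability line of "pciconf -lc", something like "link x8(x8) speed 8.0(8.0)":

# pciconf -lc ice0 | grep -i speed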

That speed does look suspicious, like a per-flow/session shaper.
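
That one is quick to rule out: the OPNsense shaper sits on ipfw/dummynet, so with nothing configured under Firewall > Shaper these should come back empty (or complain that ipfw isn't loaded at all):

# ipfw pipe list
# ipfw sched list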

Quote from: meyergru on July 17, 2025, 06:47:25 PMThe speed is so ridiculously close to exactly 1 Gbit/s that I suspect the negotiated link speed is 1 Gbit/s on one or both interfaces.

That was also my first thought, but they negotiate 25G, and with parallel streams it can get up to 5 Gb/s; a single flow, however, seems capped at around 1 Gb/s. I also changed the FEC and flow control settings, but it had no effect.

ice0: Link is up, 25 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg: False, Flow Control: None
ice0: link state changed to UP
ice1: Link is up, 25 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg: False, Flow Control: None
ice1: link state changed to UP
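
(The link lines above are from dmesg; the per-port FEC / flow control knobs can also be browsed in the dev.ice.<unit> sysctl tree, roughly:)

root@OPNsense:~ # dmesg | grep 'ice0: Link'
root@OPNsense:~ # sysctl dev.ice.0 | grep -i -E 'fec|flow|fc'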


Quote from: pfry on July 17, 2025, 10:19:43 PMSo long as you're not terminating iperf sessions on the firewall, about the only tunables/sysctls that'll help you are the RSS ones: net.isr.bindthreads, net.isr.maxthreads, net.inet.rss.bits, net.inet.rss.enabled. Have a look at "netstat -Q" (IIRC) to check. I believe OPNsense sets most other necessary sysctls reasonably. But keep in mind RSS mainly helps aggregate (multi-flow) throughput. How does CPU utilization look while running the tests?

I just reverted all the tunables to the default config that came with the device and enabled these:

  • net.isr.bindthreads = 1
  • net.isr.maxthreads = -1
  • net.inet.rss.enabled = 1
  • net.inet.rss.bits = 4

After a reboot the results are the same. netstat -Q now shows a lot of workstreams. With iperf3 and -P 16 I get up to 4.3 Gbit/s, and a single stream is around 1.5 Gbit/s. During a transfer with 16 streams some CPU cores are indeed at 100%; global usage is: CPU:  0.0% user,  0.0% nice, 23.6% system, 30.0% interrupt, 46.4% idle.
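
(The per-core numbers come from watching it during the transfer with something like this:)

root@OPNsense:~ # top -P -H -S
root@OPNsense:~ # vmstat -i | grep ice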

I noticed an error regarding the ice interfaces while running the sysctl command; do you think it's related?

root@OPNsense:~ # sysctl -a | grep rss
ice0: ice_add_rss_cfg on VSI 0 could not configure every requested hash type
ice1: ice_add_rss_cfg on VSI 0 could not configure every requested hash type
ice2: ice_add_rss_cfg on VSI 0 could not configure every requested hash type
ice3: ice_add_rss_cfg on VSI 0 could not configure every requested hash type
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7 8:8 9:9 10:10 11:11 12:12 13:13 14:14 15:15
net.inet.rss.enabled: 1
net.inet.rss.debug: 0
net.inet.rss.basecpu: 0
net.inet.rss.buckets: 16
net.inet.rss.maxcpus: 64
net.inet.rss.ncpus: 32
net.inet.rss.maxbits: 7
net.inet.rss.mask: 15
net.inet.rss.bits: 4
net.inet.rss.hashalgo: 2
hw.bxe.udp_rss: 0
hw.ix.enable_rss: 1
dev.ax.3.rss_enabled: 1
dev.ax.2.rss_enabled: 1
dev.ax.1.rss_enabled: 1
dev.ax.0.rss_enabled: 1

root@OPNsense:~ # dmesg | grep vectors
igc0: Using MSI-X interrupts with 5 vectors
igc1: Using MSI-X interrupts with 5 vectors
igc2: Using MSI-X interrupts with 5 vectors
igc3: Using MSI-X interrupts with 5 vectors
ice0: Using MSI-X interrupts with 33 vectors
ice1: Using MSI-X interrupts with 33 vectors
ice2: Using MSI-X interrupts with 33 vectors
ice3: Using MSI-X interrupts with 33 vectors
ax0: Using MSI-X interrupts with 16 vectors
ax1: Using MSI-X interrupts with 16 vectors
ax2: Using MSI-X interrupts with 6 vectors
ax3: Using MSI-X interrupts with 6 vectors

As far as I understand, RSS only helps with multiple flows. Is 1 Gbit/s the best we can expect from this device for a single flow?