10Gbit performance problems with Chelsio T520-SO-CR (solved)

Started by JamesFrisch, June 19, 2023, 01:04:18 PM

June 26, 2023, 08:32:12 PM #15 Last Edit: June 27, 2023, 08:11:58 AM by pmhausen
Kristof Provost, one of the main current network developers for FreeBSD, argued that iperf is not suitable for measuring packet forwarding performance, because at 10G and above you are more likely to max out iperf itself.

He recommends pkt-gen/netmap or DPDK.
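For reference, a rough sketch of what such a pkt-gen run could look like - interface names and addresses here are made up, and the exact flags depend on how netmap/pkt-gen was built:

pkt-gen -i ix0 -f tx -l 60 -d 10.0.0.2:2000 -s 10.0.0.1:2000   (sender: blast minimum-size frames towards the device under test)
pkt-gen -i ix1 -f rx                                           (receiver on the far side: report packets per second actually forwarded)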
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

We are experimenting with Chelsios as well - but we have Xeon CPUs and not Atoms. In our case the 6Gbps is for single iperf streams - we don't know why the limit sits at 6Gbps; it's more or less the same for Intel X810 cards. If we enable RSS and switch to multiple iperf streams we get far more (>20Gbps). Cheers

Quote from: jzah on June 27, 2023, 08:01:46 AM
We are experimenting with Chelsios as well - but we have Xeon CPUs and not Atoms. In our case the 6Gbps is for single iperf streams - we don't know why the limit sits at 6Gbps; it's more or less the same for Intel X810 cards. If we enable RSS and switch to multiple iperf streams we get far more (>20Gbps). Cheers

That is interesting - are you able to share more details? Hardware specs, setup, config, etc.? I never got anywhere near those numbers despite trial-and-erroring my way through various undocumented snags (mostly documented here https://forum.opnsense.org/index.php?topic=25263). RSS at the time wasn't mature, but the load and interrupts looked well balanced across cores anyway; nevertheless I never got close to 10Gb line rate, even with multiple streams.
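(For anyone who wants to reproduce that check: per-core interrupt and load distribution can be eyeballed with the stock FreeBSD tools, nothing OPNsense-specific - for example:)

vmstat -i     per-interrupt-source counters, one line per NIC queue
top -P -S     per-CPU load including system and interrupt time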



We use Chelsio T580-LP-CR cards (2x40Gbps QSFP ports) in two HP DL360 Gen9 servers (and we also tested Intel X810) in HA mode. The servers have two physical CPUs (however, one CPU is useless, as the card is bound to a single CPU - NUMA affinity is the keyword). We enabled RSS to use more than one core. At the moment we have stopped testing because we have an issue with CARP that needs to be solved before we can continue optimizing performance, so the tunables below are not all 100% required.
pfSync is connected over separate Intel X710 cards.

hw.ibrs_disable -> 1
if_cxgbe_load -> yes
kern.ipc.maxsockbuf -> 629145600
machdep.hyperthreading_intr_allowed -> 1
net.inet.rss.bits -> 8 (just for testing; eventually it should cover the physical CPU including HT cores)
net.inet.rss.enabled -> 1
net.isr.bindthreads -> 1
net.isr.maxthreads -> -1
net.link.ifqmaxlen -> 4096
t5fw_cfg_load -> yes
vm.pmap.pti -> 0


The interface configuration (TSO, LRO,...) is in the attachment.
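As a rough sketch of how such values are typically applied - on OPNsense they would normally be entered under System > Settings > Tunables; the split below between boot-time loader tunables (reboot required) and runtime sysctls is an assumption on my part:

# /boot/loader.conf.local - boot-time tunables
if_cxgbe_load="YES"
t5fw_cfg_load="YES"
net.inet.rss.enabled="1"
net.inet.rss.bits="8"
net.isr.maxthreads="-1"
net.isr.bindthreads="1"
net.link.ifqmaxlen="4096"
machdep.hyperthreading_intr_allowed="1"
vm.pmap.pti="0"

# runtime sysctls
sysctl kern.ipc.maxsockbuf=629145600
sysctl hw.ibrs_disable=1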

June 27, 2023, 05:44:46 PM #19 Last Edit: June 27, 2023, 05:48:51 PM by JamesFrisch
Quote from: 134 on June 26, 2023, 08:19:57 PM

Is that 6Gbps result done with a single stream or multiple streams of iperf3?

I doubt the NIC is the bottleneck. You can try turning pf off, but that would mean no firewall or ACLs on any interface.

Both. I can use the default of 1 or use the -P switch and set it to 10.

I also tried to curl a file from my ISP (they offer a 50GB file full of zeros for HTTP speed tests). I get around 4Gbit, but that is probably bottlenecked by my SSD.
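(One way to take the SSD out of the equation would be to discard the download instead of writing it to disk - the URL below is just a placeholder, not the actual test file:)

curl -o /dev/null https://speedtest.example.net/50GB.zip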


iperf3 -c speedtest.init7.net -P 32 -t 30 = 6.19Gbit/s
iperf3 -c speedtest.init7.net -t 30 = 5.35Gbit/s
iperf3 -c speedtest.init7.net -P 32 -t 30 -R = 9.41Gbit/s
iperf3 -c speedtest.init7.net -t 30 -R = 4.38Gbit/s

The download traffic with the 32 parallel streams looks great!
I wonder why this is not the case for upload. Maybe some bottleneck on the disk?


After disabling the firewall, I also get 9.4Gbit/s for upload:
iperf3 -c speedtest.init7.net -P 32 -t 30 = 9.41Gbit/s
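(For reference, pf can be toggled from a shell for a quick test like this - disabling it obviously leaves the box wide open until it is turned back on:)

pfctl -d    disable the packet filter
pfctl -e    re-enable it afterwards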


Update: It gets even stranger  ;D
After re-enabling the firewall and setting the speedtest to -t 90, I can observe a funny behaviour.
For a brief period, it stays at 6.2Gbit/s and 25% CPU usage. If I cancel the speedtest and immediately restart it, the speed jumps up to 9.42Gbit/s again and CPU usage is around 90%.
Maybe something with PowerD set to Adaptive?

Update2: Yep, setting PowerD to minimum drops performance to 6Gbit/s, while Hiadaptive or maximum gets 9.25Gbit/s. This solves my problem. Thank you for your help, guys! Please don't derail the conversation into Linux vs. FreeBSD, it does not really help  :-*
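(A quick way to see what PowerD is actually doing is to watch the CPU clock while the test runs - these are the stock FreeBSD sysctls:)

sysctl dev.cpu.0.freq          current clock of core 0 in MHz
sysctl dev.cpu.0.freq_levels   available frequency/power steps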






Quote from: JamesFrisch on June 27, 2023, 05:44:46 PM
Update2: Yep, setting PowerD to minimum drops performance to 6Gbit/s, while Hiadaptive or maximum gets 9.25Gbit/s. This solves my problem. Thank you for your help guys!

Huh.  Interesting.  What did you originally have it set to?

Originally it was set to "Adaptive".

When I started a speedtest, it was around 6Gbit.
When I started a speedtest, cancelled it after a few seconds, and immediately restarted it, I also got 9Gbit with the "Adaptive" mode.