OPNsense Forum

English Forums => Hardware and Performance => Topic started by: JamesFrisch on June 19, 2023, 01:04:18 pm

Title: 10Gbit performance problems with Chelsio T520-SO-CR (solved)
Post by: JamesFrisch on June 19, 2023, 01:04:18 pm
Because of the apparently good Chelsio FreeBSD driver support, I bought some Chelsio T520-SO-CR NICs.
Unfortunately they seem to max out at 6Gbit/s when using FreeBSD. This problem has come up multiple times in similar threads, but I was unable to find an answer.

https://forum.opnsense.org/index.php?topic=25263
https://forum.opnsense.org/index.php?topic=25844


Has anyone actually managed to run this card close to line speed, and would you mind sharing your config?
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: rungekutta on June 20, 2023, 09:42:20 pm
One of those threads was mine and no, I'm afraid not. I moved to another solution for now.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: j_s on June 22, 2023, 04:28:05 am
@JamesFrisch

Can you provide specs on your opnsense system?  Is it virtualized?
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: JamesFrisch on June 22, 2023, 01:07:50 pm
Quote from rungekutta: "One of those threads were mine and no, afraid not. Moved to another solution for now."

Too bad, that makes me worry that there is no solution to this problem.


Quote from j_s: "Can you provide specs on your opnsense system?  Is it virtualized?"

Both. I first tried it in a VM on a Proxmox host with a Xeon E-2276G (Linux bridge, Q35, hardware offloading disabled) and only got 6Gbit. I thought this had to be virtualization overhead. Now I run it bare-metal on an i3-8100 CPU @ 3.60GHz with 8GB RAM, but the problem is still there. I tried updating to the newest firmware, enabling hardware offloading and so on. Nothing helped.

What I also did was load the Chelsio driver via loader.conf rather than as a tunable, because there are no other NICs.

After some digging, I also added the t4 and t6 lines to loader.conf (although I am not sure they are needed, because it is a T5 card) and added the following lines to the tunables:

hw.cxgbe.fcoecaps_allowed 0   
hw.cxgbe.iscsicaps_allowed 0   
hw.cxgbe.rdmacaps_allowed 0   
hw.cxgbe.toecaps_allowed 0

That did not help with performance but brought down the T5 temp by 10 degrees in the dashboard  :)

Performance with iperf3 is 4.5Gbit/s down and 6Gbit/s up.
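
For anyone who wants to keep the four capability tunables above in a file rather than clicking them in, here is a minimal sketch that renders them in the name="value" form that loader.conf uses. The helper name to_loader_conf is made up for illustration. OPNsense normally manages these through its Tunables page, so treat the file form as an assumption about your setup.

```python
# Tunables exactly as listed in the post above; all four are set to 0.
TUNABLES = {
    "hw.cxgbe.fcoecaps_allowed": 0,
    "hw.cxgbe.iscsicaps_allowed": 0,
    "hw.cxgbe.rdmacaps_allowed": 0,
    "hw.cxgbe.toecaps_allowed": 0,
}


def to_loader_conf(tunables):
    """Render name/value pairs as loader.conf-style lines: name="value"."""
    return [f'{name}="{value}"' for name, value in tunables.items()]


if __name__ == "__main__":
    # Print the lines so they can be reviewed before being pasted anywhere.
    print("\n".join(to_loader_conf(TUNABLES)))
```

As noted above, these settings lowered the card temperature but did not change throughput.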

Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: eneerge on June 24, 2023, 02:48:03 am
I had performance issues as well (https://forum.opnsense.org/index.php?topic=31680.0). I've since moved to a Linux based product. I experienced the same issue on pfSense as well.

My issue occurs with only 1Gbit up/down. Initially, I get the full 1Gbit up/down. However, it will eventually go into "slow mode", where the download drops to around 500-600Mbit and the upload to only 50Mbit. I only get this when I use a pf based firewall. I updated all ethernet firmware as well. Now that I've swapped to something netfilter based, I have 0 hiccups.

The best way I could reproduce the issue was to go to youtube and then click on several videos in quick succession for about 5-10 minutes. Eventually, it would just slow down. If I rebooted the system, speed would come back. If I disabled the interface and reenabled it, the speed would come back. Tried various tweaks/tunes to no avail.

What I'm running now doesn't have the features that OPNsense does. I miss it, but I can't deal with the major slowdown.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: CJ on June 24, 2023, 05:05:17 pm
How are you testing your speed?  Single or multiple connections?

IIRC, when testing with iperf against TrueNAS I was able to get line speed using multiple connections.  I can't recall what I got when I was testing iperf against OPNSense.  I haven't looked into it too much because I don't have anything on the other side that could support that high a speed in order to do throughput testing.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: 134 on June 24, 2023, 09:12:12 pm
*sense isn't great at routing. However, the odd thing I find is that my Atom C3558 and Intel X710 hit a wall at 5-6Gbps of iperf3 traffic, while Netgate claims that the 6100, with the same CPU and running pfSense Plus, can push almost 10G. Does anybody know what '10k ACLs' means on their specs?

In the end I gave up trying 10G inter-VLAN routing with a FreeBSD firewall, mainly because I don't actually need 10G routing. I'm moving from a Supermicro 1U appliance to a Chinese fanless mini PC with the new Intel N100 SoC and 5x 2.5GbE ports.

If you really need 10G routing, try VyOS. It's CLI only and has a steep learning curve, but it's Linux based and very decent at routing.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: CJ on June 25, 2023, 02:53:36 pm
Quote from 134: "*sense isn't great at routing."

It's a FreeBSD thing rather than specific to *sense.  I forget the reasons why Linux performs better on the same hardware.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: rungekutta on June 26, 2023, 07:05:14 am
Quote from 134: "*sense isn't great at routing."

Quote from CJ: "It's a FreeBSD thing rather than specific to *sense.  I forget the reasons why Linux performs better on the same hardware."

While this is probably true, as you also mentioned, I got much better results with TrueNAS (on FreeBSD). So there seems to be something going on in addition which is specific to OPNsense. Haven’t tried pfSense.

I looked at VyOS but it wasn’t for me. It is only marginally more convenient than rolling your own nftables config file on top of a minimal Debian install, but comes with the downside of vendor lock-in and the faff of getting the ISOs etc.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: lilsense on June 26, 2023, 10:07:09 am
Quote from 134: "*sense isn't great at routing."

Quote from CJ: "It's a FreeBSD thing rather than specific to *sense.  I forget the reasons why Linux performs better on the same hardware."

This is incorrect in so many ways... Have you heard of Juniper routers or firewalls? They use FreeBSD. Juniper is not the only one; there are many high performance network routers that use FreeBSD. Force10 and Extreme come to mind.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: CJ on June 26, 2023, 01:01:01 pm
Quote from rungekutta: "While this is probably true, like you also mentioned I also got much better results with TrueNas (on FreeBSD). So there seems to be something going on in additions which is specific to OpnSense. Haven’t tried pfsense."

TrueNAS doesn't do routing, though.

Quote from lilsense: "This is incorrect in so many ways... Have you heard of Juniper routers or firewalls? They use FreeBSD. Juniper is not the only one there are many high performance network routers that use FreeBSD, Force10 and Extreme comes to mind."

I'll admit that I haven't looked into it.  I just recall there being something different about how FreeBSD vs Linux handles things that causes the performance differences.  It's been a few years since I've seen discussions on it so perhaps the situation has changed.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: JamesFrisch on June 26, 2023, 02:15:47 pm
Guys, can we please stay on the topic? I don't care what Juniper or what VyOS does!

I am wondering, is 10Gbit achievable with OPNsense and Chelsio?
If yes, how?
If no, what NICs do?
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: lilsense on June 26, 2023, 02:26:45 pm
FreeBSD is far more stable than, say, a Linux distro... Most edge testing at 100GigE+ is done on FreeBSD, as the limit is no longer the software so much as the hardware itself.

The Linux kernel does not have this issue, but the same cannot be said of the various distros. :D

here you go:

https://netflixtechblog.com/serving-100-gbps-from-an-open-connect-appliance-cdb51dda3b99
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: 134 on June 26, 2023, 08:13:03 pm

Quote from lilsense: "This is incorrect in so many ways... Have you heard of Juniper routers or firewalls? They use FreeBSD. Juniper is not the only one there are many high performance network routers that use FreeBSD, Force10 and Extreme comes to mind."

Junos OS Evolved is now Linux-based, so they are moving away. Force10's FTOS 10 is also Linux now, under the name Dell Networking OS. Also, these OSes run on networking gear equipped with ASICs or FPGAs to boost performance, so we can't conclude anything by pointing at these companies.

Quote from lilsense: "FreeBSD is far more stable than say a linux distro... Most edge testings of 100GigE+ are done on FreeBSD as it no longer has a software limitation as much as the hardware itself.

Linux Kernel does not have this issue which this cannot be said on various distros. :D

here you go:

https://netflixtechblog.com/serving-100-gbps-from-an-open-connect-appliance-cdb51dda3b99"

Linux can also be very stable, depending on the kernel you choose. In that Netflix case study, FreeBSD was used as a file server and not as a router or firewall. The problem presented in that post was not the networking stack itself but feeding the data to the networking stack of FreeBSD.

In the context of pure software routing/firewalling, this is a contest between Linux's iptables/nftables and FreeBSD's pf. It's no secret that nftables is not only faster but also scales better with the number of cores:

https://matteocroce.medium.com/linux-and-freebsd-networking-cbadcdb15ddd

And then there's this new toy called eBPF, which is used by Google, Cloudflare, Netflix, Alibaba .... for packet processing. Development around Linux is just much more active, and that is a big reason for the transition.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: 134 on June 26, 2023, 08:19:57 pm
Quote from JamesFrisch: "Guys, can we please stay on the topic? I don't care what Juniper or what VyOS does!

I am wondering, is 10Gbit achievable with OPNsense and Chelsio?
If yes, how?
If no, what NICs do?"

Is that 6Gbps result from a single stream or multiple streams of iperf3?

I doubt the NIC is the bottleneck. You can try turning pf off, but that would mean no firewall or ACL on any interface.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: Patrick M. Hausen on June 26, 2023, 08:32:12 pm
Kristof Provost, one of the main current network developers for FreeBSD, argued that iperf is not suitable for measuring packet forwarding performance, because at 10G and above you are more likely to max out iperf itself.

He recommends pkt-gen/netmap or DPDK.
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: jzah on June 27, 2023, 08:01:46 am
We are experimenting with Chelsios as well - but we have Xeon CPUs and not Atoms. In our case the 6Gbps figure is for single iperf streams - we don't know why the limit sits at 6Gbps, and it's more or less the same for Intel X810 cards. If we enable RSS and switch to multiple iperf streams, we get far more (>20Gbps). Cheers
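
To illustrate why a single stream and multiple streams can behave so differently with RSS, here is a toy sketch. RSS picks a receive queue (and so a core) from a hash of the flow's addresses and ports, so one TCP stream always lands on one queue while many streams spread out. The CRC32 hash, the addresses, the source ports and the queue count of 8 are all stand-ins chosen for illustration; real NICs use a keyed Toeplitz hash.

```python
import zlib


def rss_queue(src_ip, src_port, dst_ip, dst_port, n_queues):
    """Map a TCP flow to a receive queue from a hash of its 4-tuple.

    Toy stand-in: real NICs use a Toeplitz hash with a secret key, not CRC32.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_queues


def queues_used(n_streams, n_queues=8, base_port=50000):
    """Return the set of queues hit by flows that differ only in source port."""
    return {
        rss_queue("192.0.2.10", base_port + i, "192.0.2.20", 5201, n_queues)
        for i in range(n_streams)
    }


if __name__ == "__main__":
    print("1 stream   ->", len(queues_used(1)), "queue(s)")
    print("32 streams ->", len(queues_used(32)), "queue(s)")
```

A single flow can never use more than one queue here, which is at least consistent with single-stream results being capped by what one core can do.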
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: rungekutta on June 27, 2023, 10:25:26 am
Quote from jzah: "We are experimenting with Chelsios as well - but we have XEON CPUs and not Atoms. This 6Gbps are in our case for single iperf streams - we don't know why this limit is at 6Gbps, it's more or less the same for intel X810 cards. If we enable RSS and switch to multiple iperf streams we are getting far more (>20Gbps). Cheers"

That is interesting - are you able to share more details? Hardware specs, setup, config etc. I never got anywhere near those numbers, despite working my way by trial and error through various undocumented snags (mostly documented here https://forum.opnsense.org/index.php?topic=25263). RSS wasn’t mature at the time, but the load and interrupts looked well balanced across cores anyway. Nevertheless, I never got close to 10Gb line rate, even with multiple streams.


Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: jzah on June 27, 2023, 01:44:00 pm
We use Chelsio T580-LP-CR cards (2x40Gbps QSFP ports) on two HP DL360 Gen9 servers in HA mode (we also tested Intel X810). The servers have two physical CPUs (however, one CPU is useless, as the card is bound to one CPU - NUMA affinity is the keyword). We enabled RSS to use more than 1 core. At the moment we have stopped testing, as we have an issue with CARP which needs to be solved before we can continue to optimize the performance, so not all of the tunables below are 100% required.
pfSync is connected over separate Intel X710 cards.

Code: [Select]
hw.ibrs_disable -> 1
if_cxgbe_load -> yes
kern.ipc.maxsockbuf -> 629145600
machdep.hyperthreading_intr_allowed -> 1
net.inet.rss.bits -> 8 (just for testing, at the end it should max out physical CPU including HT cores)
net.inet.rss.enabled -> 1
net.isr.bindthreads -> 1
net.isr.maxthreads -> -1
net.link.ifqmaxlen -> 4096
t5fw_cfg_load -> yes
vm.pmap.pti -> 0

The interface configuration (TSO, LRO,...) is in the attachment.
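
On the net.inet.rss.bits comment in the list above: as far as I understand it, the value is a bit count, so the number of RSS buckets is 2^bits, and the testing value of 8 would give 256 buckets. Below is a minimal sketch of the arithmetic for picking the smallest value that still covers every logical CPU. The function names and the example CPU counts are made up for illustration, and reading "max out physical CPU including HT cores" this way is an assumption.

```python
def rss_bits_for(logical_cpus):
    """Smallest bit count whose 2**bits buckets cover all logical CPUs."""
    if logical_cpus < 1:
        raise ValueError("need at least one CPU")
    return (logical_cpus - 1).bit_length()


def rss_buckets(bits):
    """Number of buckets that a given bit count provides."""
    return 1 << bits


if __name__ == "__main__":
    # Hypothetical CPU counts, not taken from the servers in this thread.
    for cpus in (4, 8, 16, 24):
        bits = rss_bits_for(cpus)
        print(f"{cpus} logical CPUs -> bits={bits} ({rss_buckets(bits)} buckets)")
```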
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: JamesFrisch on June 27, 2023, 05:44:46 pm

Quote from 134: "Is that 6Gbps result done with single stream or multiple stream of iperf3?

I doubt the NIC is bottleneck. You can try turning pf off, but it would mean that a no firewall or ACL on any interface."

Both. I can use the default 1 or use the switch -P and set it to 10.

I also tried to curl a file from my ISP (they offer a 50GB file full of zeros for HTTP speed tests). I get around 4Gbit/s, but that is probably bottlenecked by my SSD.


iperf3 -c speedtest.init7.net -P 32 -t 30 = 6.19Gbit/s
iperf3 -c speedtest.init7.net -t 30 = 5.35Gbit/s
iperf3 -c speedtest.init7.net -P 32 -t 30 -R = 9.41Gbit/s
iperf3 -c speedtest.init7.net -t 30 -R = 4.38Gbit/s

The download traffic with the 32 parallel streams looks great!
I wonder why this is not the case for upload. Maybe some bottleneck on the disk?
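
If you want to compare runs like the four above without reading the numbers off the terminal, iperf3 can write its report as JSON with -J. A minimal sketch of pulling the summed throughput out of such a report follows. The SAMPLE string is a hand-written, heavily trimmed placeholder (not real iperf3 output), and the helper name gbits is made up.

```python
import json

# Hand-written placeholder with only the fields used below; a real report
# from "iperf3 -c <server> -P 32 -t 30 -J" contains far more.
SAMPLE = """
{"end": {"sum_sent": {"bits_per_second": 6.19e9},
         "sum_received": {"bits_per_second": 6.19e9}}}
"""


def gbits(report, direction="received"):
    """Return the summed TCP throughput in Gbit/s from a parsed iperf3 report.

    direction is "sent" or "received", matching the sum_sent and
    sum_received sections of the report.
    """
    return report["end"][f"sum_{direction}"]["bits_per_second"] / 1e9


if __name__ == "__main__":
    report = json.loads(SAMPLE)
    print(f"received: {gbits(report):.2f} Gbit/s")
    print(f"sent:     {gbits(report, 'sent'):.2f} Gbit/s")
```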


After disabling the firewall, I also get 9.4Gbit/s for upload:
iperf3 -c speedtest.init7.net -P 32 -t 30 = 9.41Gbit/s


Update: It gets even stranger  ;D
So after re-enabling the firewall and setting the speedtest to -t 90, I can observe some funny behaviour.
For a brief period, it stays at 6.2Gbit/s and 25% CPU usage. If I cancel the speedtest and immediately restart it, the speed jumps up to 9.42Gbit/s again and CPU usage is around 90%.
Maybe something to do with PowerD being set to Adaptive?

Update2: Yep, setting PowerD to minimum drops performance to 6Gbit/s, while Hiadaptive or maximum gets 9.25Gbit/s. This solves my problem. Thank you for your help guys! Please don't derail conversations to Linux vs. FreeBSD, it does not really help  :-*
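
For anyone who wants to check whether they are hitting the same thing, here is a minimal sketch that compares the current CPU clock against the highest available level while a speedtest is running. It reads the FreeBSD sysctls dev.cpu.0.freq and dev.cpu.0.freq_levels. The 0.9 tolerance and the helper names are arbitrary choices for illustration, and it assumes your CPU driver exposes those sysctls.

```python
import subprocess


def parse_freq_levels(levels):
    """Parse 'MHz/mW MHz/mW ...' (dev.cpu.0.freq_levels format) into MHz ints."""
    return [int(pair.split("/")[0]) for pair in levels.split()]


def is_throttled(current_mhz, levels, tolerance=0.9):
    """True if the current clock is clearly below the highest listed level."""
    return current_mhz < tolerance * max(parse_freq_levels(levels))


def read_sysctl(name):
    """Return the value of one sysctl as a stripped string."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Run this on the firewall itself while the speedtest is active.
    levels = read_sysctl("dev.cpu.0.freq_levels")
    current = int(read_sysctl("dev.cpu.0.freq"))
    state = "throttled" if is_throttled(current, levels) else "near maximum"
    print(f"{current} MHz ({state}), levels: {levels}")
```

If the clock stays low during the test, the PowerD mode is a likely suspect.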
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: CJ on June 27, 2023, 07:24:54 pm
Quote from JamesFrisch: "Update2: Yep, setting PowerD to minimum drops performance to 6Gbit/s, while Hiadaptive or maximum gets 9.25Gbit/s. This solves my problem. Thank you for your help guys!"

Huh.  Interesting.  What did you originally have it set to?
Title: Re: 10Gbit performance problems with Chelsio T520-SO-CR
Post by: JamesFrisch on June 28, 2023, 06:57:01 am
Originally it was set to "Adaptive".

When I started a speedtest, it was around 6Gbit.
When I started a speedtest, canceled it after a few seconds and immediately restarted the speedtest, I also got 9Gbit with the "Adaptive" mode.