Messages - jzah

#1
We use Chelsio T580-LP-CR cards (2x 40 Gbps QSFP ports) in two HP DL360 Gen9 servers running in HA mode (we also tested Intel X810 cards). The servers have two physical CPUs, but one of them is effectively useless because the card is bound to a single CPU (NUMA affinity is the keyword). We enabled RSS to use more than one core. For the moment we have paused performance testing, because we have a CARP issue that needs to be solved before we can continue optimizing, so not all of the tunables below are strictly required.
pfSync is connected over separate Intel X710 cards.

hw.ibrs_disable -> 1
if_cxgbe_load -> yes
kern.ipc.maxsockbuf -> 629145600
machdep.hyperthreading_intr_allowed -> 1
net.inet.rss.bits -> 8 (just for testing; in the end it should max out the physical CPU including HT cores)
net.inet.rss.enabled -> 1
net.isr.bindthreads -> 1
net.isr.maxthreads -> -1
net.link.ifqmaxlen -> 4096
t5fw_cfg_load -> yes
vm.pmap.pti -> 0
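Most of these are boot-time tunables, so on stock FreeBSD they would go into /boot/loader.conf (on OPNsense, System > Settings > Tunables writes the equivalent entries). A minimal sketch of the boot-time part, using the values above; note that a few of them (e.g. kern.ipc.maxsockbuf, hw.ibrs_disable) can also be set at runtime via sysctl:

```shell
# /boot/loader.conf (sketch; values taken from the list above)
if_cxgbe_load="YES"                      # load the Chelsio cxgbe(4) driver
t5fw_cfg_load="YES"                      # Chelsio T5 firmware config module
net.inet.rss.enabled="1"                 # enable receive-side scaling
net.inet.rss.bits="8"                    # 2^bits RSS buckets (testing value)
net.isr.bindthreads="1"                  # pin netisr threads to CPUs
net.isr.maxthreads="-1"                  # one netisr thread per CPU
net.link.ifqmaxlen="4096"                # larger interface queue length
machdep.hyperthreading_intr_allowed="1"  # allow interrupts on HT cores
hw.ibrs_disable="1"                      # disable IBRS mitigation
vm.pmap.pti="0"                          # disable page-table isolation
kern.ipc.maxsockbuf="629145600"          # max socket buffer size (bytes)
```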


The interface configuration (TSO, LRO,...) is in the attachment.
#2
We are experimenting with Chelsio cards as well, but with Xeon CPUs rather than Atoms. In our case the 6 Gbps limit applies to single iperf streams; we don't know why it sits at 6 Gbps, and it's more or less the same for Intel X810 cards. If we enable RSS and switch to multiple iperf streams we get far more (>20 Gbps). Cheers
#3
Just for other people who are looking for a solution: the following seems to help prevent RSS from using CPU0...

# show the CPU-to-thread assignment and note the netisr thread ID
# bound to CPU0 -> e.g. 100241
procstat -a -S

# move the netisr thread from CPU0 to CPU2 (in our test we had multiple
# cores available with RSS enabled only on CPU0/CPU1, so CPU2 was just idling)
cpuset -l 2 -t 100241


With this setting RSS no longer uses CPU0, and hence there are no more drops on CPU0 for CARP.
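Thread IDs like 100241 change on every boot, so a hedged sketch of automating the lookup (assuming FreeBSD's procstat(1) output, where the thread ID is the second column, and that the netisr threads carry "netisr" in their name):

```shell
#!/bin/sh
# Sketch: re-pin every netisr thread to CPU 2 after each boot.
# Assumes CPU 2 is otherwise idle; adjust the CPU list for your machine.
for tid in $(procstat -a -t | awk '/netisr/ { print $2 }'); do
    cpuset -l 2 -t "$tid"    # restrict this thread to CPU 2 only
done
```

cpuset changes are not persistent, so this would have to be re-run after each reboot, e.g. from an rc script.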
#4
Hi guys

we are trying to build an HA cluster pair with two Xeon-based servers. We have Intel E710 and X810 cards in each server. Our current test uses the X810 card with 8x 10 Gb links (two QSFP 40G optics, each broken out into 4x 10G), so we run LACP across 8x 10 Gbps links.
As clients we have 6 servers with 2x 10 Gbps each (LACP-bonded on RHEL9). Between two servers in the same subnet we can get 20 Gbps throughput with iperf3.
We enabled RSS based on the documentation, which seems to work fine. We can see, based on the tunable net.inet.rss.bits, that e.g. a value of 2 spreads the load across multiple (4) CPUs. So far so good.

Our problem is that under high load on CPU0 (which a single iperf3 stream from one client can cause) LACP and CARP stop working on FreeBSD. OPNsense gets completely stuck with respect to CARP and LACP, which triggers immediate failovers and flapping between the two firewalls.
We then tried the iflib tunable dev.ice.0.iflib.core_offset="1" (for all ice NICs, of course). However, RSS can still put high load on CPU0, so it doesn't work as expected.

How can we ensure with an Intel NIC that RSS doesn't use CPU0? It seems that the slow protocols like ARP/CARP/LACP are bound to CPU0...

On Chelsio cards there seem to be tunables to fix that, e.g. hw.cxgbe.rsrv_noflowq="1" for the TX direction.
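Both of the driver knobs mentioned here are boot-time loader tunables, so a sketch of what the relevant /boot/loader.conf entries would look like (the dev.ice line repeated once per NIC index):

```shell
# /boot/loader.conf (sketch; settings as discussed above)
hw.cxgbe.rsrv_noflowq="1"          # Chelsio cxgbe(4): reserve TX queue 0
                                   # so RSS flow traffic stays off it
dev.ice.0.iflib.core_offset="1"    # Intel ice(4)/iflib: start the
                                   # queue-to-CPU mapping at CPU 1
```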

We are really stuck here; we experimented with cpuset as well, but didn't come to a working solution. Any help is really appreciated...

Cheers