OPNsense Forum

English Forums => Hardware and Performance => Topic started by: rungekutta on October 22, 2021, 10:05:12 pm

Title: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on October 22, 2021, 10:05:12 pm
Hi,

Just upgraded my WAN to 10Gb, so I'm trying to get OPNsense up to 10Gb too. The hardware is powerful but performance is poor... I need some help troubleshooting!

Hardware:
ASRock X470-D4U motherboard with Ryzen 3700x CPU (8c/16t, 3.6GHz up to 4.4GHz turbo)
32GB RAM
Intel i350-T4 quad gigabit NIC
Chelsio T420-CR dual 10Gb SFP+
MikroTik CRS328-24P-4S+RM switch

Software environment:
Proxmox 7.0
OPNSense running virtualised with the Intel i350-T4 and Chelsio T420-CR in PCIe passthrough to the VM

And before you say anything... yes I also suspect that it's the virtualisation that somehow causes my performance issues... but I want to be sure before I migrate to bare metal as the virtualisation provides many benefits including easy snapshot backups etc.

Description of symptoms:
Noticed that WAN speeds were poor (approx 1.4Gb/s), with Suricata still only consuming approx 50% of total CPU. No improvement with Suricata disabled either, and in that case the CPU was mostly idle according to top.

So I moved on to test my internal network with iPerf3. I verified that my NAS (TrueNAS, Chelsio T420-CR) and another Proxmox node (Ryzen 5950x, Mellanox ConnectX-4 Lx) saturate 10Gb/s no problem via iPerf3. However, both machines, via the same switch and network cards, hardly manage to break 1Gb/s into the Chelsio T420-CR in OPNsense:

Code: [Select]
Connecting to host 192.168.200.1, port 5201
[  5] local 192.168.200.10 port 22966 connected to 192.168.200.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   124 MBytes  1.04 Gbits/sec    0    757 KBytes       
[  5]   1.00-2.00   sec   128 MBytes  1.07 Gbits/sec    0   1.30 MBytes       
[  5]   2.00-3.00   sec   132 MBytes  1.11 Gbits/sec   30    810 KBytes       
[  5]   3.00-4.00   sec   124 MBytes  1.04 Gbits/sec    0    937 KBytes       
[  5]   4.00-5.00   sec   133 MBytes  1.12 Gbits/sec    0   1.04 MBytes       
[  5]   5.00-6.00   sec   128 MBytes  1.07 Gbits/sec    0   1.16 MBytes       
[  5]   6.00-7.00   sec   144 MBytes  1.21 Gbits/sec    0   1.28 MBytes       
[  5]   7.00-8.00   sec   138 MBytes  1.15 Gbits/sec    0   1.31 MBytes       
[  5]   8.00-9.00   sec   133 MBytes  1.11 Gbits/sec    0   1.31 MBytes       
[  5]   9.00-10.00  sec   123 MBytes  1.03 Gbits/sec    0   1.31 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.28 GBytes  1.10 Gbits/sec   30             sender
[  5]   0.00-10.01  sec  1.27 GBytes  1.09 Gbits/sec                  receiver

CPU is approx 65% idle during the test so hardly the bottleneck.
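For reference, these are plain default iPerf3 runs, roughly as below (a sketch of my setup: OPNsense at 192.168.200.1 acting as the server, the other host as the client):

Code: [Select]
# on OPNsense (server side)
iperf3 -s
# on the NAS / other Proxmox node (client side)
iperf3 -c 192.168.200.1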

Here is the output of "ifconfig -v cxgbe1":

Code: [Select]
cxgbe1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=28c00b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,HWRXTSTMP>
ether 00:07:43:11:2b:18
inet6 fe80::207:43ff:fe11:2b18%cxgbe1 prefixlen 64 scopeid 0x2
inet 192.168.200.1 netmask 0xffffff00 broadcast 192.168.200.255
media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
plugged: SFP/SFP+/SFP28 1X Copper Passive (Copper pigtail)
vendor: FS PN: SFPP-PC015 SN: S2108004672-1 DATE: 2021-08-11

/boot/loader.conf.local contains:

Code: [Select]
t4fw_cfg_load="YES"
if_cxgbe_load="YES"

# Disabling cxgbe caps
hw.cxgbe.toecaps_allowed="0"
hw.cxgbe.rdmacaps_allowed="0"
hw.cxgbe.iscsicaps_allowed="0"
hw.cxgbe.fcoecaps_allowed="0"

I have no idea where to go from here. Any ideas on how to troubleshoot to find the bottleneck?
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on October 25, 2021, 08:39:57 am
Nobody..? I guess I was hoping some FreeBSD whiz would come along and tell me to run some obscure PCIe command line tools or some other profiling to help me find the problem…   ;)

I guess I could try to install OPNsense bare metal on a USB stick, move over the config and see how it performs that way. If it's better, then the problem obviously lies with the virtualization somehow. Otherwise it's some hardware problem, as not even iperf gets decent performance.
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on October 28, 2021, 03:15:35 pm
I installed OpnSense on a usb stick and tested the same hardware on bare metal. Unfortunately more or less exactly the same results. WAN is limited to 1.4Gb/s with or without suricata and iperf3 between OpnSense and another 10Gb box averages around 1Gb/s over the 10Gb/s link.

I will try Linux on the same hardware next to see if it’s a hardware problem or the limitations are with OpnSense.
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: testo_cz on October 31, 2021, 08:20:27 am
I'm sure there are people using cxgbe-based NICs. Have you searched this forum?

Paste some boot message output from the driver here (dmesg | grep cxgbe), so it can be double-checked for queues/buffers/netmap configuration.

If you switch the VM's network adapters from PCI passthrough to, for example, a Linux bridge + VirtIO, do you get more throughput with iperf3?
(my smaller HW-based Proxmox + 4 vCore OPNsense VM gives up to 3Gbps from WAN PC - through OPN - to LAN PC, iperf3 -P2 -t60 ...)
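For completeness, switching a Proxmox VM NIC over to a bridge + VirtIO is roughly a one-liner. This is only a sketch: VM ID 100 and bridge vmbr0 below are placeholders for your setup, and inside OPNsense the NIC then shows up as a vtnet interface.

Code: [Select]
# sketch: replace 100 with your VM ID and vmbr0 with your LAN bridge
qm set 100 --net0 virtio,bridge=vmbr0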
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: blblblb on November 02, 2021, 10:04:06 pm
Might want to look at this:
https://forum.opnsense.org/index.php?topic=25410.msg122060#msg122060

I'm not yet sure what the culprit is. Could you run some of my commands in UDP (-u) mode, with -Z and -N whenever possible?
Also add -P n, where n is about half of your core count * 2 (i.e. half your thread count); just to avoid competing for resources elsewhere, leave some cores "free". You can use all of them, though, but I suggest trying -P 2 first.

TL;DR: run iperf3 in -u mode; it will show you packet loss, which is also relevant.
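For example, something along these lines against your OPNsense box (adjust the target IP to your setup; -b 0 removes the UDP rate cap):

Code: [Select]
# UDP, unlimited rate, zero-copy (-Z), no-delay (-N), 2 parallel streams
iperf3 -c 192.168.200.1 -u -b 0 -Z -N -P 2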
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on November 04, 2021, 10:41:24 pm
Thank you blblblb and testo_cz. I'll get back to your suggestions shortly. Just to confirm first that the issue definitely is with OpnSense: I spun up a minimal Debian VM on the same host and ran the same test. It easily saturated the 10Gb link.

Log snippets from the other end of iPerf3, Debian VM for comparison:
Code: [Select]
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.200.216, port 40042
[  5] local 192.168.200.10 port 5201 connected to 192.168.200.216 port 40044
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   1.00-2.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   2.00-3.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   3.00-4.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   4.00-5.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   5.00-6.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   6.00-7.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   7.00-8.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   8.00-9.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   9.00-10.00  sec  1.10 GBytes  9.41 Gbits/sec
[  5]  10.00-10.00  sec  1.64 MBytes  9.37 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  11.0 GBytes  9.41 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.200.216, port 40046
[  5] local 192.168.200.10 port 5201 connected to 192.168.200.216 port 40048
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.10 GBytes  9.44 Gbits/sec    0   3.00 MBytes
[  5]   1.00-2.00   sec  1.10 GBytes  9.41 Gbits/sec    0   3.00 MBytes
[  5]   2.00-3.00   sec  1.08 GBytes  9.31 Gbits/sec  1527   1.83 MBytes
[  5]   3.00-4.00   sec  1.09 GBytes  9.41 Gbits/sec    0   2.20 MBytes
[  5]   4.00-5.00   sec  1.10 GBytes  9.41 Gbits/sec    0   2.20 MBytes
[  5]   5.00-6.00   sec  1.09 GBytes  9.41 Gbits/sec    0   2.20 MBytes
[  5]   6.00-7.00   sec  1.09 GBytes  9.41 Gbits/sec    0   2.20 MBytes
[  5]   7.00-8.00   sec  1.09 GBytes  9.38 Gbits/sec  920   2.26 MBytes
[  5]   8.00-9.00   sec  1.09 GBytes  9.36 Gbits/sec  921   2.31 MBytes
[  5]   9.00-10.00  sec  1.10 GBytes  9.41 Gbits/sec    0   2.32 MBytes
[  5]  10.00-10.00  sec   402 KBytes  8.89 Gbits/sec    0   2.32 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.9 GBytes  9.39 Gbits/sec  3368             sender
-----------------------------------------------------------

Same setup and hardware, OpnSense instead of Debian:
Code: [Select]
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.200.1, port 19291
[  5] local 192.168.200.10 port 5201 connected to 192.168.200.1 port 27695
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   160 MBytes  1.34 Gbits/sec
[  5]   1.00-2.00   sec   169 MBytes  1.42 Gbits/sec
[  5]   2.00-3.00   sec   168 MBytes  1.41 Gbits/sec
[  5]   3.00-4.00   sec   172 MBytes  1.44 Gbits/sec
[  5]   4.00-5.00   sec   179 MBytes  1.50 Gbits/sec
[  5]   5.00-6.00   sec   178 MBytes  1.49 Gbits/sec
[  5]   6.00-7.00   sec   187 MBytes  1.57 Gbits/sec
[  5]   7.00-8.00   sec   197 MBytes  1.65 Gbits/sec
[  5]   8.00-9.00   sec   168 MBytes  1.41 Gbits/sec
[  5]   9.00-10.00  sec   166 MBytes  1.39 Gbits/sec
[  5]  10.00-10.04  sec  8.47 MBytes  1.72 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  1.71 GBytes  1.46 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.200.1, port 53951
[  5] local 192.168.200.10 port 5201 connected to 192.168.200.1 port 65343
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   136 MBytes  1.14 Gbits/sec    0   1.03 MBytes
[  5]   1.00-2.00   sec   136 MBytes  1.14 Gbits/sec   23    766 KBytes
[  5]   2.00-3.00   sec   134 MBytes  1.12 Gbits/sec    0    892 KBytes
[  5]   3.00-4.00   sec   130 MBytes  1.09 Gbits/sec    0   1018 KBytes
[  5]   4.00-5.00   sec   129 MBytes  1.09 Gbits/sec    0   1.12 MBytes
[  5]   5.00-6.00   sec   135 MBytes  1.13 Gbits/sec    0   1.24 MBytes
[  5]   6.00-7.00   sec   132 MBytes  1.11 Gbits/sec    0   1.36 MBytes
[  5]   7.00-8.00   sec   132 MBytes  1.11 Gbits/sec    8   1.08 MBytes
[  5]   8.00-9.00   sec   144 MBytes  1.21 Gbits/sec    0   1.19 MBytes
[  5]   9.00-10.00  sec   132 MBytes  1.11 Gbits/sec    0   1.27 MBytes
[  5]  10.00-10.00  sec   126 KBytes  1.19 Gbits/sec    0   1.27 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.31 GBytes  1.12 Gbits/sec   31             sender
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on November 04, 2021, 10:46:37 pm
I'm sure there are people using cxgbe-based NICs. Have you searched this forum?
Yes, and not much came up. But yes, I understand Chelsio is very popular with both TrueNAS and pfSense due to strong support in FreeBSD, so I would expect the cards to work very well. Indeed I use the same card (Chelsio T420-CR) in TrueNAS and it works very well, easily saturating 10Gb against other hosts (not OpnSense!).

Paste some boot message output from the driver here (dmesg | grep cxgbe), so it can be double-checked for queues/buffers/netmap configuration.

Code: [Select]
root@XXX:~ # dmesg | grep cxgbe
cxgbe0: <port 0> on t4nex0
cxgbe0: Ethernet address: 00:07:43:11:2b:10
cxgbe0: 8 txq, 8 rxq (NIC)
cxgbe1: <port 1> on t4nex0
cxgbe1: Ethernet address: 00:07:43:11:2b:18
cxgbe1: 8 txq, 8 rxq (NIC)
cxgbe1: tso4 disabled due to -txcsum.
cxgbe1: tso6 disabled due to -txcsum6.
cxgbe0: tso4 disabled due to -txcsum.
cxgbe0: tso6 disabled due to -txcsum6.
cxgbe1: link state changed to UP
cxgbe0: link state changed to UP
556.329305 [1130] generic_netmap_attach     Emulated adapter for cxgbe1 created (prev was NULL)
556.329322 [1035] generic_netmap_dtor       Emulated netmap adapter for cxgbe1 destroyed
556.331828 [1130] generic_netmap_attach     Emulated adapter for cxgbe1 created (prev was NULL)
556.373055 [ 320] generic_netmap_register   Emulated adapter for cxgbe1 activated
556.381334 [1130] generic_netmap_attach     Emulated adapter for cxgbe0 created (prev was NULL)
556.381356 [1035] generic_netmap_dtor       Emulated netmap adapter for cxgbe0 destroyed
556.384000 [1130] generic_netmap_attach     Emulated adapter for cxgbe0 created (prev was NULL)
556.384271 [ 320] generic_netmap_register   Emulated adapter for cxgbe0 activated

If you switch the VM's network adapters from PCI passthrough to, for example, a Linux bridge + VirtIO, do you get more throughput with iperf3?
(my smaller HW-based Proxmox + 4 vCore OPNsense VM gives up to 3Gbps from WAN PC - through OPN - to LAN PC, iperf3 -P2 -t60 ...)
Haven't tried virtual NICs. If there's one thing that people seem to have problems with, it's that... The recommendation seems to be to use passthrough if possible.
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on November 04, 2021, 10:52:04 pm
Might want to look at this:
https://forum.opnsense.org/index.php?topic=25410.msg122060#msg122060

I'm not yet sure what the culprit is. Could you run some of my commands in UDP (-u) mode, with -Z and -N whenever possible?
Also add -P n, where n is about half of your core count * 2 (i.e. half your thread count); just to avoid competing for resources elsewhere, leave some cores "free". You can use all of them, though, but I suggest trying -P 2 first.

TL;DR: run iperf3 in -u mode; it will show you packet loss, which is also relevant.

See below. The CPU is a Ryzen 3700x so it definitely shouldn't be the bottleneck. Under Linux, it still almost idles while pushing 10Gb through iPerf3.

Code: [Select]
root@XXXX:~ # iperf3 -c 192.168.200.10 -u -b 0 -N -Z -P4
Connecting to host 192.168.200.10, port 5201
[  5] local 192.168.200.1 port 35346 connected to 192.168.200.10 port 5201
[  7] local 192.168.200.1 port 35715 connected to 192.168.200.10 port 5201
[  9] local 192.168.200.1 port 50565 connected to 192.168.200.10 port 5201
[ 11] local 192.168.200.1 port 42027 connected to 192.168.200.10 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  34.3 MBytes   288 Mbits/sec  24660
[  7]   0.00-1.00   sec  34.3 MBytes   288 Mbits/sec  24660
[  9]   0.00-1.00   sec  34.3 MBytes   288 Mbits/sec  24660
[ 11]   0.00-1.00   sec  34.3 MBytes   288 Mbits/sec  24660
[SUM]   0.00-1.00   sec   137 MBytes  1.15 Gbits/sec  98640
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  33.2 MBytes   278 Mbits/sec  23820
[  7]   1.00-2.00   sec  33.2 MBytes   278 Mbits/sec  23820
[  9]   1.00-2.00   sec  33.2 MBytes   278 Mbits/sec  23820
[ 11]   1.00-2.00   sec  33.2 MBytes   278 Mbits/sec  23820
[SUM]   1.00-2.00   sec   133 MBytes  1.11 Gbits/sec  95280
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  34.5 MBytes   290 Mbits/sec  24800
[  7]   2.00-3.00   sec  34.5 MBytes   290 Mbits/sec  24800
[  9]   2.00-3.00   sec  34.5 MBytes   290 Mbits/sec  24800
[ 11]   2.00-3.00   sec  34.5 MBytes   290 Mbits/sec  24800
[SUM]   2.00-3.00   sec   138 MBytes  1.16 Gbits/sec  99200
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  33.0 MBytes   277 Mbits/sec  23700
[  7]   3.00-4.00   sec  33.0 MBytes   277 Mbits/sec  23700
[  9]   3.00-4.00   sec  33.0 MBytes   277 Mbits/sec  23700
[ 11]   3.00-4.00   sec  33.0 MBytes   277 Mbits/sec  23700
[SUM]   3.00-4.00   sec   132 MBytes  1.11 Gbits/sec  94800
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  35.5 MBytes   298 Mbits/sec  25500
[  7]   4.00-5.00   sec  35.5 MBytes   298 Mbits/sec  25500
[  9]   4.00-5.00   sec  35.5 MBytes   298 Mbits/sec  25500
[ 11]   4.00-5.00   sec  35.5 MBytes   298 Mbits/sec  25500
[SUM]   4.00-5.00   sec   142 MBytes  1.19 Gbits/sec  102000
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  35.6 MBytes   298 Mbits/sec  25540
[  7]   5.00-6.00   sec  35.6 MBytes   298 Mbits/sec  25540
[  9]   5.00-6.00   sec  35.6 MBytes   298 Mbits/sec  25540
[ 11]   5.00-6.00   sec  35.6 MBytes   298 Mbits/sec  25540
[SUM]   5.00-6.00   sec   142 MBytes  1.19 Gbits/sec  102160
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  34.9 MBytes   293 Mbits/sec  25100
[  7]   6.00-7.00   sec  34.9 MBytes   293 Mbits/sec  25100
[  9]   6.00-7.00   sec  34.9 MBytes   293 Mbits/sec  25100
[ 11]   6.00-7.00   sec  34.9 MBytes   293 Mbits/sec  25100
[SUM]   6.00-7.00   sec   140 MBytes  1.17 Gbits/sec  100400
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  34.7 MBytes   291 Mbits/sec  24890
[  7]   7.00-8.00   sec  34.7 MBytes   291 Mbits/sec  24890
[  9]   7.00-8.00   sec  34.7 MBytes   291 Mbits/sec  24890
[ 11]   7.00-8.00   sec  34.7 MBytes   291 Mbits/sec  24890
[SUM]   7.00-8.00   sec   139 MBytes  1.16 Gbits/sec  99560
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  34.2 MBytes   287 Mbits/sec  24580
[  7]   8.00-9.00   sec  34.2 MBytes   287 Mbits/sec  24580
[  9]   8.00-9.00   sec  34.2 MBytes   287 Mbits/sec  24580
[ 11]   8.00-9.00   sec  34.2 MBytes   287 Mbits/sec  24580
[SUM]   8.00-9.00   sec   137 MBytes  1.15 Gbits/sec  98320
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  34.8 MBytes   292 Mbits/sec  24980
[  7]   9.00-10.00  sec  34.8 MBytes   292 Mbits/sec  24980
[  9]   9.00-10.00  sec  34.8 MBytes   292 Mbits/sec  24980
[ 11]   9.00-10.00  sec  34.8 MBytes   292 Mbits/sec  24980
[SUM]   9.00-10.00  sec   139 MBytes  1.17 Gbits/sec  99920
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.000 ms  0/247570 (0%)  sender
[  5]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.007 ms  40/247570 (0.016%)  receiver
[  7]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.000 ms  0/247570 (0%)  sender
[  7]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.009 ms  39/247570 (0.016%)  receiver
[  9]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.000 ms  0/247570 (0%)  sender
[  9]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.006 ms  38/247570 (0.015%)  receiver
[ 11]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.000 ms  0/247570 (0%)  sender
[ 11]   0.00-10.00  sec   345 MBytes   289 Mbits/sec  0.012 ms  40/247570 (0.016%)  receiver
[SUM]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec  0.000 ms  0/990280 (0%)  sender
[SUM]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec  0.008 ms  157/990280 (0.016%)  receiver

Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: testo_cz on November 07, 2021, 10:16:14 pm
Thank you blblblb and testo_cz. I'll get back to your suggestions shortly. Just to confirm first that the issue definitely is with OpnSense: I spun up a minimal Debian VM on the same host and ran the same test. It easily saturated the 10Gb link.

Log snippets from the other end of iPerf3, Debian VM for comparison:
Code: [Select]
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.200.216, port 40042
[  5] local 192.168.200.10 port 5201 connected to 192.168.200.216 port 40044
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   1.00-2.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   2.00-3.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   3.00-4.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   4.00-5.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   5.00-6.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   6.00-7.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   7.00-8.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   8.00-9.00   sec  1.10 GBytes  9.41 Gbits/sec
[  5]   9.00-10.00  sec  1.10 GBytes  9.41 Gbits/sec
[  5]  10.00-10.00  sec  1.64 MBytes  9.37 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  11.0 GBytes  9.41 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.200.216, port 40046
[  5] local 192.168.200.10 port 5201 connected to 192.168.200.216 port 40048
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.10 GBytes  9.44 Gbits/sec    0   3.00 MBytes
[  5]   1.00-2.00   sec  1.10 GBytes  9.41 Gbits/sec    0   3.00 MBytes
[  5]   2.00-3.00   sec  1.08 GBytes  9.31 Gbits/sec  1527   1.83 MBytes
[  5]   3.00-4.00   sec  1.09 GBytes  9.41 Gbits/sec    0   2.20 MBytes
[  5]   4.00-5.00   sec  1.10 GBytes  9.41 Gbits/sec    0   2.20 MBytes
[  5]   5.00-6.00   sec  1.09 GBytes  9.41 Gbits/sec    0   2.20 MBytes
[  5]   6.00-7.00   sec  1.09 GBytes  9.41 Gbits/sec    0   2.20 MBytes
[  5]   7.00-8.00   sec  1.09 GBytes  9.38 Gbits/sec  920   2.26 MBytes
[  5]   8.00-9.00   sec  1.09 GBytes  9.36 Gbits/sec  921   2.31 MBytes
[  5]   9.00-10.00  sec  1.10 GBytes  9.41 Gbits/sec    0   2.32 MBytes
[  5]  10.00-10.00  sec   402 KBytes  8.89 Gbits/sec    0   2.32 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.9 GBytes  9.39 Gbits/sec  3368             sender
-----------------------------------------------------------

Same setup and hardware, OpnSense instead of Debian:
Code: [Select]
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.200.1, port 19291
[  5] local 192.168.200.10 port 5201 connected to 192.168.200.1 port 27695
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   160 MBytes  1.34 Gbits/sec
[  5]   1.00-2.00   sec   169 MBytes  1.42 Gbits/sec
[  5]   2.00-3.00   sec   168 MBytes  1.41 Gbits/sec
[  5]   3.00-4.00   sec   172 MBytes  1.44 Gbits/sec
[  5]   4.00-5.00   sec   179 MBytes  1.50 Gbits/sec
[  5]   5.00-6.00   sec   178 MBytes  1.49 Gbits/sec
[  5]   6.00-7.00   sec   187 MBytes  1.57 Gbits/sec
[  5]   7.00-8.00   sec   197 MBytes  1.65 Gbits/sec
[  5]   8.00-9.00   sec   168 MBytes  1.41 Gbits/sec
[  5]   9.00-10.00  sec   166 MBytes  1.39 Gbits/sec
[  5]  10.00-10.04  sec  8.47 MBytes  1.72 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  1.71 GBytes  1.46 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.200.1, port 53951
[  5] local 192.168.200.10 port 5201 connected to 192.168.200.1 port 65343
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   136 MBytes  1.14 Gbits/sec    0   1.03 MBytes
[  5]   1.00-2.00   sec   136 MBytes  1.14 Gbits/sec   23    766 KBytes
[  5]   2.00-3.00   sec   134 MBytes  1.12 Gbits/sec    0    892 KBytes
[  5]   3.00-4.00   sec   130 MBytes  1.09 Gbits/sec    0   1018 KBytes
[  5]   4.00-5.00   sec   129 MBytes  1.09 Gbits/sec    0   1.12 MBytes
[  5]   5.00-6.00   sec   135 MBytes  1.13 Gbits/sec    0   1.24 MBytes
[  5]   6.00-7.00   sec   132 MBytes  1.11 Gbits/sec    0   1.36 MBytes
[  5]   7.00-8.00   sec   132 MBytes  1.11 Gbits/sec    8   1.08 MBytes
[  5]   8.00-9.00   sec   144 MBytes  1.21 Gbits/sec    0   1.19 MBytes
[  5]   9.00-10.00  sec   132 MBytes  1.11 Gbits/sec    0   1.27 MBytes
[  5]  10.00-10.00  sec   126 KBytes  1.19 Gbits/sec    0   1.27 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.31 GBytes  1.12 Gbits/sec   31             sender

Comparing against just another system, e.g. Linux, is misleading. At the very least it should be done with all the HW offloading disabled, as posted here:

https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576
But still, this does not explain your Chelsio NIC performance.
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: testo_cz on November 07, 2021, 10:43:25 pm
I'm sure there are people using cxgbe-based NICs. Have you searched this forum?
Yes, and not much came up. But yes, I understand Chelsio is very popular with both TrueNAS and pfSense due to strong support in FreeBSD, so I would expect the cards to work very well. Indeed I use the same card (Chelsio T420-CR) in TrueNAS and it works very well, easily saturating 10Gb against other hosts (not OpnSense!).

Paste some boot message output from the driver here (dmesg | grep cxgbe), so it can be double-checked for queues/buffers/netmap configuration.

Code: [Select]
root@XXX:~ # dmesg | grep cxgbe
cxgbe0: <port 0> on t4nex0
cxgbe0: Ethernet address: 00:07:43:11:2b:10
cxgbe0: 8 txq, 8 rxq (NIC)
cxgbe1: <port 1> on t4nex0
cxgbe1: Ethernet address: 00:07:43:11:2b:18
cxgbe1: 8 txq, 8 rxq (NIC)
cxgbe1: tso4 disabled due to -txcsum.
cxgbe1: tso6 disabled due to -txcsum6.
cxgbe0: tso4 disabled due to -txcsum.
cxgbe0: tso6 disabled due to -txcsum6.
cxgbe1: link state changed to UP
cxgbe0: link state changed to UP
556.329305 [1130] generic_netmap_attach     Emulated adapter for cxgbe1 created (prev was NULL)
556.329322 [1035] generic_netmap_dtor       Emulated netmap adapter for cxgbe1 destroyed
556.331828 [1130] generic_netmap_attach     Emulated adapter for cxgbe1 created (prev was NULL)
556.373055 [ 320] generic_netmap_register   Emulated adapter for cxgbe1 activated
556.381334 [1130] generic_netmap_attach     Emulated adapter for cxgbe0 created (prev was NULL)
556.381356 [1035] generic_netmap_dtor       Emulated netmap adapter for cxgbe0 destroyed
556.384000 [1130] generic_netmap_attach     Emulated adapter for cxgbe0 created (prev was NULL)
556.384271 [ 320] generic_netmap_register   Emulated adapter for cxgbe0 activated

If you switch the VM's network adapters from PCI passthrough to, for example, a Linux bridge + VirtIO, do you get more throughput with iperf3?
(my smaller HW-based Proxmox + 4 vCore OPNsense VM gives up to 3Gbps from WAN PC - through OPN - to LAN PC, iperf3 -P2 -t60 ...)
Haven't tried virtual NICs. If there's one thing that people seem to have problems with, it's that... The recommendation seems to be to use passthrough if possible.

Well, I can't compare precisely because I don't have such a NIC.
I'd say this looks good w.r.t. driver queues:
Code: [Select]
cxgbe1: 8 txq, 8 rxq (NIC)
and this w.r.t. netmap doesn't look very convincing:
Code: [Select]
556.373055 [ 320] generic_netmap_register   Emulated adapter for cxgbe1 activated
For comparison, other NICs produce a nice netmap report line, for example:
Code: [Select]
igb0: netmap queues/slots: TX 2/1024, RX 2/1024

Maybe it's worth double-checking whether cxgbe netmap works well in your system.
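A quick way to check is to grep the boot log for netmap, e.g.:
Code: [Select]
dmesg | grep -i netmap
If you only see generic_netmap_* lines and no "netmap queues/slots" report from the driver, netmap is running in emulated mode.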

What I do for HW NICs is disable flow control via a tunable, e.g.:
Code: [Select]
dev.igb.0.fc=0
so maybe hw.cxgbe.pause_settings=0 in your case. Then I let the networking service routines spawn across CPU cores with these tunables:
Code: [Select]
net.isr.bindthreads="-1"
net.isr.maxthreads="-1"

Some NICs actually work great if the MSI blacklist is disabled:
Code: [Select]
hw.pci.honor_msi_blacklist=0
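Put together, my suggestions would look roughly like this as tunables (a sketch only; hw.cxgbe.pause_settings is my guess at the cxgbe counterpart of dev.igb.0.fc, so please double-check it on your box):

Code: [Select]
# disable pause frames / flow control on the Chelsio ports (guess, verify with sysctl)
hw.cxgbe.pause_settings=0
# let netisr work spread across all CPU cores
net.isr.maxthreads="-1"
net.isr.bindthreads="-1"
# some NICs behave better with the MSI blacklist check disabled
hw.pci.honor_msi_blacklist=0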
An idea: your Ryzen-based motherboard might not be fully supported by the FreeBSD 12.1 kernel/driver set, which is the base for OPNsense 21.x. IMHO poor performance could be the result.
Have you double-checked this?

BTW: I see you're using /boot/loader.conf.local. I thought it's not taken into account anymore in OPNsense, and that we're told to use System Settings->Tunables in the GUI instead.
Does
Code: [Select]
sysctl -a | grep 'hw.cxgbe'
confirm that your settings are being applied?





Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on November 08, 2021, 11:54:35 am
and this w.r.t. netmap doesn't look very convincing:
Code: [Select]
556.373055 [ 320] generic_netmap_register   Emulated adapter for cxgbe1 activated
Thank you - maybe this is the smoking gun I've been looking for? I've been Googling a bit on the topic and it's surprising how hard it is to find good information. The Netmap documentation (https://github.com/luigirizzo/netmap/blob/master/README.md) claims native support for Intel, Realtek and Nvidia NICs, and adds the sentence "FreeBSD has also native netmap support in the Chelsio 10/40G cards." without saying which cards. Meanwhile, Chelsio has published a whitepaper (https://www.chelsio.com/wp-content/uploads/resources/FreeBSD-T5-Netmap.pdf) where they brag about Netmap performance, adding "Chelsio recently released its support for T5-based adapters into the FreeBSD kernel." So I've got a T4 (10Gb) adapter... maybe it's not natively supported then? A bit unclear.

It's also not clear how much of a performance hit I should expect from an emulated adapter as opposed to native support in the driver.

In any case I'll probably try to find a T520 on eBay.

An idea: Your Ryzen-based motherboard might not be fully supported by FreeBSD12.1 kernel/drivers set which are the base for OPNsense 21.x. IMHO poor performance could be the result.
Have you double-checked this ?

Where/how would I check this...?

BTW: I see you're using /boot/loader.conf.local . I thought its not taken into account anymore in OPNsense, being told we should use System Settings->Tunables in the GUI.
Does
Code: [Select]
sysctl -a | grep 'hw.cxgbe' confirms your settings are being applied ?

Yes, that is working, and needs to be there for things to work. Otherwise the driver isn't loaded at the right time, and the card doesn't even get recognised during boot.

Thanks for the other suggestions on tunables, I'll look into them as well. I've already been fiddling around with quite a few, including "net.inet.ip.random_id" and others, but only seen very marginal differences, not the factor of 5 or 10 that I'm looking for here...
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: testo_cz on November 09, 2021, 09:08:23 am
Nice research on your NIC. I've opened 'man 4 netmap' and it's right there in the SUPPORTED DEVICES section: emulated mode is inferior in performance to native mode. It seems to me this is exactly what you have just experienced.
https://www.freebsd.org/cgi/man.cgi?query=netmap&apropos=0&sektion=4&manpath=FreeBSD+12.1-RELEASE&arch=default&format=html

IMHO, your dmesg output confirms that netmap runs in emulated mode for your cxgbe NIC.

I've found these network tuning guides helpful, although my experience is that a good NIC doesn't need much tuning to get good performance, and no amount of tuning really helps a bad NIC (hopefully not your case).
https://docs.netgate.com/pfsense/en/latest/hardware/tune.html
https://calomel.org/freebsd_network_tuning.html

I'd say finding out about HW support is tedious. I'm not an expert. People say to go to:
https://www.freebsd.org/releases/12.1R/hardware/
for a start, then to the FreeBSD forum. Maybe also https://wiki.freebsd.org and https://bsd-hardware.info




Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on November 09, 2021, 12:31:30 pm
Thanks for the info. NB I’ve ordered a Chelsio T520-SO-CR on eBay. These are the (10Gb) cards that Netgate themselves sell for pfSense. And the FreeBSD crowd seems to love them. So can’t really get more “supported” than that.

Then there’s the question of CPU and chipset of course. The FreeBSD hardware list that you linked frankly leaves a lot to be desired. The latest “supported” AMD CPU on that list is from 2003. Let’s hope I shouldn’t read that literally…

Also, with regards to FreeBSD performance overall, I saw this: https://www.phoronix.com/scan.php?page=article&item=freebsd-13-beta1&num=1

… which also surprised me a bit. FreeBSD 13 is apparently now “closer to parity with Linux performance on the same hardware“ (note “closer”) and if you look at the results, 13 in turn is sometimes twice as fast, sometimes much more than that, compared to FreeBSD 12.

So in summary: netmap runs in emulated mode, possibly due to lack of NIC support in the driver, on top of possible question marks over FreeBSD 12 performance overall, possibly exacerbated further by HardenedBSD, and to round it off, question marks around how well FreeBSD interacts with modern AMD CPUs…

I’ll report back when I have been able to try the T520.
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: testo_cz on November 14, 2021, 09:52:58 am
Thanks for the info. NB I’ve ordered a Chelsio T520-SO-CR on eBay. These are the (10Gb) cards that Netgate themselves sell for pfSense. And the FreeBSD crowd seems to love them. So can’t really get more “supported” than that.

Then there’s the question of CPU and chipset of course. The FreeBSD hardware list that you linked frankly leaves a lot to be desired. The latest “supported” AMD CPU on that list is from 2003. Let’s hope I shouldn’t read that literally…

Also, with regards to FreeBSD performance overall, I saw this: https://www.phoronix.com/scan.php?page=article&item=freebsd-13-beta1&num=1

… which also surprised me a bit. FreeBSD 13 is apparently now “closer to parity with Linux performance on the same hardware“ (note “closer”) and if you look at the results, 13 in turn is sometimes twice as fast, sometimes much more than that, compared to FreeBSD 12.

So in summary: netmap runs in emulated mode, possibly due to lack of NIC support in the driver, on top of possible question marks over FreeBSD 12 performance overall, possibly exacerbated further by HardenedBSD, and to round it off, question marks around how well FreeBSD interacts with modern AMD CPUs…

I’ll report back when I have been able to try the T520.

I might have scared you too much about Ryzen platform HW support.
It seems you have everything running stably even on the FreeBSD 12.x base, so it looks alright to me, and that's the most important property after all.

I'm generally doubtful about HW, mainly because vendors implement vast amounts of "specifics" that operating systems have to keep up with.

AFAIK, FreeBSD 13-based OPNsense will be a major step forward w.r.t. both compatibility and performance for some HW setups.

Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on November 16, 2021, 01:12:47 am
Ok, I'm back with some results.

It took a while to get the card up and running. Unlike FreeBSD, OPNsense does not include firmware for this card (Chelsio T520-SO-CR) (why?!). Fortunately Proxmox updated the card firmware for me automatically (although with some alarming error messages - it seems to have gone OK though...). The next problem was that passthrough didn't work; the VM never got past SeaBIOS initialisation. I managed to resolve that by booting FreeDOS and flashing the boot ROM on the card with tools downloaded from Chelsio.

Once I got the card up and running in OpnSense, netmap unfortunately still ran in emulated mode, and with the same underwhelming results as before.

More Googling showed that netmap only works with virtual functions on this card. So I had to add

Code: [Select]
hw.cxgbe.num_vis=2
in /boot/loader.conf.local. Then after boot, I have vcxl0 as well as cxl0 mapped to the same physical port on the card. BUT vcxl0 looks more promising:

Code: [Select]
root@xxx:~ # dmesg | grep vcxl
vcxl0: <port 0 vi 1> on cxl0
vcxl0: Ethernet address: 00:07:43:36:a3:a1
vcxl0: netmap queues/slots: TX 2/1023, RX 2/1024
vcxl0: 1 txq, 1 rxq (NIC); 2 txq, 2 rxq (netmap)
vcxl1: <port 1 vi 1> on cxl1
vcxl1: Ethernet address: 00:07:43:36:a3:a9
vcxl1: netmap queues/slots: TX 2/1023, RX 2/1024
vcxl1: 1 txq, 1 rxq (NIC); 2 txq, 2 rxq (netmap)
vcxl1: link state changed to UP
vcxl0: link state changed to UP
vcxl1: tso4 disabled due to -txcsum.
vcxl1: tso6 disabled due to -txcsum6.
vcxl1: nrxq (1) != kernel RSS buckets (8);performance will be impacted.

So I reassigned LAN and WAN to the virtual functions instead and re-ran iperf3. Better!

Code: [Select]
root@xxx:~ # iperf3 -c 192.168.200.1 -R
Connecting to host 192.168.200.1, port 5201
Reverse mode, remote host 192.168.200.1 is sending
[  5] local 192.168.200.10 port 58912 connected to 192.168.200.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   819 MBytes  6.87 Gbits/sec
[  5]   1.00-2.00   sec   843 MBytes  7.07 Gbits/sec
[  5]   2.00-3.00   sec   830 MBytes  6.96 Gbits/sec
[  5]   3.00-4.00   sec   827 MBytes  6.94 Gbits/sec
[  5]   4.00-5.00   sec   835 MBytes  7.00 Gbits/sec
[  5]   5.00-6.00   sec   856 MBytes  7.18 Gbits/sec
[  5]   6.00-7.00   sec   831 MBytes  6.97 Gbits/sec
[  5]   7.00-8.00   sec   870 MBytes  7.30 Gbits/sec
[  5]   8.00-9.00   sec   823 MBytes  6.90 Gbits/sec
[  5]   9.00-10.00  sec   825 MBytes  6.92 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.17  sec  8.17 GBytes  6.89 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  8.16 GBytes  7.01 Gbits/sec                  receiver

iperf Done.
root@xxx:~ # iperf3 -c 192.168.200.1
Connecting to host 192.168.200.1, port 5201
[  5] local 192.168.200.10 port 62693 connected to 192.168.200.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   662 MBytes  5.55 Gbits/sec    0   1.03 MBytes
[  5]   1.00-2.00   sec   657 MBytes  5.51 Gbits/sec    5   1021 KBytes
[  5]   2.00-3.00   sec   660 MBytes  5.53 Gbits/sec    0   1.09 MBytes
[  5]   3.00-4.00   sec   661 MBytes  5.55 Gbits/sec    0   1.09 MBytes
[  5]   4.00-5.00   sec   654 MBytes  5.48 Gbits/sec    0   1.20 MBytes
[  5]   5.00-6.00   sec   657 MBytes  5.51 Gbits/sec    0   1.32 MBytes
[  5]   6.00-7.00   sec   656 MBytes  5.50 Gbits/sec    0   1.32 MBytes
[  5]   7.00-8.00   sec   653 MBytes  5.48 Gbits/sec    0   1.32 MBytes
[  5]   8.00-9.00   sec   658 MBytes  5.52 Gbits/sec    0   1.32 MBytes
[  5]   9.00-10.00  sec   653 MBytes  5.48 Gbits/sec    0   1.32 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  6.42 GBytes  5.51 Gbits/sec    5             sender
[  5]   0.00-10.00  sec  6.42 GBytes  5.51 Gbits/sec                  receiver

iperf Done.

So about 5-7 times faster, and getting closer to line speed now... Not quite there yet, but big improvement.

And this took a lot of trial and error!
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on November 16, 2021, 01:57:57 am
To add 2 more points.

First, adding

Code: [Select]
hw.cxgbe.nrxq_vi=8
hw.cxgbe.ntxq_vi=8
hw.cxgbe.nnmtxq_vi=8
hw.cxgbe.nnmrxq_vi=8

creates 8 rx/tx queues also for these virtual ports:

Code: [Select]
vcxl0: netmap queues/slots: TX 8/1023, RX 8/1024
vcxl0: 8 txq, 8 rxq (NIC); 8 txq, 8 rxq (netmap)
vcxl1: netmap queues/slots: TX 8/1023, RX 8/1024
vcxl1: 8 txq, 8 rxq (NIC); 8 txq, 8 rxq (netmap)

Second, unfortunately, Suricata now destroys performance, even without any rules active!

Here's a WAN speed test from another machine (through OPNsense)

Code: [Select]
root@xxx:~/tmp # ./fast
 -> 984.61 Mbps
root@xxx:~/tmp # ./fast
 -> 5.55 Gbps

First is with Suricata enabled and in IPS mode (but no rules), the second is with Suricata disabled. Disappointing. But maybe this can be tuned. For now, I'm disabling Suricata.
Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: testo_cz on November 20, 2021, 05:25:59 pm
@rungekutta
Very nice info about your NIC setup. And throughput results.

Was the earlier firmware in the NIC simply too old, or was it perhaps customized?
Because as people often reuse HW / NICs, it might not have genuine firmware, for example firmware customized by a server vendor.

I'm only just getting familiar with Suricata... Does it utilize 100% CPU when enabled?

Title: Re: Poor 10Gb performance, need help troubleshooting
Post by: rungekutta on November 21, 2021, 10:04:27 pm
Was the earlier firmware in the NIC simply too old, or was it perhaps customized?
Because as people often reuse HW / NICs, it might not have genuine firmware, for example firmware customized by a server vendor.

The firmware was pretty old, 1.12-something, which according to the release notes is from 2014. The current version is 1.26, from 2021. Proxmox updated this for me, but not the boot ROM, which was also from 2014. Once I had managed to update that as well (with Chelsio tools in DOS, booting from a USB stick) I could pass the card through to the OpnSense VM OK.

I'm only just getting familiar with Suricata... Does it utilize 100% CPU when enabled?
When I had rules enabled it pegged something like 3 or 4 cores (out of 8 available) but never got above 1Gb/s. Without any rules it used less CPU (less than 1 core) but still limited throughput to roughly 1Gb/s. I haven't looked much into tuning it, but there aren't many options exposed via the GUI either.