# dmesg | grep vcxl | grep netmap
vcxl0: netmap queues/slots: TX 8/1023, RX 8/1024
vcxl0: 8 txq, 8 rxq (NIC); 1 txq (ETHOFLD); 8 txq, 8 rxq (netmap)
vcxl1: netmap queues/slots: TX 8/1023, RX 8/1024
vcxl1: 8 txq, 8 rxq (NIC); 1 txq (ETHOFLD); 8 txq, 8 rxq (netmap)
vcxl2: netmap queues/slots: TX 8/1023, RX 8/1024
vcxl2: 8 txq, 8 rxq (NIC); 1 txq (ETHOFLD); 8 txq, 8 rxq (netmap)
vcxl3: netmap queues/slots: TX 8/1023, RX 8/1024
vcxl3: 8 txq, 8 rxq (NIC); 1 txq (ETHOFLD); 8 txq, 8 rxq (netmap)
root@pfw1:~ # sysctl dev.t5nex.0.firmware_version
dev.t5nex.0.firmware_version: 1.25.6.0
root@pfw1:~ # sysctl dev.t5nex.1.firmware_version
dev.t5nex.1.firmware_version: 1.25.4.0
hw.cxgbe.fw_install="2"

Then:

root@pfw1:~ # dmesg | grep 1.23
t5nex0: firmware on card (1.26.2.0) is different than the version bundled with this driver, installing firmware 1.23.0.0 on card.
t5nex1: firmware on card (1.26.2.0) is different than the version bundled with this driver, installing firmware 1.23.0.0 on card.
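For reference, a minimal sketch of how that tunable can be applied and verified; the file location and the follow-up checks are assumptions based on standard FreeBSD/cxgbe(4) practice, not steps confirmed in this thread:

# /boot/loader.conf.local (assumed location for persistent loader tunables)
# 2 = install the driver's bundled firmware even if the card carries a different version
hw.cxgbe.fw_install="2"

# after a reboot, confirm what the driver did and what the cards now report
dmesg | grep -i firmware
sysctl dev.t5nex.0.firmware_version dev.t5nex.1.firmware_version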
generic_netmap_register Emulated adapter for cxgbe1 activated
Actually, I've run up both TNSR and VyOS as VMs with SR-IOV passthrough VFs from the T520-CRs. I don't have the numbers yet for VyOS, but with TNSR and a basic set of ACLs, out-of-the-box performance is double what I was getting with FreeBSD packet filters.
Anyway, I've ordered an X710-DA2 for testing (the Intel card I previously tested with was an X520). I'm still hoping I can do this with OPNsense, as it really is a great product, but if I can't pinpoint the throughput issues I'll have to run with a Linux-based distro.
iperf -c hostname -tinf -P 50
top -PCH
Would be interesting to hear your equivalent VyOS performance.
Bit of an update on this. After swapping the Chelsio cards for Intel X710-DA2s and getting more or less the same result, I've figured out at least the iperf issue: iperf3 is single threaded. Even if you run it with the -P option it still only hits one CPU core; if you want multithreaded operation you have to use iperf2.

I'd been checking CPU utilisation on the firewall dashboard while iperf3 was running and not seeing any significant numbers, but when I checked with top directly from the console, the single CPU core being hit by iperf was running at close to 100%. So I installed iperf2, ran it multithreaded, and boom! Near wire speed with 20+ concurrent threads!!

Running iperf continuously makes it easier to monitor top. For those who are interested, this runs the iperf2 client continuously with 50 threads:

iperf -c hostname -tinf -P 50

Then run top on the firewall like this:

top -PCH

But here's what I don't get. If I run iperf2 *through* the firewall to a server on the same 10Gbps network segment as the WAN, I get around 5Gbps with a single thread and 7-8Gbps multithreaded. But the same client running the speedtest CLI peaks at around 1Gbps. Looking at top on the firewall while speedtest is running doesn't show any significant CPU utilisation, and anyway, even if the firewall is only handling a single thread for speedtest, it should realistically be capable of far better than 1Gbps (half of that with Win10!).

The obvious culprit is the ISP network, but I'm still getting up to 8Gbps running speedtest directly from the firewall. I've also tested with mtr (no data loss and super low latency) and tracepath (no MTU issues all the way through to 1.1.1.1).

In summary, here is what I have found:

- There is not much difference I can tell in performance between the Intel X710-DA2 and the Chelsio T520-CRs.
- The internal 10Gbps network and attached clients are healthy and can transfer data at close to wire speed.
- The overhead from packet filtering on the firewall (passing iperf traffic) is 2-3Gbps, which is bearable. Faster CPUs might reduce this, but with 10 cores engaged, utilisation is only about 25-30%.
- The ISP upstream network is healthy.

So I'm not sure why there is such a big difference in firewall throughput between speedtest and iperf. I'm guessing speedtest uses tcp/443 while iperf defaults to tcp/5001 (5201 for iperf3). Unless the firewall is doing additional processing for tcp/443? I don't have any special rules set up for https and there is no IDS running at the moment. I'm going to have a close look at the proxy setup and see if that leads anywhere.
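One way to narrow down whether the port itself is the variable would be to repeat the iperf2 test over tcp/443. This is only a sketch, assuming you control the iperf server on the far side of the firewall, that nothing else is listening on 443 there, and that it can run as root (443 is a privileged port); "hostname" is a placeholder as in the earlier example:

# on the test server beyond the firewall: listen on tcp/443 instead of the default 5001
iperf -s -p 443

# on the LAN client: same continuous multithreaded test as above, but over tcp/443
iperf -c hostname -p 443 -tinf -P 50

If throughput on 443 matches the 5001 results, the port (and any per-port handling on the firewall) can probably be ruled out.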
sysctl -a | grep cpu | grep freq
net.isr.maxthreads = "-1"
net.isr.bindthreads = "1"
dev.ixl.#.fc = "0"
When you mentioned "10 cores engaged utilisation is only about 25-30%", does that mean that each of the ten CPU cores is utilized at 25-30%?
A few things I would check:

- Does power management allow the CPU to scale its frequency up?

sysctl -a | grep cpu | grep freq

- Are these network tunables for multicore CPUs enabled?

net.isr.maxthreads = "-1"
net.isr.bindthreads = "1"

- Is flow control turned off on each network interface?

dev.ixl.#.fc = "0"

Further, I'd suggest trying to increase the number of RX/TX queues and descriptors. If the ixl(4) tunables don't let you do that, the iflib(4) based tunables might. Check the sysctl values of 'nrxqs', 'ntxqs', 'nrxds' and 'ntxds' and see whether you can override them with larger values (see the sketch below); overrides require a reboot, I believe. Docs e.g. here: https://www.freebsd.org/cgi/man.cgi?query=iflib&sektion=4&apropos=0&manpath=FreeBSD+12.2-RELEASE+and+Ports This approach boosted forwarding performance on my ESX setup with vmx interfaces.

As regards speedtest-cli to the Internet, I'd try tcpdump/wireshark on both sides of the firewall to see whether the packets flow as expected, or whether there are retransmissions, rubbish or something else strange going on.
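A minimal sketch of what those overrides could look like, using the iflib(4) loader tunables (override_nrxqs, override_ntxqs, override_nrxds, override_ntxds); the device name ixl.0 and the values are illustrative assumptions, not settings confirmed anywhere in this thread:

# /boot/loader.conf.local (assumed location; takes effect after a reboot)
# queues per port (illustrative values)
dev.ixl.0.iflib.override_nrxqs="8"
dev.ixl.0.iflib.override_ntxqs="8"
# descriptors per queue (illustrative values)
dev.ixl.0.iflib.override_nrxds="4096"
dev.ixl.0.iflib.override_ntxds="4096"

After rebooting, the effective values can be read back with sysctl dev.ixl.0.iflib to confirm the overrides were accepted.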