Hi all,
I just started with OPNsense and bought an NRG Systems IPU672 with an i5-7200U, 8 GB of RAM, and i211AT NICs.
I set up two VLANs, 192.168.1.x and 10.0.2.x, and tested the routing speed between them with iperf3, but I can only reach speeds of up to ~800 Mbit/s. Connecting both devices to the same switch I reach 950 Mbit/s.
Test through OPNsense:
iperf-3.1.3-win32>iperf3 -c 10.0.2.200 -t 300 -R
Connecting to host 10.0.2.200, port 5201
[ 4] local 192.168.1.3 port 1053 connected to 10.0.2.200 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 91.0 MBytes 761 Mbits/sec
[ 4] 1.00-2.00 sec 89.0 MBytes 748 Mbits/sec
[ 4] 2.00-3.00 sec 88.9 MBytes 746 Mbits/sec
[ 4] 3.00-4.00 sec 92.1 MBytes 772 Mbits/sec
When I test with the option for multiple parallel connections I can reach ~940 Mbit/s:
iperf-3.1.3-win32>iperf3 -c 10.0.2.200 -t 300 -R -P 2
Connecting to host 10.0.2.200, port 5201
[ 4] local 192.168.1.3 port 1052 connected to 10.0.2.200 port 5201
[ 6] local 192.168.1.3 port 1053 connected to 10.0.2.200 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 55.6 MBytes 466 Mbits/sec
[ 6] 0.00-1.00 sec 56.3 MBytes 472 Mbits/sec
[SUM] 0.00-1.00 sec 112 MBytes 938 Mbits/sec
This would indicate that the single-core performance of the CPU is too slow to handle 1 Gbit/s for a single connection; however, when I check top -a -H -S, only one CPU is at about 33%:
last pid: 7932; load averages: 0.34, 0.26, 0.12 up 0+00:06:09 18:23:45
404 threads: 6 running, 383 sleeping, 15 waiting
CPU: 0.0% user, 0.0% nice, 11.1% system, 0.0% interrupt, 88.9% idle
Mem: 126M Active, 80M Inact, 494M Wired, 7112M Free
ARC: 214M Total, 32M MFU, 176M MRU, 172K Anon, 993K Header, 4640K Other
58M Compressed, 153M Uncompressed, 2.65:1 Ratio
Swap: 8192M Total, 8192M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0 64K CPU0 0 5:50 99.40% idle{idle: cpu0}
11 root 155 ki31 0 64K RUN 1 6:00 99.38% idle{idle: cpu1}
11 root 155 ki31 0 64K CPU3 3 6:00 98.63% idle{idle: cpu3}
11 root 155 ki31 0 64K CPU2 2 5:32 67.85% idle{idle: cpu2}
0 root -76 - 0 4624K CPU2 2 0:29 32.08% kernel{if_io_tqg_2}
0 root -76 - 0 4624K - 3 0:01 0.92% kernel{if_io_tqg_3}
76044 unbound 20 0 73M 39M kqread 1 0:02 0.40% unbound{unbound}
76044 unbound 20 0 73M 39M kqread 0 0:02 0.38% unbound{unbound}
0 root -76 - 0 4624K - 0 0:02 0.27% kernel{if_config_tqg_0}
12 root -72 - 0 240K WAIT 0 0:00 0.21% intr{swi1: pfsync}
8706 root 20 0 18M 6740K select 1 0:02 0.19% ntpd{ntpd}
58282 root 20 0 1045M 4788K CPU1 1 0:00 0.07% top
20 root -16 - 0 16K pftm 2 0:00 0.04% pf purge
22 root -16 - 0 16K - 3 0:00 0.04% rand_harvestq
12 root -72 - 0 240K WAIT 2 0:00 0.03% intr{swi1: netisr 0}
69837 root 20 0 31M 11M kqread 3 0:00 0.02% syslog-ng{syslog-ng}
12 root -60 - 0 240K WAIT 0 0:00 0.02% intr{swi4: clock (0)}
20757 root 20 0 24M 14M select 3 0:01 0.01% python3.8
76550 root 20 0 21M 12M select 1 0:00 0.01% python3.8
24176 root 20 0 21M 12M select 1 0:00 0.01% python3.8
33797 root 16 - 0 16K syncer 3 0:00 0.01% syncer
99582 root 20 0 20M 6448K select 0 0:00 0.01% mpd5{mpd5}
0 root -76 - 0 4624K - 1 0:00 0.00% kernel{softirq_1}
0 root -76 - 0 4624K - 0 0:00 0.00% kernel{softirq_0}
0 root -76 - 0 4624K - 3 0:00 0.00% kernel{softirq_3}
43323 root -16 - 0 48K psleep 1 0:00 0.00% pagedaemon{dom0}
75062 root 20 0 17M 7416K select 1 0:00 0.00% sshd
69837 root 20 0 31M 11M kqread 0 0:00 0.00% syslog-ng{syslog-ng}
0 root -76 - 0 4624K - 0 0:11 0.00% kernel{if_io_tqg_0}
I played around with the hardware offload settings under "Interfaces > Settings" and rebooted, but it doesn't seem to change anything. Is the CPU really too slow to reach 1 Gbit/s routing speed on a single connection, or am I doing something wrong?
Thanks for the help!
I'd say, for a start, use these tunables
net.isr.maxthreads = "-1"
net.isr.bindthreads = "1"
to have multiple queues;
And disable flow control on the Intel NICs, e.g.
dev.igb.0.fc = "0"
dev.igb.1.fc = "0"
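In OPNsense these go under System > Settings > Tunables; the net.isr.* ones are read at boot, so a reboot is needed for them. Flow control can also be flipped at runtime for a quick test, something like this (igb0/igb1 are just examples, match your own interfaces):
sysctl dev.igb.0.fc=0   # 0 = flow control off
sysctl dev.igb.1.fc=0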
Most 1 GbE setups I've seen happily do NAT at close to 1 Gbit/s.
Is the CPU frequency scaling up/down?
sysctl -a | grep cpu | grep freq
Hi, thanks for your help!
So I tried with the following tunables:
For all NICs:
dev.igb.X.fc = 0
dev.igb.X.eee_control = 0
And also added:
net.isr.bindthreads = 1
net.isr.maxthreads = -1
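To verify they took effect after the reboot, they can be queried with sysctl, e.g.:
sysctl dev.igb.0.fc net.isr.bindthreads net.isr.maxthreads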
Performance slightly increased but is still not at line speed:
iperf-3.1.3-win32>iperf3 -c 10.0.2.200 -t 5
Connecting to host 10.0.2.200, port 5201
[ 4] local 192.168.1.100 port 29542 connected to 10.0.2.200 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 100 MBytes 838 Mbits/sec
[ 4] 1.00-2.00 sec 100 MBytes 843 Mbits/sec
[ 4] 2.00-3.00 sec 102 MBytes 854 Mbits/sec
[ 4] 3.00-4.00 sec 102 MBytes 853 Mbits/sec
[ 4] 4.00-5.00 sec 102 MBytes 853 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-5.00 sec 506 MBytes 848 Mbits/sec sender
[ 4] 0.00-5.00 sec 506 MBytes 848 Mbits/sec receiver
The CPU frequency is the following:
root@OPNsense:~ # sysctl -a | grep cpu | grep freq
device cpufreq
debug.cpufreq.verbose: 0
debug.cpufreq.lowest: 0
dev.cpufreq.3.freq_driver: hwpstate_intel3
dev.cpufreq.3.%parent: cpu3
dev.cpufreq.3.%pnpinfo:
dev.cpufreq.3.%location:
dev.cpufreq.3.%driver: cpufreq
dev.cpufreq.3.%desc:
dev.cpufreq.2.freq_driver: hwpstate_intel2
dev.cpufreq.2.%parent: cpu2
dev.cpufreq.2.%pnpinfo:
dev.cpufreq.2.%location:
dev.cpufreq.2.%driver: cpufreq
dev.cpufreq.2.%desc:
dev.cpufreq.1.freq_driver: hwpstate_intel1
dev.cpufreq.1.%parent: cpu1
dev.cpufreq.1.%pnpinfo:
dev.cpufreq.1.%location:
dev.cpufreq.1.%driver: cpufreq
dev.cpufreq.1.%desc:
dev.cpufreq.0.freq_driver: hwpstate_intel0
dev.cpufreq.0.%parent: cpu0
dev.cpufreq.0.%pnpinfo:
dev.cpufreq.0.%location:
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%desc:
dev.cpufreq.%parent:
dev.cpu.3.freq_levels: 2712/-1
dev.cpu.3.freq: 3113
dev.cpu.2.freq_levels: 2712/-1
dev.cpu.2.freq: 3113
dev.cpu.1.freq_levels: 2712/-1
dev.cpu.1.freq: 3113
dev.cpu.0.freq_levels: 2712/-1
dev.cpu.0.freq: 3113
Is there anything else I can try?
So instead of ~770 Mbit/s single-stream you are now getting ~850 Mbit/s. IMHO that's nice; not perfect, but nice.
Networking tasks should now be spread over more CPU cores while the load stays low. Last time you posted about 33% system CPU load on a single core, which IMHO is a healthy system; this time it should spread over two or more cores.
I don't have benchmarks for a single stream, I'm afraid. I have always been using -P 2 and getting similar results, around 950 Mbit/s (in either the upload or download direction). Maybe I'll try later, but I can't promise.
Maybe 850 Mbit/s single-stream is just fine now, but I'm not sure. There is always some performance decrease because of the intentionally disabled NIC offloads and the netmap layer within OPNsense; however, the decrease is almost invisible with multiple streams/sessions on 1 GbE. That is 950 Mbit/s and more with -P 2, which is very nice considering MTU, packets per second, and other limitations of the source and destination NICs.
That reminds me: the OPNsense docs say to keep the TCP, UDP, and LRO offloads at their default, which is off.
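They can also be checked quickly from the shell; any offload that is actually enabled shows up as a flag (TXCSUM, TSO4, LRO, ...) on the options line, e.g. for igb0:
ifconfig igb0 | grep options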
And this might be useful too -- an example of "healthy" initialization of a powerful Intel 1GbE NIC (dmesg | grep igb0):
igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xc020-0xc03f mem 0xfe8a0000-0xfe8bffff,0xfe880000-0xfe89ffff,0xfe8c4000-0xfe8c7fff irq 40 at device 0.0 on pci3
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 2 RX queues 2 TX queues
igb0: Using MSI-X interrupts with 3 vectors
igb0: Ethernet address: 00:25:90:00:00:00
igb0: netmap queues/slots: TX 2/1024, RX 2/1024
What is important: MSI-X enabled, more than one hardware queue, and more than one netmap queue mapped in non-emulated mode.
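To compare on your box, something along these lines should show how the queues and interrupts are laid out (igb0 as an example):
dmesg | grep igb0      # driver init: queues, MSI-X vectors, netmap queues
vmstat -i | grep igb   # per-queue interrupt counters
netstat -Q             # netisr / per-CPU dispatch statistics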
Is the single-stream TCP performance somehow crucial for you?
The TCP stack itself has some tunables too -- both on the iperf3 side and via kernel sysctls.
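For example, a larger window on the iperf3 side and higher socket buffer limits in the kernel are the usual knobs; the values below are only a rough starting point, not something I have verified on your hardware:
iperf3 -c 10.0.2.200 -t 30 -w 1M                            # bigger TCP window on the client
sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max    # current kernel limits
sysctl net.inet.tcp.sendbuf_max=4194304                     # example: raise to 4 MB
sysctl net.inet.tcp.recvbuf_max=4194304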
T.
Hello.
If you're using the intrusion detection service, turn it off for testing.
Also, set the following parameters to ensure no power management is messing with the CPU clock speed:
System -> Settings -> Miscellaneous -> Power Savings -> All to "Maximum" and "Use PowerD" ON.
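To confirm the clock actually stays up during a test, a quick watch loop on the OPNsense shell should be enough (core 0 as an example):
while true; do sysctl dev.cpu.0.freq; sleep 1; done   # print the current clock once per second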
Ensure that both the iperf3 server and client can handle the load across a plain switch first, rather than testing straight through the router.
Repeat the tests with: iperf3 -c 10.0.2.200 -t 180 -P 8
For reference, I can manage 1 Gbit/s routing with IDS and IPS both enabled on an i5-4690K, with > 50% CPU load across the 4 cores. So consider upgrading from an under-powered, low-voltage CPU to a more muscular one.
Best regards.
Hi,
I did some real-world tests with FTP and SMB, and throughput was usually 113 MB/s, which is around 905 Mbit/s. I think that's good enough considering that there is routing involved, so I will not investigate further.
Once I use iperf3 with "-P 2" or more it's always 950 Mbit/s, so the "issue" was only ever with a single stream.
Thanks for your help!