OPNsense Forum

English Forums => Hardware and Performance => Topic started by: Sensler3000 on December 25, 2021, 06:00:55 PM

Title: Poor routing speed on i5-7200U
Post by: Sensler3000 on December 25, 2021, 06:00:55 PM
Hi all,

I just started with OPNsense and bought an NRG Systems IPU672 with an i5-7200U, 8 GB of RAM, and i211AT NICs.

I set up two VLANs (192.168.1.x and 10.0.2.x) and tested the routing speed between them with iperf3, but I can only reach speeds up to ~800 Mbit/s. When I connect both devices to the same switch instead, I reach 950 Mbit/s.
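
The device on the other VLAN runs the plain iperf3 server side:

iperf3 -s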

Test routed through OPNsense:

\iperf-3.1.3-win32>iperf3 -c 10.0.2.200 -t 300 -R
Connecting to host 10.0.2.200, port 5201
[  4] local 192.168.1.3 port 1053 connected to 10.0.2.200 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  91.0 MBytes   761 Mbits/sec
[  4]   1.00-2.00   sec  89.0 MBytes   748 Mbits/sec
[  4]   2.00-3.00   sec  88.9 MBytes   746 Mbits/sec
[  4]   3.00-4.00   sec  92.1 MBytes   772 Mbits/sec


When I test with the option for multiple parallel connections, I can reach ~940 Mbit/s:

iperf-3.1.3-win32>iperf3 -c 10.0.2.200 -t 300 -R -P 2
Connecting to host 10.0.2.200, port 5201
[  4] local 192.168.1.3 port 1052 connected to 10.0.2.200 port 5201
[  6] local 192.168.1.3 port 1053 connected to 10.0.2.200 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  55.6 MBytes   466 Mbits/sec
[  6]   0.00-1.00   sec  56.3 MBytes   472 Mbits/sec
[SUM]   0.00-1.00   sec   112 MBytes   938 Mbits/sec


This would indicate that the single-core performance of the CPU is too slow to handle 1 Gbit/s on a single connection. However, when I check top -a -H -S, only one CPU is at about 33%:

last pid:  7932;  load averages:  0.34,  0.26,  0.12                                                                            up 0+00:06:09  18:23:45
404 threads:   6 running, 383 sleeping, 15 waiting
CPU:  0.0% user,  0.0% nice, 11.1% system,  0.0% interrupt, 88.9% idle
Mem: 126M Active, 80M Inact, 494M Wired, 7112M Free
ARC: 214M Total, 32M MFU, 176M MRU, 172K Anon, 993K Header, 4640K Other
     58M Compressed, 153M Uncompressed, 2.65:1 Ratio
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31      0    64K CPU0     0   5:50  99.40% idle{idle: cpu0}
   11 root        155 ki31      0    64K RUN      1   6:00  99.38% idle{idle: cpu1}
   11 root        155 ki31      0    64K CPU3     3   6:00  98.63% idle{idle: cpu3}
   11 root        155 ki31      0    64K CPU2     2   5:32  67.85% idle{idle: cpu2}
    0 root        -76    -      0  4624K CPU2     2   0:29  32.08% kernel{if_io_tqg_2}
    0 root        -76    -      0  4624K -        3   0:01   0.92% kernel{if_io_tqg_3}
76044 unbound      20    0    73M    39M kqread   1   0:02   0.40% unbound{unbound}
76044 unbound      20    0    73M    39M kqread   0   0:02   0.38% unbound{unbound}
    0 root        -76    -      0  4624K -        0   0:02   0.27% kernel{if_config_tqg_0}
   12 root        -72    -      0   240K WAIT     0   0:00   0.21% intr{swi1: pfsync}
8706 root         20    0    18M  6740K select   1   0:02   0.19% ntpd{ntpd}
58282 root         20    0  1045M  4788K CPU1     1   0:00   0.07% top
   20 root        -16    -      0    16K pftm     2   0:00   0.04% pf purge
   22 root        -16    -      0    16K -        3   0:00   0.04% rand_harvestq
   12 root        -72    -      0   240K WAIT     2   0:00   0.03% intr{swi1: netisr 0}
69837 root         20    0    31M    11M kqread   3   0:00   0.02% syslog-ng{syslog-ng}
   12 root        -60    -      0   240K WAIT     0   0:00   0.02% intr{swi4: clock (0)}
20757 root         20    0    24M    14M select   3   0:01   0.01% python3.8
76550 root         20    0    21M    12M select   1   0:00   0.01% python3.8
24176 root         20    0    21M    12M select   1   0:00   0.01% python3.8
33797 root         16    -      0    16K syncer   3   0:00   0.01% syncer
99582 root         20    0    20M  6448K select   0   0:00   0.01% mpd5{mpd5}
    0 root        -76    -      0  4624K -        1   0:00   0.00% kernel{softirq_1}
    0 root        -76    -      0  4624K -        0   0:00   0.00% kernel{softirq_0}
    0 root        -76    -      0  4624K -        3   0:00   0.00% kernel{softirq_3}
43323 root        -16    -      0    48K psleep   1   0:00   0.00% pagedaemon{dom0}
75062 root         20    0    17M  7416K select   1   0:00   0.00% sshd
69837 root         20    0    31M    11M kqread   0   0:00   0.00% syslog-ng{syslog-ng}
    0 root        -76    -      0  4624K -        0   0:11   0.00% kernel{if_io_tqg_0}



I played around with the hardware offload settings under "Interfaces > Settings" and rebooted, but that didn't seem to change anything. Is the CPU really too slow to reach 1 Gbit/s routing speed on a single connection, or am I doing something wrong?
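
For reference, which offload flags are actually active on an interface can be checked from the shell, e.g.:

ifconfig igb0 | grep options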

Thanks for the help!
Title: Re: Poor routing speed on i5-7200U
Post by: testo_cz on December 26, 2021, 09:41:39 AM
I'd say, for a start, use these tunables:

net.isr.maxthreads = "-1"
net.isr.bindthreads = "1"

to have multiple queues.
And disable flow control on the Intel NICs, e.g.

dev.igb.0.fc = "0"
dev.igb.1.fc = "0"
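
These go under System > Settings > Tunables; after a reboot you can verify that they took effect from the shell, e.g.:

sysctl net.isr.maxthreads net.isr.bindthreads dev.igb.0.fc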


Most 1GbE setups I've seen happily do NAT close to 1 Gbit/s.

Is the CPU frequency scaling up/down?

sysctl -a | grep cpu | grep freq



Title: Re: Poor routing speed on i5-7200U
Post by: Sensler3000 on December 26, 2021, 12:42:59 PM
Hi, thanks for your help!

So I tried the following tunables:

For all NICs:
dev.igb.X.fc = 0
dev.igb.X.eee_control = 0

And also added:
net.isr.bindthreads = 1
net.isr.maxthreads = -1
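
To check whether the NIC queues are actually being used, the per-queue interrupt counters can be inspected, e.g.:

vmstat -i | grep igb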

Performance increased slightly but is still not at line speed:
iperf-3.1.3-win32>iperf3 -c 10.0.2.200 -t 5
Connecting to host 10.0.2.200, port 5201
[  4] local 192.168.1.100 port 29542 connected to 10.0.2.200 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   100 MBytes   838 Mbits/sec
[  4]   1.00-2.00   sec   100 MBytes   843 Mbits/sec
[  4]   2.00-3.00   sec   102 MBytes   854 Mbits/sec
[  4]   3.00-4.00   sec   102 MBytes   853 Mbits/sec
[  4]   4.00-5.00   sec   102 MBytes   853 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-5.00   sec   506 MBytes   848 Mbits/sec                  sender
[  4]   0.00-5.00   sec   506 MBytes   848 Mbits/sec                  receiver


The CPU frequency looks like this:
root@OPNsense:~ # sysctl -a | grep cpu | grep freq
device  cpufreq
debug.cpufreq.verbose: 0
debug.cpufreq.lowest: 0
dev.cpufreq.3.freq_driver: hwpstate_intel3
dev.cpufreq.3.%parent: cpu3
dev.cpufreq.3.%pnpinfo:
dev.cpufreq.3.%location:
dev.cpufreq.3.%driver: cpufreq
dev.cpufreq.3.%desc:
dev.cpufreq.2.freq_driver: hwpstate_intel2
dev.cpufreq.2.%parent: cpu2
dev.cpufreq.2.%pnpinfo:
dev.cpufreq.2.%location:
dev.cpufreq.2.%driver: cpufreq
dev.cpufreq.2.%desc:
dev.cpufreq.1.freq_driver: hwpstate_intel1
dev.cpufreq.1.%parent: cpu1
dev.cpufreq.1.%pnpinfo:
dev.cpufreq.1.%location:
dev.cpufreq.1.%driver: cpufreq
dev.cpufreq.1.%desc:
dev.cpufreq.0.freq_driver: hwpstate_intel0
dev.cpufreq.0.%parent: cpu0
dev.cpufreq.0.%pnpinfo:
dev.cpufreq.0.%location:
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%desc:
dev.cpufreq.%parent:
dev.cpu.3.freq_levels: 2712/-1
dev.cpu.3.freq: 3113
dev.cpu.2.freq_levels: 2712/-1
dev.cpu.2.freq: 3113
dev.cpu.1.freq_levels: 2712/-1
dev.cpu.1.freq: 3113
dev.cpu.0.freq_levels: 2712/-1
dev.cpu.0.freq: 3113


The cores appear to run at ~3.1 GHz (above the 2712 MHz base level), so frequency scaling doesn't look like the bottleneck. Is there anything else I can try?

Title: Re: Poor routing speed on i5-7200U
Post by: testo_cz on December 29, 2021, 12:09:34 PM
So instead of ~770 Mbit/s single-stream you are now getting ~850 Mbit/s. IMHO that's nice; not perfect, but nice.

Networking tasks should now be spread over more CPU cores while the load stays low. Last time you posted 33% system CPU load on a single core, which IMHO is a healthy system; this time it should be spread over two or more cores.

I don't have benchmarks for a single stream, I'm afraid. I've always used -P 2, with results similar to 950 Mbit/s (for either upload or download). Maybe I'll try later, but I can't promise.

Maybe 850 Mbit/s single-stream is just fine now, but I'm not sure. There is always some performance decrease because OPNsense intentionally disables NIC offloading and uses netmap, but with multiple streams/sessions the decrease is almost invisible on 1GbE. That is 950 Mbit/s and more with -P 2, which is very nice considering MTU, pps, and the other limitations of the source and destination NICs.

That reminds me: the OPNsense docs say to keep the TCP, UDP and LRO offloads at their default = OFF.

And this might be useful too -- an example of a "healthy" initialization of a powerful Intel 1GbE NIC (dmesg | grep igb0):

igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xc020-0xc03f mem 0xfe8a0000-0xfe8bffff,0xfe880000-0xfe89ffff,0xfe8c4000-0xfe8c7fff irq 40 at device 0.0 on pci3
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 2 RX queues 2 TX queues
igb0: Using MSI-X interrupts with 3 vectors
igb0: Ethernet address: 00:25:90:00:00:00
igb0: netmap queues/slots: TX 2/1024, RX 2/1024

The important parts are: MSI-X enabled, more than one HW queue, and more than one netmap queue mapped in non-emulated mode.

Is single-stream TCP performance somehow crucial for you?

The TCP stack itself has some tunables too, both via the iperf3 tool and via kernel sysctls.
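
For example (treat these as experiments; the FreeBSD defaults are usually fine for 1GbE):

iperf3 -c 10.0.2.200 -t 30 -w 1M
sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max

The first forces a larger TCP window for the test itself; the second shows the kernel's socket buffer limits.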

T.

Title: Re: Poor routing speed on i5-7200U
Post by: ReDaLeRt on December 29, 2021, 08:01:04 PM
Hello.

If you're using the intrusion detection service, turn it off for testing.

Also, set the following parameters to ensure no power management is messing with the CPU clock speed:

System -> Settings -> Miscellaneous -> Power Savings -> All to "Maximum" and "Use PowerD" ON.

Ensure both the iperf3 server and client can handle the load across a plain switch first, rather than testing straight through the router.

Repeat the tests with: iperf3 -c 10.0.2.200 -t 180 -P 8

For reference, I can manage 1 Gbit/s routing with IDS+IPS both enabled on an i5-4690K, with >50% CPU load across the 4 cores. So consider upgrading from an underpowered, low-voltage mobile CPU to a more muscular one.

Best regards.
Title: Re: Poor routing speed on i5-7200U
Post by: Sensler3000 on December 29, 2021, 08:10:36 PM
Hi,

I did some real-world tests with FTP and SMB, and throughput was usually 113 MB/s, which is around 905 Mbit/s. I think that's good enough considering there is routing involved, so I will not investigate further.

When I used iperf3 with -P 2 or more, it was always 950 Mbit/s, so the "issue" was only ever with a single stream.

Thanks for your help!