Poor routing speed on i5-7200U

Started by Sensler3000, December 25, 2021, 06:00:55 PM

December 25, 2021, 06:00:55 PM Last Edit: December 25, 2021, 06:25:18 PM by Sensler3000
Hi all,

I just started with OPNsense and bought an NRG Systems IPU672 with an i5-7200U, 8 GB of RAM and i211AT NICs.

I set up two VLANs (192.168.1.X and 10.0.2.X) and tested the routing speed between them with iperf3, but I can only reach speeds up to ~800 Mbit/s. When I connect both devices to a switch instead, I reach 950 Mbit/s.

Test on OPNsense:

\iperf-3.1.3-win32>iperf3 -c 10.0.2.200 -t 300 -R
Connecting to host 10.0.2.200, port 5201
[  4] local 192.168.1.3 port 1053 connected to 10.0.2.200 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  91.0 MBytes   761 Mbits/sec
[  4]   1.00-2.00   sec  89.0 MBytes   748 Mbits/sec
[  4]   2.00-3.00   sec  88.9 MBytes   746 Mbits/sec
[  4]   3.00-4.00   sec  92.1 MBytes   772 Mbits/sec


When I test with multiple parallel connections I can reach ~940 Mbit/s:

iperf-3.1.3-win32>iperf3 -c 10.0.2.200 -t 300 -R -P 2
Connecting to host 10.0.2.200, port 5201
[  4] local 192.168.1.3 port 1052 connected to 10.0.2.200 port 5201
[  6] local 192.168.1.3 port 1053 connected to 10.0.2.200 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  55.6 MBytes   466 Mbits/sec
[  6]   0.00-1.00   sec  56.3 MBytes   472 Mbits/sec
[SUM]   0.00-1.00   sec   112 MBytes   938 Mbits/sec


This would indicate that the single-core performance of the CPU is too slow to handle 1 Gbit/s on a single connection. However, when I check top -a -H -S, only one CPU thread is at about 33%:

last pid:  7932;  load averages:  0.34,  0.26,  0.12                                                                            up 0+00:06:09  18:23:45
404 threads:   6 running, 383 sleeping, 15 waiting
CPU:  0.0% user,  0.0% nice, 11.1% system,  0.0% interrupt, 88.9% idle
Mem: 126M Active, 80M Inact, 494M Wired, 7112M Free
ARC: 214M Total, 32M MFU, 176M MRU, 172K Anon, 993K Header, 4640K Other
     58M Compressed, 153M Uncompressed, 2.65:1 Ratio
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31      0    64K CPU0     0   5:50  99.40% idle{idle: cpu0}
   11 root        155 ki31      0    64K RUN      1   6:00  99.38% idle{idle: cpu1}
   11 root        155 ki31      0    64K CPU3     3   6:00  98.63% idle{idle: cpu3}
   11 root        155 ki31      0    64K CPU2     2   5:32  67.85% idle{idle: cpu2}
    0 root        -76    -      0  4624K CPU2     2   0:29  32.08% kernel{if_io_tqg_2}
    0 root        -76    -      0  4624K -        3   0:01   0.92% kernel{if_io_tqg_3}
76044 unbound      20    0    73M    39M kqread   1   0:02   0.40% unbound{unbound}
76044 unbound      20    0    73M    39M kqread   0   0:02   0.38% unbound{unbound}
    0 root        -76    -      0  4624K -        0   0:02   0.27% kernel{if_config_tqg_0}
   12 root        -72    -      0   240K WAIT     0   0:00   0.21% intr{swi1: pfsync}
8706 root         20    0    18M  6740K select   1   0:02   0.19% ntpd{ntpd}
58282 root         20    0  1045M  4788K CPU1     1   0:00   0.07% top
   20 root        -16    -      0    16K pftm     2   0:00   0.04% pf purge
   22 root        -16    -      0    16K -        3   0:00   0.04% rand_harvestq
   12 root        -72    -      0   240K WAIT     2   0:00   0.03% intr{swi1: netisr 0}
69837 root         20    0    31M    11M kqread   3   0:00   0.02% syslog-ng{syslog-ng}
   12 root        -60    -      0   240K WAIT     0   0:00   0.02% intr{swi4: clock (0)}
20757 root         20    0    24M    14M select   3   0:01   0.01% python3.8
76550 root         20    0    21M    12M select   1   0:00   0.01% python3.8
24176 root         20    0    21M    12M select   1   0:00   0.01% python3.8
33797 root         16    -      0    16K syncer   3   0:00   0.01% syncer
99582 root         20    0    20M  6448K select   0   0:00   0.01% mpd5{mpd5}
    0 root        -76    -      0  4624K -        1   0:00   0.00% kernel{softirq_1}
    0 root        -76    -      0  4624K -        0   0:00   0.00% kernel{softirq_0}
    0 root        -76    -      0  4624K -        3   0:00   0.00% kernel{softirq_3}
43323 root        -16    -      0    48K psleep   1   0:00   0.00% pagedaemon{dom0}
75062 root         20    0    17M  7416K select   1   0:00   0.00% sshd
69837 root         20    0    31M    11M kqread   0   0:00   0.00% syslog-ng{syslog-ng}
    0 root        -76    -      0  4624K -        0   0:11   0.00% kernel{if_io_tqg_0}



I played around with the HW offload settings under "Interfaces > Settings" and rebooted, but that did not seem to change anything. Is the CPU really too slow to reach 1 Gbit/s routing speed on a single connection, or am I doing something wrong?
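As far as I can tell, the offload flags that are actually in effect show up in the options line of ifconfig (igb0 here standing for whichever NIC is tested):

ifconfig igb0 | head -2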

Thanks for the Help!

I'd say, for a start, use these tunables

net.isr.maxthreads = "-1"
net.isr.bindthreads = "1"

to have multiple queues.
And disable flow control on the Intel NICs, e.g.

dev.igb.0.fc = "0"
dev.igb.1.fc = "0"
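In case it helps, the current values can be read back from a shell before and after applying them (the dev.igb names assume the i211 ports show up as igb0, igb1, ...):

sysctl net.isr.maxthreads net.isr.bindthreads
sysctl dev.igb.0.fc dev.igb.1.fc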


Most 1GbE setups I've seen happily do NAT close to 1 Gbps.

Is the CPU frequency scaling up/down?

sysctl -a | grep cpu | grep freq




Hi, thanks for your help!

So I tried the following tunables:

For all NICs:
dev.igb.X.fc = 0
dev.igb.X.eee_control = 0

And also added:
net.isr.bindthreads = 1
net.isr.maxthreads = -1
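As far as I understand, the dev.igb values can also be set directly from a shell for a quick test (the net.isr ones only take effect after a reboot), e.g.:

sysctl dev.igb.0.fc=0 dev.igb.0.eee_control=0
sysctl dev.igb.1.fc=0 dev.igb.1.eee_control=0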

Performance increased slightly but is still not at line speed:
iperf-3.1.3-win32>iperf3 -c 10.0.2.200 -t 5
Connecting to host 10.0.2.200, port 5201
[  4] local 192.168.1.100 port 29542 connected to 10.0.2.200 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   100 MBytes   838 Mbits/sec
[  4]   1.00-2.00   sec   100 MBytes   843 Mbits/sec
[  4]   2.00-3.00   sec   102 MBytes   854 Mbits/sec
[  4]   3.00-4.00   sec   102 MBytes   853 Mbits/sec
[  4]   4.00-5.00   sec   102 MBytes   853 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-5.00   sec   506 MBytes   848 Mbits/sec                  sender
[  4]   0.00-5.00   sec   506 MBytes   848 Mbits/sec                  receiver


The CPU frequency is the following:
root@OPNsense:~ # sysctl -a | grep cpu | grep freq
device  cpufreq
debug.cpufreq.verbose: 0
debug.cpufreq.lowest: 0
dev.cpufreq.3.freq_driver: hwpstate_intel3
dev.cpufreq.3.%parent: cpu3
dev.cpufreq.3.%pnpinfo:
dev.cpufreq.3.%location:
dev.cpufreq.3.%driver: cpufreq
dev.cpufreq.3.%desc:
dev.cpufreq.2.freq_driver: hwpstate_intel2
dev.cpufreq.2.%parent: cpu2
dev.cpufreq.2.%pnpinfo:
dev.cpufreq.2.%location:
dev.cpufreq.2.%driver: cpufreq
dev.cpufreq.2.%desc:
dev.cpufreq.1.freq_driver: hwpstate_intel1
dev.cpufreq.1.%parent: cpu1
dev.cpufreq.1.%pnpinfo:
dev.cpufreq.1.%location:
dev.cpufreq.1.%driver: cpufreq
dev.cpufreq.1.%desc:
dev.cpufreq.0.freq_driver: hwpstate_intel0
dev.cpufreq.0.%parent: cpu0
dev.cpufreq.0.%pnpinfo:
dev.cpufreq.0.%location:
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%desc:
dev.cpufreq.%parent:
dev.cpu.3.freq_levels: 2712/-1
dev.cpu.3.freq: 3113
dev.cpu.2.freq_levels: 2712/-1
dev.cpu.2.freq: 3113
dev.cpu.1.freq_levels: 2712/-1
dev.cpu.1.freq: 3113
dev.cpu.0.freq_levels: 2712/-1
dev.cpu.0.freq: 3113


Is there anything else I can try?


So instead of ~770 Mbps single stream you are now getting ~850 Mbps. IMHO that's nice; not perfect, but nice.

Networking tasks should now be spread over more CPU cores while the load stays low. Last time you posted about 33% system CPU load on one core, which IMHO is a healthy system; this time it should spread over two or more cores.

I don't have benchmarks for a single stream, I'm afraid. I've always used -P 2 and got similar results, around 950 Mbps (for either upload or download). Maybe I'll try later, but I can't promise.

Maybe 850 Mbps single-stream is just fine now, but I'm not sure. There is always some performance decrease because OPNsense intentionally disables NIC offloading and uses netmap, but with multiple streams/sessions the decrease is almost invisible at 1GbE: that is 950 Mbps and more with -P 2, which is very nice considering MTU, pps, and the other limitations of the source and destination NICs.

That reminds me: the OPNsense docs say to keep the TCP, UDP, and LRO offloads at their default, which is off.

And this might be useful too -- an example of "healthy" initialization of a powerful Intel 1GbE NIC (dmesg | grep igb0):

igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xc020-0xc03f mem 0xfe8a0000-0xfe8bffff,0xfe880000-0xfe89ffff,0xfe8c4000-0xfe8c7fff irq 40 at device 0.0 on pci3
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 2 RX queues 2 TX queues
igb0: Using MSI-X interrupts with 3 vectors
igb0: Ethernet address: 00:25:90:00:00:00
igb0: netmap queues/slots: TX 2/1024, RX 2/1024

The important parts are: MSI-X interrupts enabled, more than one hardware queue, and more than one netmap queue mapped in non-emulated mode.
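A rough way to cross-check this on your own box is the interrupt list (assuming the NICs really show up as igb0/igb1); with MSI-X there should be one interrupt line per queue:

vmstat -i | grep igb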

Is single-stream TCP performance somehow crucial for you?

The TCP stack itself has some tunables too -- both via the iperf3 tool and via kernel sysctls.
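A minimal sketch of what I mean, not specific advice for your setup (-w is iperf3's socket buffer / window size option; the sysctl names are standard FreeBSD ones and only matter on FreeBSD-based endpoints):

iperf3 -c 10.0.2.200 -t 30 -w 512K
sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max
sysctl net.inet.tcp.cc.algorithm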

T.


December 29, 2021, 08:01:04 PM #4 Last Edit: December 29, 2021, 08:04:43 PM by ReDaLeRt
Hello.

If you're using the intrusion detection service, turn it off for testing.

Also, set the following parameters to ensure no power management is messing with the CPU clock speed:

System -> Settings -> Miscellaneous -> Power Savings -> All to "Maximum" and "Use PowerD" ON.

Ensure both the iperf3 server and client can handle the load across a switch first, rather than testing straight through the router.

Repeat the tests with: iperf3 -c 10.0.2.200 -t 180 -P 8

For reference, I can manage 1 gigabit routing with IDS+IPS enabled on an i5-4690K, with > 50% CPU load on all 4 cores. So consider upgrading from an under-powered, low-voltage CPU to a more muscular one.

Best regards.

December 29, 2021, 08:10:36 PM #5 Last Edit: December 29, 2021, 08:15:26 PM by Sensler3000
Hi,

I did some real-world tests with FTP and SMB, and throughput was usually 113 MB/s, which is around 905 Mbit/s. I think that's good enough considering that routing is involved, so I will not investigate further.

Once i used iperf3 with "-P2" or more its always 950 Mbit/s so the "issue" anyhow was only with single stream.

Thanks for your help!