APU2D4 very low throughput 1Gbit

Started by Burschi, May 24, 2021, 03:26:16 PM

Hello everybody,

it seems I have a problem with the throughput of OPNsense on my APU2D4. Using iperf3 I only get about 200 Mbits/sec between interfaces:

iperf3 -V -f m -c 192.168.20.237
[  5] local 192.168.30.220 port 34084 connected to 192.168.20.237 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  27.1 MBytes   227 Mbits/sec    0   1.22 MBytes       
[  5]   1.00-2.00   sec  23.8 MBytes   199 Mbits/sec   57   1.13 MBytes       
[  5]   2.00-3.00   sec  23.8 MBytes   199 Mbits/sec    0   1.24 MBytes       
[  5]   3.00-4.00   sec  23.8 MBytes   199 Mbits/sec    0   1.33 MBytes       
[  5]   4.00-5.00   sec  23.8 MBytes   199 Mbits/sec    0   1.39 MBytes       
[  5]   5.00-6.00   sec  23.8 MBytes   199 Mbits/sec    4   1.01 MBytes       
[  5]   6.00-7.00   sec  23.8 MBytes   199 Mbits/sec    0   1.08 MBytes       
[  5]   7.00-8.00   sec  23.8 MBytes   199 Mbits/sec    0   1.14 MBytes       
[  5]   8.00-9.00   sec  23.8 MBytes   199 Mbits/sec    0   1.17 MBytes       
[  5]   9.00-10.00  sec  23.8 MBytes   199 Mbits/sec    0   1.20 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   241 MBytes   202 Mbits/sec   61             sender
[  5]   0.00-10.07  sec   238 MBytes   198 Mbits/sec                  receiver
CPU Utilization: local/sender 1.6% (0.0%u/1.5%s), remote/receiver 21.5% (1.3%u/20.2%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic


Here 192.168.30.220 is a VLAN on igb2 and 192.168.20.237 is an LXC container on physical igb1. Even when running the test from 192.168.30.220 to the LXC host (Proxmox/Debian, bare metal) I only get ~220 Mbits/sec:

iperf3 -V -f m -c 192.168.20.230
[  5] local 192.168.30.220 port 55432 connected to 192.168.20.230 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  29.8 MBytes   250 Mbits/sec    0   1.33 MBytes       
[  5]   1.00-2.00   sec  26.2 MBytes   220 Mbits/sec   60   1.15 MBytes       
[  5]   2.00-3.00   sec  26.2 MBytes   220 Mbits/sec    0   1.26 MBytes       
[  5]   3.00-4.00   sec  25.0 MBytes   210 Mbits/sec    0   1.35 MBytes       
[  5]   4.00-5.00   sec  26.2 MBytes   220 Mbits/sec    0   1.41 MBytes       
[  5]   5.00-6.00   sec  26.2 MBytes   220 Mbits/sec    2   1.04 MBytes       
[  5]   6.00-7.00   sec  26.2 MBytes   220 Mbits/sec    0   1.11 MBytes       
[  5]   7.00-8.00   sec  25.0 MBytes   210 Mbits/sec    0   1.15 MBytes       
[  5]   8.00-9.00   sec  26.2 MBytes   220 Mbits/sec    0   1.18 MBytes       
[  5]   9.00-10.00  sec  26.2 MBytes   220 Mbits/sec    0   1.20 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   264 MBytes   221 Mbits/sec   62             sender
[  5]   0.00-10.07  sec   260 MBytes   217 Mbits/sec                  receiver
CPU Utilization: local/sender 1.6% (0.1%u/1.5%s), remote/receiver 16.7% (1.6%u/15.1%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic


The same is true if I go from one VLAN to another VLAN on the same physical interface.

As said before, I'm using an APU2D4 with Suricata disabled and hardware offloading enabled, and I see a CPU usage of 70-100%. I installed the latest ROM and tried the hints from https://teklager.se/en/knowledge-base/opnsense-performance-optimization/, but with no success. I know there is an open issue in thread https://forum.opnsense.org/index.php?topic=18754.0, but the speeds reported there are for 10 Gbit and far higher than mine, so I'm not sure whether I'm suffering from the same bug or just seeing the results of a bad configuration...

Any help would be appreciated!

I don't have one of these little devices, so I can't provide any direct help. I'm sure you checked this, but just in case: there is no traffic shaping in place during the iperf tests, correct?

I noticed that the author of the article you linked uses two parallel streams to achieve a higher transfer speed. If you increase the stream count (-P 2), do you see a commensurate increase in transfer speed?
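For reference, a two-stream run against the same server as in the first test would look like this (-P sets the number of parallel client streams; the target address is taken from the post above):

iperf3 -V -f m -c 192.168.20.237 -P 2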

May 26, 2021, 06:24:48 PM #2 Last Edit: May 26, 2021, 06:32:03 PM by Burschi
No for both suggestions - I don't have traffic shaping active on this VLAN, and with -P 2 I get about the same transfer rate...

Edit:
Hm, when disabling traffic shaping on the other VLAN it seems to get a bit better. Could this be CPU related? But then, I was under the impression that the APU2D4 should be able to route 1 Gbit...
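One quick way to see whether a single core is the limit is to watch per-CPU load on the firewall while iperf3 runs - a generic FreeBSD sketch, not specific to OPNsense:

top -P     # per-CPU usage; one core pegged near 100% suggests a single-queue bottleneck
top -SH    # include kernel threads, e.g. the igb interrupt/queue threads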

May 26, 2021, 07:57:22 PM #3 Last Edit: May 26, 2021, 07:59:56 PM by almodovaris
See https://teklager.se/en/knowledge-base/opnsense-performance-optimization/

You have to edit /boot/loader.conf.local and also set the values as parameters through the GUI.

E.g. my file is:


#cpu_microcode_load="YES"
#cpu_microcode_name="/boot/firmware/intel-ucode.bin"
# agree with Intel license terms
amdtemp_load="YES"
ahci_load="YES"
aesni_load="YES"
if_igb_load="YES"
flowd_enable="YES"
flowd_aggregate_enable="YES"
legal.intel_igb.license_ack="1"
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1
# this is the magic. If you don't set this, queues won't be utilized properly
# allow multiple processes for receive/transmit processing
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"
# more settings to play with below. Not strictly necessary.
# force NIC to use 1 queue (don't do it on APU)
# hw.igb.num_queues=1
# give enough RAM to network buffers (default is usually OK)
#kern.ipc.nmbclusters="1000000"
net.pf.states_hashsize=2097152
#hw.igb.rxd=4096
#hw.igb.txd=4096
#net.inet.tcp.syncache.hashsize="1024"
#net.inet.tcp.syncache.bucketlimit="100"
#kern.smp.disabled=1
#hw.igb.0.fc=3
#hw.igb.1.fc=3
#hw.igb.2.fc=3
hw.igb.num_queues=0
#net.link.ifqmaxlen="8192"
hw.igb.enable_aim=1
#hw.igb.max_interrupt_rate="64000"
hw.igb.enable_msix=1
hw.pci.enable_msix=1
#net.inet.ip.maxfragpackets="0"
#net.inet.ip.maxfragsperpacket="0"
#dev.igb.0.eee_disabled="1"
#dev.igb.1.eee_disabled="1"
#dev.igb.2.eee_disabled="1"
vm.pmap.pti=0
hw.ibrs_disable=0
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
hint.acpi_perf.0.disabled=1
hint.p4tcc.1.disabled=1
hint.acpi_throttle.1.disabled=1
hint.acpi_perf.1.disabled=1
hint.p4tcc.2.disabled=1
hint.acpi_throttle.2.disabled=1
hint.acpi_perf.2.disabled=1
hint.p4tcc.3.disabled=1
hint.acpi_throttle.3.disabled=1
hint.acpi_perf.3.disabled=1
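After a reboot you can spot-check whether the values from /boot/loader.conf.local actually made it into the kernel - a quick sanity check, not part of the file above:

kenv hw.igb.rx_process_limit     # loader variables land in the kernel environment
sysctl net.pf.states_hashsize    # loader tunables backed by a sysctl can be read directly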

Oh, yes, you may also try:

hw.ibrs_disable=1
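hw.ibrs_disable=1 turns off the IBRS Spectre V2 mitigation, trading some hardening for throughput. Whether IBRS is in use at all can be read back at runtime - on the APU's AMD SoC it may well be inactive regardless:

sysctl hw.ibrs_disable hw.ibrs_active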

I'd take Teklager's advice regarding fine-tuning OPNsense on the APU2x4 for 1 Gbit throughput with a grain of salt, since it has never been updated. The pfSense advice, on the other hand, has been updated three times.

The OPNsense advice is not dated, but from a screenshot you can deduce that it's from 2019. The pfSense advice is dated, with the original post from 2019-01-15 and three subsequent updates: 2020-07-19, 2020-10-28 and 2021-02-20.

https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/

"Gigabit config for pfSense 2.5.0. No tweaks are required! Don't follow any of the information listed below for pfSense 2.4.5."

If I'm not misinformed, both OPNsense 21.1 and pfSense 2.5.0 are based on FreeBSD 12.2. Tuning these two systems should consequently be pretty similar.
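This is easy to verify on the box itself, since freebsd-version ships with the base system:

freebsd-version -ku    # print the kernel and userland versions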

miroco


Quote from: almodovaris on May 26, 2021, 07:57:22 PM
See https://teklager.se/en/knowledge-base/opnsense-performance-optimization/

You have to edit /boot/loader.conf.local and also set the values as parameters through the GUI.

E.g. my file is:
<snip>

I already tried the Teklager advice (and reverted it since I saw no improvement), and taking miroco's remark into consideration I think it /might/ be outdated. Anyway - thanks for the hints.

Anything else I can try, or is the hardware really this limited (can't believe that...)?

Cheers, Burschi

May 29, 2021, 05:08:55 PM #7 Last Edit: May 29, 2021, 05:38:50 PM by dave
If your WAN interface is PPPoE based, that'll cause issues; BSD's PPPoE daemon is unfortunately single threaded.

I use an APU2C4 with the following settings added via System > Settings > Tunables:

hw.igb.rx_process_limit  -1
hw.igb.tx_process_limit  -1
legal.intel_igb.license_ack  1
hint.acpi_perf.0.disabled 1
hint.acpi_throttle.0.disabled 1
hint.p4tcc.0.disabled 1

Also, update to the latest APU BIOS, reboot your router, enter the BIOS config via serial and check that Core Performance Boost is enabled (which it should be by default).

Next go to System > Settings > Miscellaneous and disable PowerD (you might need to reboot again).

This will disable all throttling and lock the CPU at 1.4 GHz, which can be confirmed with sysctl dev.cpu.0.freq.
The CPU is a SoC, so I'm not worried about heat or power draw.
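freq and freq_levels can be checked together; freq_levels lists the frequency steps (including the extra throttling states) that powerd would otherwise cycle through:

sysctl dev.cpu.0.freq dev.cpu.0.freq_levels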

Quote from: dave on May 29, 2021, 05:08:55 PM
This will disable all throttling and lock the CPU at 1.4 GHz

-> This is plain incorrect. The CPU's base clock is only 1.0 GHz, and core performance boost can raise only one core, only up to a maximum of 1.4 GHz, and only for a short moment before it drops back to 1.0 GHz.
Throttling can decrease the clock speed of all cores down to 800 MHz, or to the minimum of 600 MHz.

May 29, 2021, 05:50:27 PM #9 Last Edit: May 29, 2021, 06:10:22 PM by dave
Maybe I'm not stressing my CPU for long enough periods. I did think it was 1.4 GHz across all cores.

Either way, you're going from 600/800/1000 to 1000/1200/1400 MHz.

Whether or not you disable throttling is up to you, I guess, but since the CPU is a 7 to 12 watt device I just don't see the point in not locking it.

https://blog.3mdeb.com/2019/2019-02-14-enabling-cpb-on-pcengines-apu2/

sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1400


It doesn't seem to matter when/how frequently/under what conditions I check - 1.4 GHz is always reported.

Quote from: Ricardo on May 29, 2021, 05:15:50 PM
core performance boost can raise only one core, only up to a maximum of 1.4 GHz, and only for a short moment before it drops back to 1.0 GHz.
Is this documented somewhere? Thanks.

May 30, 2021, 03:42:26 PM #11 Last Edit: May 30, 2021, 03:52:52 PM by hushcoden
Quote from: dave on May 29, 2021, 05:08:55 PM
hint.acpi_perf.0.disabled 1
hint.acpi_throttle.0.disabled 1
hint.p4tcc.0.disabled 1
How can I check the current status of those settings? I read somewhere that starting from FreeBSD 10 both ACPI and P4TCC throttling are disabled by default in new installations - is that true?

Do you know where I can find the documentation about these specific ACPI settings?

Tia.
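One generic way to check - these are loader hints, so they show up in the kernel environment rather than as ordinary sysctls, and only if they were actually set at boot:

kenv | grep -E 'hint\.(p4tcc|acpi_perf|acpi_throttle)'    # shows the hints if set
sysctl dev.cpu.0.freq_levels    # the extra throttle steps disappear when p4tcc/acpi_throttle are disabled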


Quote from: miroco on May 30, 2021, 06:57:14 PM
APU Core Performance Boost

https://github.com/pcengines/apu2-documentation/blob/master/docs/apu_CPU_boost.md

So, with the latest firmware, the CPU is indeed capable of running at 1.4 GHz, but only when the CPU load goes up; the clock speed then drops back to 1 GHz (or lower).

And the only purpose of applying the following hacks
hint.acpi_perf.0.disabled 1
hint.acpi_throttle.0.disabled 1
hint.p4tcc.0.disabled 1

is to keep the CPU clock fixed at 1.4 GHz (no throttling) regardless of the load, is that correct?

They only affect how the core performance boost is displayed. The boost itself works the same without them (but sysctl will report 1000 even when the core is actually running at 1400).