OPNsense Forum

English Forums => Hardware and Performance => Topic started by: Burschi on May 24, 2021, 03:26:16 pm

Title: APU2D4 very low throughput 1Gbit
Post by: Burschi on May 24, 2021, 03:26:16 pm
Hello everybody,

It seems I have a problem with the throughput of OPNsense on my APU2D4. Using iperf3 I only get about 200 Mbits/sec between interfaces:
Code: [Select]
iperf3 -V -f m -c 192.168.20.237
[  5] local 192.168.30.220 port 34084 connected to 192.168.20.237 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  27.1 MBytes   227 Mbits/sec    0   1.22 MBytes       
[  5]   1.00-2.00   sec  23.8 MBytes   199 Mbits/sec   57   1.13 MBytes       
[  5]   2.00-3.00   sec  23.8 MBytes   199 Mbits/sec    0   1.24 MBytes       
[  5]   3.00-4.00   sec  23.8 MBytes   199 Mbits/sec    0   1.33 MBytes       
[  5]   4.00-5.00   sec  23.8 MBytes   199 Mbits/sec    0   1.39 MBytes       
[  5]   5.00-6.00   sec  23.8 MBytes   199 Mbits/sec    4   1.01 MBytes       
[  5]   6.00-7.00   sec  23.8 MBytes   199 Mbits/sec    0   1.08 MBytes       
[  5]   7.00-8.00   sec  23.8 MBytes   199 Mbits/sec    0   1.14 MBytes       
[  5]   8.00-9.00   sec  23.8 MBytes   199 Mbits/sec    0   1.17 MBytes       
[  5]   9.00-10.00  sec  23.8 MBytes   199 Mbits/sec    0   1.20 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   241 MBytes   202 Mbits/sec   61             sender
[  5]   0.00-10.07  sec   238 MBytes   198 Mbits/sec                  receiver
CPU Utilization: local/sender 1.6% (0.0%u/1.5%s), remote/receiver 21.5% (1.3%u/20.2%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic

Here 192.168.30.220 is on a VLAN on igb2 and 192.168.20.237 is an LXC container on the physical igb1. Even when running the test from 192.168.30.220 to the LXC host (Proxmox/Debian, bare metal) I only get ~220 Mbits/sec:
Code: [Select]
iperf3 -V -f m -c 192.168.20.230
[  5] local 192.168.30.220 port 55432 connected to 192.168.20.230 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  29.8 MBytes   250 Mbits/sec    0   1.33 MBytes       
[  5]   1.00-2.00   sec  26.2 MBytes   220 Mbits/sec   60   1.15 MBytes       
[  5]   2.00-3.00   sec  26.2 MBytes   220 Mbits/sec    0   1.26 MBytes       
[  5]   3.00-4.00   sec  25.0 MBytes   210 Mbits/sec    0   1.35 MBytes       
[  5]   4.00-5.00   sec  26.2 MBytes   220 Mbits/sec    0   1.41 MBytes       
[  5]   5.00-6.00   sec  26.2 MBytes   220 Mbits/sec    2   1.04 MBytes       
[  5]   6.00-7.00   sec  26.2 MBytes   220 Mbits/sec    0   1.11 MBytes       
[  5]   7.00-8.00   sec  25.0 MBytes   210 Mbits/sec    0   1.15 MBytes       
[  5]   8.00-9.00   sec  26.2 MBytes   220 Mbits/sec    0   1.18 MBytes       
[  5]   9.00-10.00  sec  26.2 MBytes   220 Mbits/sec    0   1.20 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   264 MBytes   221 Mbits/sec   62             sender
[  5]   0.00-10.07  sec   260 MBytes   217 Mbits/sec                  receiver
CPU Utilization: local/sender 1.6% (0.1%u/1.5%s), remote/receiver 16.7% (1.6%u/15.1%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic

The same is true if I go from one VLAN to another VLAN on the same physical interface.
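
When comparing runs like the two above, the receiver-side figure in the summary is usually the number to look at. A minimal sketch, assuming iperf3 was run with -f m as in the outputs above and with a single stream; the function name receiver_mbits is made up for illustration:

```shell
#!/bin/sh
# Print the receiver-side bitrate (in Mbits/sec) from iperf3 text output on stdin.
# Assumes the summary was produced with "-f m", so the unit column reads "Mbits/sec".
receiver_mbits() {
    awk '$NF == "receiver" {
        # The bitrate is the field just before the "Mbits/sec" unit.
        for (i = 2; i <= NF; i++) if ($i == "Mbits/sec") print $(i - 1)
    }'
}
```

Piping the first run into it (iperf3 -f m -c 192.168.20.237 | receiver_mbits) would print 198 for the summary shown above.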

As said before, I'm using an APU2D4 with Suricata disabled and hardware offloading enabled, and I see CPU usage of 70-100%. I installed the latest ROM and tried the hints from https://teklager.se/en/knowledge-base/opnsense-performance-optimization/, but with no success. I know there is an open issue in the thread https://forum.opnsense.org/index.php?topic=18754.0, but the speeds reported there are for 10 Gbit and far higher than mine. So I'm not sure whether I'm suffering from the same bug or just experiencing the results of a bad configuration...

Any help would be appreciated!
Title: Re: APU2D4 very low throughput 1Gbit
Post by: opnfwb on May 26, 2021, 03:26:32 pm
I don't have one of these little devices so I can't provide any direct help. I'm sure you checked this but just in case, there is no traffic shaping in place during the iperf tests, correct?

I noticed in the article you linked that the author there is using two threads in parallel to achieve a higher transfer speed. If you increase the thread count (-P 2) do you see a commensurate increase in transfer speed?
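
A rough sketch of such a comparison, assuming an iperf3 server is already listening on the target; the function name sweep_streams and the stream counts 1, 2 and 4 are arbitrary choices for illustration:

```shell
#!/bin/sh
# Run iperf3 with 1, 2 and 4 parallel streams and print the last receiver
# summary line of each run (with -P > 1 that is the [SUM] line).
sweep_streams() {
    server=$1
    for n in 1 2 4; do
        printf 'streams=%s: ' "$n"
        iperf3 -f m -c "$server" -P "$n" |
            awk '$NF == "receiver" { line = $0 } END { print line }'
    done
}
```

Calling sweep_streams 192.168.20.237 would run three 10-second tests back to back. If the total barely changes with the stream count, a per-flow limit is less likely to be the cause.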
Title: Re: APU2D4 very low throughput 1Gbit
Post by: Burschi on May 26, 2021, 06:24:48 pm
No to both suggestions: I don't have traffic shaping active on this VLAN, and with -P 2 I get about the same transfer rate...

Edit:
Hm, when disabling traffic shaping on the other VLAN it seems to get a bit better. Could this be CPU related? But then I was under the impression that the APU2D4 should be able to route 1 Gbit...
Title: Re: APU2D4 very low throughput 1Gbit
Post by: almodovaris on May 26, 2021, 07:57:22 pm
See https://teklager.se/en/knowledge-base/opnsense-performance-optimization/

You have to edit /boot/loader.conf.local and also set the values up as tunables through the GUI.

E.g. my file is:

Code: [Select]
#cpu_microcode_load="YES"
#cpu_microcode_name="/boot/firmware/intel-ucode.bin"
# agree with Intel license terms
amdtemp_load="YES"
ahci_load="YES"
aesni_load="YES"
if_igb_load="YES"
flowd_enable="YES"
flowd_aggregate_enable="YES"
legal.intel_igb.license_ack="1"
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1
# this is the magic. If you don't set this, queues won't be utilized properly
# allow multiple processes for receive/transmit processing
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"
# more settings to play with below. Not strictly necessary.
# force NIC to use 1 queue (don't do it on APU)
# hw.igb.num_queues=1
# give enough RAM to network buffers (default is usually OK)
#kern.ipc.nmbclusters="1000000"
net.pf.states_hashsize=2097152
#hw.igb.rxd=4096
#hw.igb.txd=4096
#net.inet.tcp.syncache.hashsize="1024"
#net.inet.tcp.syncache.bucketlimit="100"
#kern.smp.disabled=1
#hw.igb.0.fc=3
#hw.igb.1.fc=3
#hw.igb.2.fc=3
hw.igb.num_queues=0
#net.link.ifqmaxlen="8192"
hw.igb.enable_aim=1
#hw.igb.max_interrupt_rate="64000"
hw.igb.enable_msix=1
hw.pci.enable_msix=1
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"
#net.inet.ip.maxfragpackets="0"
#net.inet.ip.maxfragsperpacket="0"
#dev.igb.0.eee_disabled="1"
#dev.igb.1.eee_disabled="1"
#dev.igb.2.eee_disabled="1"
vm.pmap.pti = 0
hw.ibrs_disable = 0
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
hint.acpi_perf.0.disabled=1
hint.p4tcc.1.disabled=1
hint.acpi_throttle.1.disabled=1
hint.acpi_perf.1.disabled=1
hint.p4tcc.2.disabled=1
hint.acpi_throttle.2.disabled=1
hint.acpi_perf.2.disabled=1
hint.p4tcc.3.disabled=1
hint.acpi_throttle.3.disabled=1
hint.acpi_perf.3.disabled=1
Title: Re: APU2D4 very low throughput 1Gbit
Post by: almodovaris on May 27, 2021, 11:11:22 pm
Oh, yes, you may try with:
Code: [Select]
hw.ibrs_disable = 1
Title: Re: APU2D4 very low throughput 1Gbit
Post by: miroco on May 28, 2021, 01:56:04 am
I'd take Teklager's advice regarding fine-tuning OPNsense on the APU2x4 for 1 Gbit throughput with a grain of salt, since it has never been updated. The pfSense advice, however, has been updated three times.

The OPNsense advice is not dated, but from a screenshot you can deduce that it's from 2019. The pfSense advice is dated: the original post is from 2019-01-15, with three subsequent updates on 2020-07-19, 2020-10-28 and 2021-02-20.

https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/

"Gigabit config for pfSense 2.5.0. No tweaks are required! Don't follow any of the information listed below for pfSense 2.4.5."

If I'm not misinformed, both OPNsense 21.1 and pfSense 2.5.0 are based on FreeBSD 12.2. Tuning these two systems should consequently be pretty similar.

miroco
Title: Re: APU2D4 very low throughput 1Gbit
Post by: Burschi on May 28, 2021, 09:57:41 am

See https://teklager.se/en/knowledge-base/opnsense-performance-optimization/

You have to edit /boot/loader.conf.local and also set up as parameters through the GUI.

E.g. my file is:
<snip>

I already tried the Teklager advice (and reverted it, since I saw no improvement), and taking miroco's remark into consideration I think it /might/ be outdated. Anyway, thanks for the hints.

Is there anything else I can try, or is it really this limited? (I can't believe that...)

Cheers, Burschi
Title: Re: APU2D4 very low throughput 1Gbit
Post by: dave on May 29, 2021, 05:08:55 pm
If your WAN interface is PPPoE-based, that'll cause issues; BSD's PPPoE daemon is single-threaded, unfortunately.

I use an APU2C4 with the following settings added via System > Settings > Tunables:

hw.igb.rx_process_limit  -1
hw.igb.tx_process_limit  -1
legal.intel_igb.license_ack  1
hint.acpi_perf.0.disabled 1
hint.acpi_throttle.0.disabled 1
hint.p4tcc.0.disabled 1

Also, update to the latest APU BIOS, reboot your router, enter the BIOS config via serial and check that Core Performance Boost is enabled (which it should be by default).

Next go to System > Settings > Miscellaneous and disable PowerD (you might need to reboot again).

This will disable all throttling and lock the CPU at 1.4 GHz, which can be confirmed with sysctl dev.cpu.0.freq.
The CPU is an SoC, so I'm not worried about heat or electricity.
Title: Re: APU2D4 very low throughput 1Gbit
Post by: Ricardo on May 29, 2021, 05:15:50 pm
This will disable all throttling and lock the CPU at 1.4GHZ

-> This is plainly incorrect. The CPU only runs at 1.0 GHz. Core Performance Boost can raise only one core, only up to a maximum of 1.4 GHz, and only for a short time. Then it drops back to 1.0 GHz.
Throttling can decrease the clock speed of all cores to 800 MHz or to the minimum of 600 MHz.
Title: Re: APU2D4 very low throughput 1Gbit
Post by: dave on May 29, 2021, 05:50:27 pm
Maybe I'm not stressing my CPU for long enough periods. I did think it was 1.4 GHz across all cores.

Either way you're going from 600/800/1000 to 1000/1200/1400.

Whether or not you disable throttling is up to you, I guess, but since the CPU is a 7 to 12 watt device I just don't see the point in not locking it.

https://blog.3mdeb.com/2019/2019-02-14-enabling-cpb-on-pcengines-apu2/

Code: [Select]
sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1400

It doesn't seem to matter when, how frequently or under what conditions I check: 1.4 GHz is always reported.
Title: Re: APU2D4 very low throughput 1Gbit
Post by: hushcoden on May 30, 2021, 03:31:59 pm
core performance boost can increase only 1 core, and this one only up to max 1.4Ghz, and only for a short moment of time. Then it will get back to 1.0Ghz.
Is this documented somewhere? Thanks.
Title: Re: APU2D4 very low throughput 1Gbit
Post by: hushcoden on May 30, 2021, 03:42:26 pm
hint.acpi_perf.0.disabled 1
hint.acpi_throttle.0.disabled 1
hint.p4tcc.0.disabled 1
How can I check the current status of those settings? I read somewhere that starting from FreeBSD 10 both ACPI and P4TCC throttling are disabled by default in new installations. Is that true?

Do you know where I can find documentation about these specific ACPI settings?

Tia.
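
One way to look at this is sketched below, under the assumption that the hints reach the kernel through the loader environment, which FreeBSD's kenv utility prints. The function name show_cpu_hints is hypothetical, and the sketch only shows what was passed in at boot, not whether a driver acted on it:

```shell
#!/bin/sh
# List the throttling-related device hints currently present in the kernel
# environment; print a note if none of them are set.
show_cpu_hints() {
    kenv 2>/dev/null |
        grep -E '^hint\.(acpi_perf|acpi_throttle|p4tcc)\.' ||
        echo "no matching hints set"
}
```

Running show_cpu_hints before and after adding the tunables should make it visible whether the values actually arrived in the kernel environment.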
Title: Re: APU2D4 very low throughput 1Gbit
Post by: miroco on May 30, 2021, 06:57:14 pm
APU Core Performance Boost

https://github.com/pcengines/apu2-documentation/blob/master/docs/apu_CPU_boost.md

miroco
Title: Re: APU2D4 very low throughput 1Gbit
Post by: hushcoden on May 31, 2021, 01:33:09 pm
APU Core Performance Boost

https://github.com/pcengines/apu2-documentation/blob/master/docs/apu_CPU_boost.md

miroco
So, with the latest firmware, the CPU is indeed capable of running at 1.4 GHz, but only when the CPU load goes up; afterwards the clock speed goes back to 1 GHz (or lower).

And the only purpose of applying the following hacks
Code: [Select]
hint.acpi_perf.0.disabled 1
hint.acpi_throttle.0.disabled 1
hint.p4tcc.0.disabled 1
is to keep the CPU clock fixed at 1.4 GHz (no throttling) regardless of the load, is that correct?
Title: Re: APU2D4 very low throughput 1Gbit
Post by: aesth on May 31, 2021, 01:45:18 pm
They only affect how the core performance boost is displayed. The boost works the same without them (but the frequency will show as 1000 even when it's at 1400).
Title: Re: APU2D4 very low throughput 1Gbit
Post by: hushcoden on May 31, 2021, 04:00:55 pm
Alright then, and as @Ricardo mentioned, is this clock boost only for 1 core ?
Title: Re: APU2D4 very low throughput 1Gbit
Post by: opnfwb on May 31, 2021, 05:12:24 pm
Again, I want to say that I don't own one of these devices, but I think a lot of the configs posted here will not work with later versions of OPNsense (20.7 and 21.1). Both OPNsense 20.7+ and pfSense 2.5+ use FreeBSD 12.x as their base. FreeBSD 12.x uses iflib for NIC queues and no longer contains many of the old tunables that we would have used in FreeBSD 11.x.

Because of this, most of the configs being posted here will not have any impact.

There are still some tunables that you can set on the igb NIC driver, primarily disabling flow control and disabling EEE. These are the "new" tunables needed in the FreeBSD 12.x series:
Code: [Select]
dev.igb.X.fc (X is the interface number)
dev.igb.X.eee_control (X is the interface number)
Setting both of these to 0 should disable the feature.

If you wish to check which options are available for the igb NICs, you can run the following at an SSH console
Code: [Select]
sysctl -a | grep igb
You will notice that if you run this command, there are now many different configurable settings that do not match any of the previously used configs that we relied on in FreeBSD 11.x.
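
To look at just the two settings named above instead of the full list, something like the following could work. It is a sketch only: the function name show_igb_settings is made up, and the interface count has to be supplied by hand (cookiemonster's list further down uses igb0 to igb3).

```shell
#!/bin/sh
# Print the current flow-control and EEE values for igb0 .. igb(N-1).
# A knob that cannot be read is reported as "unavailable".
show_igb_settings() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        for knob in fc eee_control; do
            value=$(sysctl -n "dev.igb.$i.$knob" 2>/dev/null || echo unavailable)
            printf 'dev.igb.%s.%s=%s\n' "$i" "$knob" "$value"
        done
        i=$((i + 1))
    done
}
```

For example, show_igb_settings 3 would print six lines, two per interface.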

Title: Re: APU2D4 very low throughput 1Gbit
Post by: Ricardo on June 01, 2021, 07:30:21 am
It would be great to highlight these pitfalls, since the average APU owner goes to Teklager or to some random Calomel article, e.g. https://calomel.org/freebsd_network_tuning.html, and applies performance-optimization sysctls that were only relevant for the older FreeBSD 10 or 11 releases. Current FreeBSD / HardenedBSD / OPNsense releases run 12.x and benefit close to 0% from them.
Title: Re: APU2D4 very low throughput 1Gbit
Post by: Burschi on June 02, 2021, 05:39:23 pm
OK, after trying all (well, many...) options and tunables, I'm under the impression that the following is the important part:

[...]
Because of this, most of the configs being posted here will not have any impact.
[...]
You will notice that if you run this command, there are now many different configurable settings that do not match any of the previously used configs that we relied on in FreeBSD 11.x.
^Any hints on this from the experts?
Title: Re: APU2D4 very low throughput 1Gbit
Post by: opnfwb on June 02, 2021, 06:07:15 pm
While I do not consider myself an expert :D I do think Teklager actually left a hint. They specifically say no tuning is needed on pfSense 2.5 (which is FreeBSD 12.x based). What this really means is that anything 12.x based, to include OPNsense as well, will respond in a similar fashion.

Teklager also goes on to show single thread transfer tests with lower performance values when using pfSense 2.5 compared to pfSense 2.4 (and the FreeBSD 11.x tweaks).

Miroco posted the link with Teklager hinting at this.
https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/

"Gigabit config for pfSense 2.5.0. No tweaks are required! Don't follow any of the information listed below for pfSense 2.4.5."

At this point I would try these 4 things and report back. It's also important to make sure that the iperf tests you run are pushing traffic through the firewall (have the client on LAN, and another server on WAN). Don't just host iperf on one of the firewall interfaces.

In your tunables set the following:
Code: [Select]
hw.ibrs_disable: 1 (just disable this to test throughput, there are security implications)
vm.pmap.pti: 0 (just disable this to test throughput, there are security implications)
dev.igb.0.eee_control: 0 (disable Energy Efficient Ethernet, do this for all IGB interfaces present on the device)
dev.igb.0.fc: 0 (disable Flow Control, do this for all IGB interfaces present on the device)

Set those tunables and reboot. Then re-run the throughput tests and see if there is an improvement. All traffic shaping and the Netflow Insight plugin on OPNsense should also be disabled during these tests.
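
Since the two igb settings have to be repeated for every interface, the list can be generated instead of typed. A minimal sketch, assuming the interfaces are numbered from 0 and that name=value lines are a convenient form to copy from; the function name igb_tunables is hypothetical:

```shell
#!/bin/sh
# Emit the EEE and flow-control settings for igb0 .. igb(N-1),
# one name=value pair per line, ready to be entered as tunables.
igb_tunables() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'dev.igb.%s.eee_control=0\n' "$i"
        printf 'dev.igb.%s.fc=0\n' "$i"
        i=$((i + 1))
    done
}
```

igb_tunables 3 prints the six entries for igb0 to igb2. The hw.ibrs_disable and vm.pmap.pti values are left out on purpose because of the security implications mentioned above.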
Title: Re: APU2D4 very low throughput 1Gbit
Post by: Burschi on June 04, 2021, 05:51:01 pm
Thanks for the effort @opnfwb, but that only yielded ~30 Mb/sec; with iperf3 -c <ip> -P 6 I get around 70 Mb/sec...
Title: Re: APU2D4 very low throughput 1Gbit
Post by: opnfwb on June 04, 2021, 11:09:15 pm
I think the next thing to try just to rule out some weird inconsistency would be to attempt the same tests on the latest pfsense 2.5 and report back? If you're seeing the same limited throughput on the same platform that Teklager benchmarked then there has to be some other piece of the puzzle missing here. Maybe firmware or some other oddity?
Title: Re: APU2D4 very low throughput 1Gbit
Post by: Burschi on June 05, 2021, 12:43:12 pm
Hey, I tried again with all settings reverted (somehow my box hung after setting the tunables, so I had to use the serial console...), and it reports 20-25 Mb/sec even with -P 6 (and also for lower and higher values).
I'm not sure I can easily do the pfSense install, since I have set up VLANs across my home network; I was thinking about buying a 4-port NIC and going with a VM (it has enough power to rule that out), since I have Proxmox running anyway.
Maybe.
Title: Re: APU2D4 very low throughput 1Gbit
Post by: almodovaris on June 08, 2021, 02:12:10 pm
I don't know, man. I have a connection 600 Mbps down, 40 Mbps up and I can use all the speed with an APU2 (4 GB RAM).

Oh, yes, I don't use Suricata.
Title: Re: APU2D4 very low throughput 1Gbit
Post by: Burschi on June 08, 2021, 08:26:43 pm
I have set up OPNsense on my Proxmox host with an Intel I350 quad NIC (passthrough), using my original configuration from the APU2D4, and there I get 112 MBit/sec. So it seems to be related to the CPU? What is wrong with my APU and/or configuration?
Title: Re: APU2D4 very low throughput 1Gbit
Post by: cookiemonster on June 10, 2021, 12:09:56 am
You're not alone in looking for drastic improvements, Burschi. Apologies for polluting your thread; I'll create my own if you prefer.
I'm using an APU4D4 that I just put together, and I've noticed the same problem on OPNsense. I've updated the board to the latest firmware and I'm running the latest OPNsense, 21.1.6.
As a baseline, LAN clients on a gigabit switch pull 1 Gbps client to client.
iperf against the firewall only gets 390 Mbps.
iperf through the firewall from public iperf servers hovers around 290 Mbps.
Basic default rules and only two manual ones to catch stray DNS traffic. No Suricata.

So far the only tunables I've added are:
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"
legal.intel_igb.license_ack="1"

I'm planning on adding all the ones almodovaris kindly offered and reporting back. I plan to put them in a loader.conf.local and test before adding them as tunables in the UI, also merging in the ones suggested by opnfwb (thank you).
I will add these:
Code: [Select]
amdtemp_load="YES"
ahci_load="YES"
aesni_load="YES"
if_igb_load="YES"
flowd_enable="YES"
flowd_aggregate_enable="YES"
legal.intel_igb.license_ack="1"
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1
# this is the magic. If you don't set this, queues won't be utilized properly
# allow multiple processes for receive/transmit processing
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"

net.pf.states_hashsize=2097152

hw.igb.num_queues=0

hw.igb.enable_aim=1

hw.igb.enable_msix=1
hw.pci.enable_msix=1
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"

vm.pmap.pti = 0
hw.ibrs_disable = 0

hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
hint.acpi_perf.0.disabled=1
dev.igb.0.eee_control=0
dev.igb.0.fc=0

hint.p4tcc.1.disabled=1
hint.acpi_throttle.1.disabled=1
hint.acpi_perf.1.disabled=1
dev.igb.1.eee_control=0
dev.igb.1.fc=0

hint.p4tcc.2.disabled=1
hint.acpi_throttle.2.disabled=1
hint.acpi_perf.2.disabled=1
dev.igb.2.eee_control=0
dev.igb.2.fc=0

hint.p4tcc.3.disabled=1
hint.acpi_throttle.3.disabled=1
hint.acpi_perf.3.disabled=1
dev.igb.3.eee_control=0
dev.igb.3.fc=0
Title: Re: APU2D4 very low throughput 1Gbit
Post by: Burschi on June 19, 2021, 03:02:42 pm
Update:
I'm running OPNsense in a VM on Proxmox with an Intel 4-port NIC passed through. I get maximum throughput with iperf3 (>100 MB/s), although performance drops drastically when using Sensei, Suricata or ntopng (~60 MB/s).

I'm going ahead with the virtualized OPNsense and keeping the APU2D4 as a backup.

Thank you all for your help!
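
The figures in this thread mix Mbits/sec, Mb/sec and MB/s, so a quick conversion helps when comparing them. A small sketch with made-up function names; the second function follows the convention visible in the iperf3 output at the top of the thread, where 23.8 MBytes per second is reported as 199 Mbits/sec (MBytes counted as 2^20 bytes, Mbits as 10^6 bits):

```shell
#!/bin/sh
# Decimal convention: 1 MB/s = 8 Mbit/s.
mbytes_to_mbits() {
    awk -v mb="$1" 'BEGIN { printf "%.1f\n", mb * 8 }'
}

# iperf3 convention: MBytes are 2^20 bytes, Mbits are 10^6 bits.
iperf_mbytes_to_mbits() {
    awk -v mb="$1" 'BEGIN { printf "%.1f\n", mb * 8 * 1048576 / 1000000 }'
}
```

By the decimal convention, 100 MB/s corresponds to 800 Mbit/s, which is close to gigabit line rate, and 60 MB/s to 480 Mbit/s.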
Title: Re: APU2D4 very low throughput 1Gbit
Post by: Basixs on July 02, 2021, 04:45:36 pm
Hi, I was wondering how to see and adjust these tunables:

net.inet.tcp.maxtcptw
net.inet.ip.dummynet.max_chain_len
net.inet.ip.fastforwarding

I can't find them using sysctl -a, and adding an entry in loader.conf.local or under system tunables in the OPNsense GUI just shows a message at boot along the lines of: sysctl: unknown oid 'net.inet.ip.fastforwarding'.

I'm on OPNsense 21.1, 64-bit.

(Sorry for not being relevant to the topic, but a lot of tunables were being discussed so I thought someone here might know. Thanks).
Title: Re: APU2D4 very low throughput 1Gbit
Post by: almodovaris on July 03, 2021, 09:35:54 pm
Yup, a lot of tunables have been dropped in higher versions of FreeBSD.
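
Before adding a tunable from an older guide, it can be checked whether the running kernel still knows the name. A minimal sketch, relying only on sysctl returning a non-zero exit status for an unknown OID; the function names oid_exists and check_oids are made up for illustration:

```shell
#!/bin/sh
# Succeeds if the running kernel knows the given sysctl OID.
oid_exists() {
    sysctl -n "$1" >/dev/null 2>&1
}

# Report for each OID given as an argument whether it is present.
check_oids() {
    for oid in "$@"; do
        if oid_exists "$oid"; then
            echo "$oid: present"
        else
            echo "$oid: unknown oid"
        fi
    done
}
```

For the three names asked about above, the call would be check_oids net.inet.tcp.maxtcptw net.inet.ip.dummynet.max_chain_len net.inet.ip.fastforwarding. Note that some OIDs only appear once the corresponding kernel module is loaded, so "unknown oid" does not always mean the tunable was removed.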