apu4d4 low throughput

Started by cookiemonster, June 10, 2021, 06:47:52 PM

Hello, I'm new to OPNsense. I have looked through the manual, forum posts and online sources, but I need to ask for some assistance please.
I am using this board (an apu4d4) as my router and firewall. I have a 550/75 Mbps fiber-to-the-home line.

The problem is that I seem unable to get more than around 300 Mbps of throughput.

Setup:
Default firewall rules and two of my own to redirect DNS to an internal client.
No Suricata.
Netflow running.
One OpenVPN server running. No active clients connected.
OPNsense 21.1.6 installed.
No VLANs, only a single flat LAN network.
The LAN interface goes via Cat7 cable to a Gigabit managed switch. No VLANs on the switch.
I have opendns stubby running as recursive resolver.
Unbound running, with opendns stubby as upstream resolver.
Memory usage hovers around 14%. CPU usage varies between 3% and 64%. It's a bit spiky but otherwise looks fine, never maxed out.

Tunables used:
I've collected from various threads what seem to be valid tunables for these i211AT NICs, and I have created a loader.conf.local with the following contents:

cat /boot/loader.conf.local
amdtemp_load="YES"
ahci_load="YES"
aesni_load="YES"
if_igb_load="YES"
flowd_enable="YES"
flowd_aggregate_enable="YES"
legal.intel_igb.license_ack="1"
legal.intel_ipw.license_ack="1"
legal.intel_iwi.license_ack="1"
# this is the magic. If you don't set this, queues won't be utilized properly
# allow multiple processes for receive/transmit processing
#hw.igb.rx_process_limit="-1"
#hw.igb.tx_process_limit="-1"

net.pf.states_hashsize="2097152"

hw.igb.num_queues="0"

hw.igb.enable_aim="1"

hw.igb.enable_msix="1"
hw.pci.enable_msix="1"
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"

vm.pmap.pti="0"
hw.ibrs_disable="0"

hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"
hint.acpi_perf.0.disabled="1"
dev.igb.0.eee_control="0"
dev.igb.0.fc="0"

hint.p4tcc.1.disabled="1"
hint.acpi_throttle.1.disabled="1"
hint.acpi_perf.1.disabled="1"
dev.igb.1.eee_control="0"
dev.igb.1.fc="0"

hint.p4tcc.2.disabled="1"
hint.acpi_throttle.2.disabled="1"
hint.acpi_perf.2.disabled="1"
dev.igb.2.eee_control="0"
dev.igb.2.fc="0"

hint.p4tcc.3.disabled="1"
hint.acpi_throttle.3.disabled="1"
hint.acpi_perf.3.disabled="1"
dev.igb.3.eee_control="0"
dev.igb.3.fc="0"
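
Since a file like this is assembled by hand from several threads, a quick sanity check before rebooting may help. The following is a minimal sketch, not an official tool: it assumes every active line should look like name="value", and the function name lint_loader_conf and the sample text are made up for illustration. It reports lines that do not fit that shape (for example a misplaced comment marker) and names that are set more than once.

```python
import re
from collections import Counter

# Assumed shape of an active loader.conf entry: name="value", optional trailing comment.
ENTRY = re.compile(r'^([A-Za-z0-9_.]+)="([^"]*)"\s*(#.*)?$')

def lint_loader_conf(text):
    """Return (malformed, duplicates) for loader.conf-style text (hypothetical helper)."""
    malformed = []
    seen = Counter()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and whole-line comments are fine
        match = ENTRY.match(line)
        if match is None:
            malformed.append((lineno, line))
        else:
            seen[match.group(1)] += 1
    duplicates = sorted(name for name, count in seen.items() if count > 1)
    return malformed, duplicates

# Hypothetical sample: one misplaced '#' and one tunable set twice.
sample = '''\
#hw.igb.rx_process_limit="-1"
h#w.igb.tx_process_limit="-1"
hw.igb.num_queues="0"
hw.igb.rx_process_limit="-1"
hw.igb.rx_process_limit="-1"
'''
malformed, duplicates = lint_loader_conf(sample)
print(malformed)
print(duplicates)
```

To check the real file, read /boot/loader.conf.local into a string and pass it to the same function. This only checks syntax; it says nothing about whether the driver actually honours a given tunable.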


Testing:
As a baseline, two clients on the LAN run iperf against each other and get just over 1 Gbps with two streams (-P 2 option).
When I run iperf from either of these clients to public iperf test servers I get only about 350 Mbps, for example:
~$ iperf3 -p 5200 -f m -V -c speedtest.wtnet.de -P 2 -t 10 -R
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.03  sec   222 MBytes   186 Mbits/sec  379             sender
[  5]   0.00-10.00  sec   219 MBytes   184 Mbits/sec                  receiver
[  7]   0.00-10.03  sec   136 MBytes   114 Mbits/sec   38             sender
[  7]   0.00-10.00  sec   133 MBytes   112 Mbits/sec                  receiver
[SUM]   0.00-10.03  sec   358 MBytes   299 Mbits/sec  417             sender
[SUM]   0.00-10.00  sec   352 MBytes   296 Mbits/sec                  receiver
snd_tcp_congestion cubic
rcv_tcp_congestion cubic


Naturally I have used a handful of public test servers. Results vary but 350 Mbps is the max I've achieved. This is after adding the tunables. Without them the results were a little lower.
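
To compare runs against the 550 Mbps line more easily, the numbers can be pulled out of iperf3's JSON output (the -J option) instead of being read off the text summary. This is a rough sketch under assumptions: the helper name summarize is made up, the report below is hand-written with only the fields the helper reads, and its values are copied from the [SUM] lines of the run above rather than from a real JSON capture.

```python
import json

def summarize(report, line_rate_mbps):
    """Summarize an iperf3 JSON report against a subscribed line rate (hypothetical helper)."""
    end = report["end"]
    received_mbps = end["sum_received"]["bits_per_second"] / 1e6
    return {
        "received_mbps": round(received_mbps, 1),
        # Retransmits are reported for TCP senders; .get() tolerates their absence.
        "retransmits": end["sum_sent"].get("retransmits"),
        "percent_of_line": round(100 * received_mbps / line_rate_mbps, 1),
    }

# Hand-written stand-in for `iperf3 ... -J` output, reduced to the fields used above.
report = json.loads("""
{"end": {"sum_sent":     {"bits_per_second": 299000000, "retransmits": 417},
         "sum_received": {"bits_per_second": 296000000}}}
""")

result = summarize(report, line_rate_mbps=550)
print(result)
```

On these numbers the run reaches roughly 54% of the subscribed download rate, which is the gap I am trying to close.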

Questions:
- Is my testing flawed, should I do something different?
- Are there other tunable suggestions I should try?
And finally, the big question:
- Is it recommended that I install an older version of OPNsense?
I've seen the thread about 21.1 moving to a different subsystem that is not yet as performant, on account of FreeBSD 12.1-RELEASE having that.
I've also noticed Deciso is sticking with OPNsense 20.7 Release for their devices being sold. Maybe it is the right version for production use.

Thanks for reading and I look forward to some advice.

I've now seen more threads similar to this. I should have done more research before spending money and effort.
So it seems like I've made a mistake choosing the APU as a low-power box for *BSD.
Sigh. So my search for low power and a good feature set for up to 1 Gbps routing will need to continue.

So here's a daft question in my quest to stay with OPN. I'd like to try the 20.7 version.
If I save the current 21.1.6 config and apply it to a fresh install of 20.7, could that work?

Welcome to the club of disappointed PC Engines APU2-3-4-5-6-... owners :)

I've got to laugh. Thanks Ricardo for at least commenting.

My service is only 550 Mbps so I don't need full gigabit from the APU. I have read in so many places that gigabit is achievable on it that not getting most of my 550 makes me despair. Hence looking for tweaks to get to that 550.
By the way there's no PPPoE in play.

If PPPoE is out of the question, you may win the 500 Mbit/s battle (but not the big 1 Gbit/s war, hehe).

What seriously limits the performance is the NAT function done via "pf". If you can, experiment with 2 local PCs: connect one directly to the WAN leg of the APU and the other directly to the LAN leg, then try pure routing with NAT disabled and proper IP subnets assigned to each PC and to the APU interfaces.
Also in my experience, upload speed was much higher via NAT, than the download speed, for an unknown reason.
Good luck, and share any results, somebody may benefit from it.
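
For the two-PC routing test, the addressing could be planned roughly like this. It is only a sketch: the two subnets are hypothetical (any two non-overlapping private ranges would do), and the helper name plan is made up. Each leg gets its own subnet, the APU interface takes the first host address, and the PC on that leg takes the second and uses the APU as its gateway.

```python
import ipaddress

# Hypothetical subnets for the bench test, one per APU leg.
wan_net = ipaddress.ip_network("10.10.1.0/24")
lan_net = ipaddress.ip_network("10.10.2.0/24")

def plan(net):
    """Assign the first host address to the APU interface and the second to the PC."""
    hosts = iter(net.hosts())
    apu, pc = next(hosts), next(hosts)
    return {"apu": str(apu), "pc": str(pc), "pc_gateway": str(apu)}

# Pure routing only makes sense if the two legs are different networks.
assert not wan_net.overlaps(lan_net)

wan_plan, lan_plan = plan(wan_net), plan(lan_net)
print("WAN leg:", wan_plan)
print("LAN leg:", lan_plan)
```

With both PCs pointing at the APU as their gateway, an iperf3 run between them measures routing alone, without NAT or the internet path in the way.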

Tempting to just sell it and recover some of the costs and going back to dd-wrt on a consumer router and get more performance, crazy right.
I digress.

I will try what you suggest. It'll be interesting to see what happens.
Much obliged Ricardo.

Well, I think the previous tests to public iperf servers may have been affected by other traffic, and I need to do what you suggested: verify the routing performance internally before testing to internet endpoints. I just ran an Ookla speedtest and an iperf one, and the results are more or less what I was hoping for.

$ speedtest

   Speedtest by Ookla

     Server: YouFibre - Manchester (id = 41410)
        ISP: TalkTalk
    Latency:     4.31 ms   (0.16 ms jitter)
   Download:   478.65 Mbps (data used: 552.4 MB)                               
     Upload:    74.88 Mbps (data used: 65.6 MB)                               
Packet Loss:     0.0%
Result URL: https://www.speedtest.net/result/c/13529b8c-cd2e-4341-9139-fd94253493f9

$ iperf3 -p 5200 -f m -V -c speedtest.wtnet.de -P 5 -t 10 -R
iperf 3.7
Linux mars 5.8.0-55-generic #62~20.04.1-Ubuntu SMP Wed Jun 2 08:55:04 UTC 2021 x86_64
Control connection MSS 1448
Time: Sat, 12 Jun 2021 21:23:20 GMT
Connecting to host speedtest.wtnet.de, port 5200
Reverse mode, remote host speedtest.wtnet.de is sending
      Cookie: zcv4w4f4yc6qmhoty3454hxb7fwapnroz3vb
      TCP MSS: 1448 (default)

[SUM]   0.00-10.03  sec   509 MBytes   426 Mbits/sec  541             sender
[SUM]   0.00-10.00  sec   499 MBytes   418 Mbits/sec                  receiver
snd_tcp_congestion cubic
rcv_tcp_congestion cubic

iperf Done.

My conclusion then is that I still need to do the internal tests, but as you pointed out, I should be able to get most of my 550 Mbps connection.
I'm holding back for now from buying another device.

June 13, 2021, 04:20:28 PM #8 Last Edit: June 13, 2021, 04:34:28 PM by dave
Quote from: cookiemonster on June 11, 2021, 10:05:11 PM
Tempting to just sell it and recover some of the costs and going back to dd-wrt on a consumer router and get more performance, crazy right.
I digress.

I will try what you suggest. It'll be interesting to see what happens.
Much obliged Ricardo.

Give OpenWRT a go in that case. APUs are fully supported.
https://openwrt.org/toh/pcengines/apu2
https://teklager.se/en/knowledge-base/openwrt-installation-instructions/

What I've heard is OpenWRT (being Linux based as opposed to BSD) is more performant due to better multi-threading (PPPoE's not an issue either). The thing BSD has going for it is its network stack, it just keeps going and going. But then I've heard BSD13 has much improved multi-threading...

I've been toying with the idea @dave. Thanks for the suggestion. I'll get another storage medium to install to, so I can switch between them without having to install, reinstall and configure.

Quote from: dave on June 13, 2021, 04:20:28 PM
Quote from: cookiemonster on June 11, 2021, 10:05:11 PM
What I've heard is OpenWRT (being Linux based as opposed to BSD) is more performant due to better multi-threading (PPPoE's not an issue either). The thing BSD has going for it is its network stack, it just keeps going and going. But then I've heard BSD13 has much improved multi-threading...

Yep, on OpenWRT you get the full Gigabit with zero issue; the hardware is absolutely capable...

You should buy a cheap Intel N100 based system, to be had starting at about 130 EUR.

OPNsense on an APU just isn't much fun.

I did move to a better (for me) solution. At the end of the covid lockdowns I bought a CW device with a Ryzen APU from AliExpress, put Proxmox on it, and run OPN virtualised. The PC Engines APU is an unused backup device.
I avoid Intel CPUs if I can.

Sorry for the intrusion,

What device with Ryzen did you get? I am thinking of switching from an Intel-based one.

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

No problem Seimus. I very much dislike the current Intel P and E cores situation, with some OSes not yet able to make the best scheduling decisions. I'm sure it'll improve but I've always been in the AMD camp anyway, I'm biased on this.
I went with a ChangWang CW56-58, the option with an AMD Ryzen 5 5600U with Radeon Graphics. On it the OPN VM runs with 2 cores/2 threads and 4 GB of memory. It can run Suricata in IPS mode on WAN, Crowdsec as LAPI, and Zenarmor on LAN. Amazing little box. There is noise from the exhaust fan on and off, but I added an external USB fan and that takes care of it.
Check this article: https://williamlam.com/2023/01/esxi-on-amd-changwang-cw56-58.html
I use Proxmox instead of ESXi though. The article came out after I had made my purchase, so it just made me think it wasn't a bad idea (unlike the PC Engines one).