OPNsense Forum

English Forums => Hardware and Performance => Topic started by: Ricardo on July 27, 2018, 12:24:54 pm

Title: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on July 27, 2018, 12:24:54 pm
Dear Opnsense team,

I am facing a significant performance issue using OPNsense 18.1.x.

Hardware: PC Engines APU2C4, 3x i210AT NIC / AMD GX-412TC CPU / 4 GB DRAM

Issue:
this HW cannot handle 1 Gigabit wire speed with single-flow network traffic under OPNsense. The maximum I could get is approx. 450 Mbit/s (WAN --> LAN direction). There are no custom firewall rules / IDS / IPS / etc., just the factory default state after a clean install (I used the serial installer of 18.1.6rev2, then upgraded to 18.1.13, if that counts).

However:
the exact same HW can easily do 850-900+ Mbit/s single-flow traffic with a Linux firewall distribution (I used the latest IPFire 2.19 - Core Update 120), and shows much less load during this traffic than what I observe under OPNsense.

Throughput was measured with single-flow iperf3 over plain IP, with NAT. No IMIX stress test, before you ask; on the contrary, the largest possible MTU (1500) and MSS (1460) were used.

My real concern is the performance drop once I enable PPPoE (my ISP connects through PPPoE): Google revealed many "single-thread pppoe speed penalty" topics, which is what started my whole descent into this subject. But since routing performance is already bad in a very ideal, purely-IP setup, I expect PPPoE to be much worse (by definition, it can only be worse after all).

Checking on the FreeBSD net mailing list about possible solutions/workarounds quickly revealed that OPNsense is not running stock FreeBSD, but a fork of it (HardenedBSD). So FreeBSD support for OPNsense is practically non-existent, or at least everybody keeps pointing fingers at the other side. I have seen several times on this forum that you refer people to the FreeBSD forums when an issue is considered a bug of the underlying OS rather than an OPNsense bug. Reading that the relationship between the FreeBSD and HardenedBSD teams is far from friendly, I wonder what kind of help one can expect if an issue with the OS is found?

The thread started here:
Bug 203856 - [igb] PPPoE RX traffic is limited to one queue -->
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856
 
then continued here:
https://lists.freebsd.org/pipermail/freebsd-net/2018-July/051197.html

And that is the point, where I am stuck.

In short:
- I tried all the valid settings / tuning seen here:
https://bsdrp.net/documentation/technical_docs/performance#nic_drivers_tuning --> specifics for APU2+igb
- tried "net.isr.maxthreads" and "net.isr.numthreads" greater than 1 and switched net.isr.dispatch to "deferred" --> no measurable improvement in throughput, but the load nearly doubled on the APU2

I have collected various performance data during the test traffic, in case it helps to pinpoint where the bottleneck in this OPNsense system is.

-------------------------------------------------------------------------------------------------------------------------------

Opnsense 18.1.13
OS: FreeBSD 11.1-RELEASE-p11 FreeBSD 11.1-RELEASE-p11  116e406d37f(stable/18.1)  amd64

kldstat:
Id Refs Address            Size     Name
 1   91 0xffffffff80200000 213bb20  kernel
 2    1 0xffffffff8233d000 6e18     if_gre.ko
 3    1 0xffffffff82344000 7570     if_tap.ko
 4    3 0xffffffff8234c000 54e78    pf.ko
 5    1 0xffffffff823a1000 e480     carp.ko
 6    1 0xffffffff823b0000 e3e0     if_bridge.ko
 7    2 0xffffffff823bf000 6fd0     bridgestp.ko
 8    1 0xffffffff823c6000 126a8    if_lagg.ko
 9    1 0xffffffff823d9000 1610     ng_UI.ko
10   31 0xffffffff823db000 173e0    netgraph.ko
11    1 0xffffffff823f3000 3620     ng_async.ko
12    1 0xffffffff823f7000 4fb8     ng_bpf.ko
13    1 0xffffffff823fc000 4e98     ng_bridge.ko
14    1 0xffffffff82401000 31e0     ng_cisco.ko
15    1 0xffffffff82405000 f20      ng_echo.ko
16    1 0xffffffff82406000 38b8     ng_eiface.ko
17    1 0xffffffff8240a000 4870     ng_ether.ko
18    1 0xffffffff8240f000 1db0     ng_frame_relay.ko
19    1 0xffffffff82411000 17e8     ng_hole.ko
20    1 0xffffffff82413000 4250     ng_iface.ko
21    1 0xffffffff82418000 6250     ng_ksocket.ko
22    1 0xffffffff8241f000 7d88     ng_l2tp.ko
23    1 0xffffffff82427000 3fe0     ng_lmi.ko
24    1 0xffffffff8242b000 65c8     ng_mppc.ko
25    2 0xffffffff82432000 b48      rc4.ko
26    1 0xffffffff82433000 2ad8     ng_one2many.ko
27    1 0xffffffff82436000 a3e0     ng_ppp.ko
28    1 0xffffffff82441000 8f08     ng_pppoe.ko
29    1 0xffffffff8244a000 5f68     ng_pptpgre.ko
30    1 0xffffffff82450000 2570     ng_rfc1490.ko
31    1 0xffffffff82453000 6288     ng_socket.ko
32    1 0xffffffff8245a000 21a0     ng_tee.ko
33    1 0xffffffff8245d000 2ec0     ng_tty.ko
34    1 0xffffffff82460000 45b8     ng_vjc.ko
35    1 0xffffffff82465000 2f20     ng_vlan.ko
36    1 0xffffffff82468000 31f0     if_enc.ko
37    1 0xffffffff8246c000 28b0     pflog.ko
38    1 0xffffffff8246f000 d578     pfsync.ko
39    1 0xffffffff8247d000 3370     ng_car.ko
40    1 0xffffffff82481000 36a8     ng_deflate.ko
41    1 0xffffffff82485000 4ef8     ng_pipe.ko
42    1 0xffffffff8248a000 3658     ng_pred1.ko
43    1 0xffffffff8248e000 2058     ng_tcpmss.ko
44    1 0xffffffff82621000 7130     aesni.ko
45    1 0xffffffff82629000 1055     amdtemp.ko


The 2 PCs I use to generate traffic are 2x Win7 boxes:
PC-A connects directly to igb0 (WAN endpoint), IP addr. 192.168.1.2
PC-B connects directly to igb1 (LAN endpoint), IP addr. 10.0.0.100

I run:

(on the PC-A) iperf3 -s
(on the PC-B) iperf3 -c 192.168.1.2 -t 300  -P 1 -R (-R to simulate traffic direction FROM WAN TO LAN, after PC-B makes the initial connection TO PC-A)
---------------------------------------------------------------------------------------------------------------------------------------------

loader.conf:

##############################################################
# This file was auto-generated using the rc.loader facility. #
# In order to deploy a custom change to this installation,   #
# please use /boot/loader.conf.local as it is not rewritten. #
##############################################################

loader_brand="opnsense"
loader_logo="hourglass"
loader_menu_title=""

autoboot_delay="3"
hw.usb.no_pf="1"
# see https://forum.opnsense.org/index.php?topic=6366.0
hint.ahci.0.msi="0"
hint.ahci.1.msi="0"

# Vital modules that are not in FreeBSD's GENERIC
# configuration will be loaded on boot, which makes
# races with individual module's settings impossible.
carp_load="YES"
if_bridge_load="YES"
if_enc_load="YES"
if_gif_load="YES"
if_gre_load="YES"
if_lagg_load="YES"
if_tap_load="YES"
if_tun_load="YES"
if_vlan_load="YES"
pf_load="YES"
pflog_load="YES"
pfsync_load="YES"

# The netgraph(4) framework is loaded here
# for backwards compat for when the kernel
# had these compiled in, not as modules. This
# list needs further pruning and probing.
netgraph_load="YES"
ng_UI_load="YES"
ng_async_load="YES"
ng_bpf_load="YES"
ng_bridge_load="YES"
ng_car_load="YES"
ng_cisco_load="YES"
ng_deflate_load="YES"
ng_echo_load="YES"
ng_eiface_load="YES"
ng_ether_load="YES"
ng_frame_relay_load="YES"
ng_hole_load="YES"
ng_iface_load="YES"
ng_ksocket_load="YES"
ng_l2tp_load="YES"
ng_lmi_load="YES"
ng_mppc_load="YES"
ng_one2many_load="YES"
ng_pipe_load="YES"
ng_ppp_load="YES"
ng_pppoe_load="YES"
ng_pptpgre_load="YES"
ng_pred1_load="YES"
ng_rfc1490_load="YES"
ng_socket_load="YES"
ng_tcpmss_load="YES"
ng_tee_load="YES"
ng_tty_load="YES"
ng_vjc_load="YES"
ng_vlan_load="YES"

# dynamically generated tunables settings follow
net.enc.in.ipsec_bpf_mask="2"
net.enc.in.ipsec_filter_mask="2"
net.enc.out.ipsec_bpf_mask="1"
net.enc.out.ipsec_filter_mask="1"
debug.pfftpproxy="0"
vfs.read_max="32"
net.inet.ip.portrange.first="1024"
net.inet.tcp.blackhole="2"
net.inet.udp.blackhole="1"
net.inet.ip.random_id="1"
net.inet.ip.sourceroute="0"
net.inet.ip.accept_sourceroute="0"
net.inet.icmp.drop_redirect="0"
net.inet.icmp.log_redirect="0"
net.inet.tcp.drop_synfin="1"
net.inet.ip.redirect="1"
net.inet6.ip6.redirect="1"
net.inet6.ip6.use_tempaddr="0"
net.inet6.ip6.prefer_tempaddr="0"
net.inet.tcp.syncookies="1"
net.inet.tcp.recvspace="65228"
net.inet.tcp.sendspace="65228"
net.inet.tcp.delayed_ack="0"
net.inet.udp.maxdgram="57344"
net.link.bridge.pfil_onlyip="0"
net.link.bridge.pfil_local_phys="0"
net.link.bridge.pfil_member="1"
net.link.bridge.pfil_bridge="0"
net.link.tap.user_open="1"
kern.randompid="347"
net.inet.ip.intr_queue_maxlen="1000"
hw.syscons.kbd_reboot="0"
net.inet.tcp.log_debug="0"
net.inet.icmp.icmplim="0"
net.inet.tcp.tso="1"
net.inet.udp.checksum="1"
kern.ipc.maxsockbuf="4262144"
vm.pmap.pti="1"
hw.ibrs_disable="0"

# dynamically generated console settings follow
comconsole_speed="115200"
#boot_multicons
boot_serial="YES"
#kern.vty
console="comconsole"

---------------------------------------------
loader.conf.local

# I have commented everything out, (did reboot to apply) to start performance tuning from scratch

#kern.random.harvest.mask=351
#hw.igb.rx_process_limit=-1
#net.link.ifqmaxlen=2048
#net.isr.numthreads=4
#net.isr.maxthreads=4
#net.isr.dispatch=deferred
#net.isr.bindthreads=1
------------------------------------------------

sysctl.conf is practically empty

------------------------------------------------

ifconfig:

Note: igb0 is "WAN", igb1 is "LAN"
Note2: no PPPoE so far!

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
        ether 00:0d:b9:4b:0b:5c
        hwaddr 00:0d:b9:4b:0b:5c
        inet6 fe80::20d:b9ff:fe4b:b5c%igb0 prefixlen 64 scopeid 0x1
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
        ether 00:0d:b9:4b:0b:5d
        hwaddr 00:0d:b9:4b:0b:5d
        inet6 fe80::20d:b9ff:fe4b:b5d%igb1 prefixlen 64 scopeid 0x2
        inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:0d:b9:4b:0b:5e
        hwaddr 00:0d:b9:4b:0b:5e
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
enc0: flags=0<> metric 0 mtu 1536
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: enc
pflog0: flags=100<PROMISC> metric 0 mtu 33160
        groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
        groups: pfsync
        syncpeer: 0.0.0.0 maxupd: 128 defer: off

--------------------------------------------------------------

top -SHPI

last pid: 90572;  load averages:  2.13,  1.48,  1.01    up 0+15:54:28  08:58:36
136 processes: 8 running, 99 sleeping, 29 waiting
CPU 0:  0.0% user,  0.0% nice, 99.1% system,  0.0% interrupt,  0.9% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system, 67.1% interrupt, 32.9% idle
CPU 2:  0.3% user,  0.0% nice,  0.8% system,  0.2% interrupt, 98.7% idle
CPU 3:  0.2% user,  0.0% nice,  1.9% system,  6.8% interrupt, 91.2% idle
Mem: 36M Active, 179M Inact, 610M Wired, 387M Buf, 3102M Free
Swap:

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    0 root       -92    -     0K   448K CPU0    0   1:32  99.37% kernel{igb0 qu
   11 root       155 ki31     0K    64K CPU2    2 904:01  98.85% idle{idle: cpu
   11 root       155 ki31     0K    64K RUN     3 909:09  93.95% idle{idle: cpu
   12 root       -92    -     0K   496K CPU1    1   1:54  50.64% intr{irq262: i
   11 root       155 ki31     0K    64K CPU1    1 906:22  39.25% idle{idle: cpu
   12 root       -92    -     0K   496K WAIT    1   0:26  10.09% intr{irq257: i
   12 root       -92    -     0K   496K WAIT    3   0:03   3.19% intr{irq264: i
   17 root       -16    -     0K    16K -       3   0:08   1.12% rand_harvestq
39298 unbound     20    0 72916K 31596K kqread  3   0:01   1.09% unbound{unboun
   12 root       -92    -     0K   496K WAIT    3   0:02   0.61% intr{irq259: i
   11 root       155 ki31     0K    64K RUN     0 912:29   0.52% idle{idle: cpu
   12 root       -72    -     0K   496K WAIT    2   0:02   0.35% intr{swi1: pfs
    0 root       -92    -     0K   448K -       2   0:00   0.24% kernel{igb1 qu
   12 root       -76    -     0K   496K WAIT    3   0:03   0.15% intr{swi0: uar




-----------------------------

systat -vm 3


    1 users    Load  2.58  1.69  1.11                  Jul 27 08:59
   Mem usage:  21%Phy  1%Kmem
Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  129892   36820 12632092    39224 3175880  count
All  133660   40504 13715660    67628          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt 35953 total
             32       52k    2 5198  32k  926             cow       4 uart0 4
                                                          zfod      1 ehci0 18
25.9%Sys  18.7%Intr  2.1%User  0.0%Nice 53.3%Idle         ozfod       ahci0 19
|    |    |    |    |    |    |    |    |    |           %ozfod  1123 cpu0:timer
=============+++++++++>                                   daefr  1126 cpu1:timer
                                        29 dtbuf          prcfr  1127 cpu3:timer
Namei     Name-cache   Dir-cache    145989 desvn          totfr    84 cpu2:timer
   Calls    hits   %    hits   %     36007 numvn          react     1 igb0:que 0
      19      19 100                 14872 frevn          pdwak 13759 igb0:que 1
                                                       15 pdpgs     1 igb0:que 2
Disks  ada0 pass0                                         intrn     3 igb0:que 3
KB/t   0.00  0.00                                  624712 wire        igb0:link
tps       0     0                                   36984 act       1 igb1:que 0
MB/s   0.00  0.00                                  183780 inact 13514 igb1:que 1
%busy     0     0                                         laund     3 igb1:que 2
                                                  3175880 free   5206 igb1:que 3



-----------------------------

systat -ifstat 3


                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   ||||||||||||||

      Interface           Traffic               Peak                Total
            lo0  in      0.089 KB/s          0.982 KB/s            3.729 MB
                 out     0.089 KB/s          0.982 KB/s            3.729 MB

           igb1  in      1.184 MB/s          1.194 MB/s          603.486 MB
                 out    56.019 MB/s         56.498 MB/s           27.880 GB

           igb0  in     55.994 MB/s         56.525 MB/s           27.880 GB
                 out     1.183 MB/s          1.194 MB/s          603.794 MB



--------------------------------------------

vmstat -i 5

irq4: uart0                           60         12
irq18: ehci0                           4          1
irq19: ahci0                           0          0
cpu0:timer                          4949        989
cpu1:timer                          5623       1124
cpu3:timer                          5623       1124
cpu2:timer                          3845        769
irq256: igb0:que 0                     5          1
irq257: igb0:que 1                 70255      14045
irq258: igb0:que 2                     8          2
irq259: igb0:que 3                    19          4
irq260: igb0:link                      0          0
irq261: igb1:que 0                    10          2
irq262: igb1:que 1                 68832      13761
irq263: igb1:que 2                     5          1
irq264: igb1:que 3                 25967       5191
irq265: igb1:link                      0          0
Total                             185205      37026
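As an aside, the queue imbalance visible in the `vmstat -i` output above can be quantified with a small parser sketch (`queue_rates` is a hypothetical helper, not an OPNsense tool; the sample uses the igb0 lines from this very run):

```python
import re

def queue_rates(vmstat_text):
    """Parse `vmstat -i` output; return {(nic, queue): rate} for igb queues."""
    rates = {}
    for line in vmstat_text.splitlines():
        # e.g. "irq257: igb0:que 1                 70255      14045"
        m = re.match(r"\s*irq\d+:\s+(igb\d+):que (\d+)\s+\d+\s+(\d+)", line)
        if m:
            rates[(m.group(1), int(m.group(2)))] = int(m.group(3))
    return rates

sample = """\
irq256: igb0:que 0                     5          1
irq257: igb0:que 1                 70255      14045
irq258: igb0:que 2                     8          2
irq259: igb0:que 3                    19          4
"""

r = queue_rates(sample)
share = r[("igb0", 1)] / sum(r.values())
# queue 1 carries >99% of igb0 interrupts: the single flow is pinned
# to one RX queue, and therefore to one CPU core
```

This makes the problem concrete: with a single flow, one queue (and one core) takes essentially all the interrupt load.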

---------------------------------------------------------------------------------------

Thanks for your help in advance

Regards,
Richard
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 01, 2018, 01:55:15 pm
Just to further add to the topic:

It seems that when 2 parallel iperf streams are running, sometimes I get great results (approx. 800-850 Mbit/s) and sometimes less-than-ok results (various values, like 300 or 500 or 600 Mbit/s).

From "top -CHIPS" it is clearly visible that in the bad scenarios only 1 core gets 100% utilized (100% interrupt) while the other 3 cores are 99% idle. In the middle-performance case, 2 cores are at 100% interrupt and 2 cores are 100% idle. When the best-case scenario happens (800-850 Mbit), 3 cores are nearly 100% busy while 1 core is 100% idle. So something must be going wrong in the load balancing of NIC queues --> CPU cores.

Simply by re-running the same iperf command line I get all these different results. The values are quite stable within a session, but after a session completes, re-running the exact same command between the exact same 2 endpoint PCs produces such large variations. The interrupt load in top clearly confirms this.

Can anyone reproduce the same test cases, or confirm similar results?
Of course that still does not help the weak single-core performance.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 02, 2018, 12:06:23 am
I used to have an APU2C4, and realised from looking around the web that others had the same problem. For example, see this article (https://teklager.se/en/knowledge-base/apu2c0-ipfire-throughput-test-much-faster-pfsense/). They too blame single-core routing, but you have found that at times the cores are used more evenly. I have read that later versions of FreeBSD have become better at SMP/multi-core routing, but apparently they are not all the way there yet. Perhaps with several iperf3 sessions you are tying each session to a core, and thus getting better (parallel) throughput that way?

Edit: You may also wish to try these settings/tweaks (https://forum.opnsense.org/index.php?topic=6590.0). I didn't see them before I sold my APU2 and got a G4560 based box instead, but they could help. Report back your findings please.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 02, 2018, 05:32:05 am
I'm thinking of switching from ipfire to OPNsense because I think it has a better overall feature set, but this is my major hangup. If people are able to get similar performance out of OPNsense, I'd love to hear about it.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 02, 2018, 08:49:27 am
I'm thinking of switching from ipfire to OPNsense because I think it has a better overall feature set, but this is my major hangup. If people are able to get similar performance out of OPNsense, I'd love to hear about it.

What's your hardware? The APU2 is a particular case, as it has a low single core speed (1GHz) and is an embedded low power SoC. For normal x86 hardware you'll be fine - I run 380Mbps down on a small form factor Pentium G4560 and it doesn't break a sweat. Gigabit is fine too.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: franco on August 02, 2018, 02:02:54 pm
I don't think it's practical to compare Linux and FreeBSD throughput and expect them to match. The latter will be lower.


Cheers,
Franco
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 02, 2018, 04:11:00 pm
My hardware is an APU2C4 :).

As to Linux vs. FreeBSD performance, obviously they are different kernels and aren't going to do everything the same way. But in this particular case the benchmarks show FreeBSD getting roughly half the throughput of Linux.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 02, 2018, 04:16:22 pm
My hardware is an APU2C4 :).

As to Linux vs. FreeBSD performance, obviously they are different kernels and aren't going to do everything the same way. But in this particular case the benchmarks show FreeBSD getting roughly half the throughput of Linux.

Yes of course, but think of it another way. The APU2 is 'only' 1GHz per core. If OPNsense only uses a single core for routing, you've got 1GHz of processing power to try to max out your connection. Linux, on the other hand, is multi-core aware, so now you're using 4x 1GHz for routing your connection. No wonder the throughput is higher. Actually, as I said earlier, FreeBSD is now getting much better at spreading load across cores, though that doesn't apply to every part of the 'networking' process. FreeBSD has probably the best networking stack in the world, or certainly one of the best. It can route 10Gbps, 40Gbps, even 100Gbps on suitable hardware. Unfortunately, the APU2 isn't the most suitable hardware (for high throughput on *BSD).
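The single-core limit for one flow can be illustrated with a toy sketch: the NIC hashes each packet's flow 4-tuple to pick an RX queue (real igb hardware uses a Toeplitz hash over the header fields; Python's built-in `hash()` below is only a stand-in), so every packet of one TCP flow lands on the same queue, and therefore on the same core:

```python
def rx_queue(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             num_queues: int = 4) -> int:
    """Toy RSS: map a flow's 4-tuple to one of num_queues RX queues."""
    return hash((src_ip, src_port, dst_ip, dst_port)) % num_queues

# Every packet of a single iperf3 stream carries the same 4-tuple,
# so it always hashes to the same queue -> one core does all the work.
q1 = rx_queue("192.168.1.2", 5201, "10.0.0.100", 50000)
q2 = rx_queue("192.168.1.2", 5201, "10.0.0.100", 50000)
assert q1 == q2  # one flow, one queue, one core
```

Multiple parallel flows have different source ports, so they *can* spread across queues; whether they actually do depends on how the tuples happen to hash, which matches the run-to-run variation Ricardo sees.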

If you need >500Mbps stick to Linux and you won't have an issue. If you want <500Mbps then *sense will be fine on your APU.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 02, 2018, 04:27:59 pm
Rainmaker,

I think I'm not communicating well. I'm not saying that FreeBSD has poor network performance. I'm saying that on the particular piece of hardware I happen to own, FreeBSD has roughly half the throughput of Linux and struggles to use the hardware efficiently. The FreeBSD development thread listed earlier suggests that it's not the SMP performance of pf that's the issue, but some oddness in the embedded Intel NIC.

But, most of these benchmarks are almost two years old. I'm wondering if at this point the problem with this particular hardware might be fixed and if people might be able to get similar performance under FreeBSD with tweaks.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 02, 2018, 04:39:44 pm
Rainmaker,

I think I'm not communicating well. I'm not saying that FreeBSD has poor network performance. I'm saying that on the particular piece of hardware I happen to own, FreeBSD has roughly half the throughput of Linux and struggles to use the hardware efficiently. The FreeBSD development thread listed earlier suggests that it's not the SMP performance of pf that's the issue, but some oddness in the embedded Intel NIC.

But, most of these benchmarks are almost two years old. I'm wondering if at this point the problem with this particular hardware might be fixed and if people might be able to get similar performance under FreeBSD with tweaks.

Ah, you (respectfully) are a lot more knowledgeable than I catered for in my response. Apologies; it's difficult to pitch your responses on the Internet, especially when you and the other person don't know each other yet (as I'm sure you know).

Yes, FreeBSD's pf is indeed much more SMP-capable. Last week I took both OpenBSD and FreeBSD installs and 'made' routers out of them, then compared them side by side. Even on an 8700K at 5GHz per core, OpenBSD was less performant than FreeBSD. However, there are many other factors, as we both touched upon in previous posts.

NIC queues are one factor, as you state. I'm not sure if OPNsense utilises multiple queues (i.e. per core) or whether it just uses one. For your APU specifically, did you accept the Intel proprietary licence? I can't quite recall whether the APU2 uses igb drivers? I don't even know if that applies to OPNsense, but I know on pfSense I was advised to create the following for an APU2:

Code: [Select]
/boot/loader.conf.local
 
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1

Then reboot. This apparently 'unlocks' some extra functionality in the NIC, which may improve your throughput. If you're running off an SSD don't forget to enable TRIM.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pylox on August 03, 2018, 03:07:06 pm

NIC queues are one factor, as you state. I'm not sure if OPNsense utilises multiple queues (i.e. per core) or whether it just uses one. For your APU specifically, did you accept the Intel proprietary licence? I can't quite recall whether the APU2 uses igb drivers? I don't even know if that applies to OPNsense, but I know on pfSense I was advised to create the following for an APU2:

Code: [Select]
/boot/loader.conf.local
 
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1

Then reboot. This apparently 'unlocks' some extra functionality in the NIC, which may improve your throughput. If you're running off an SSD don't forget to enable TRIM.

Hi rainmaker,

The license ack has nothing to do with the igb driver (imho). It is related to Intel PRO/Wireless adapters.
(https://www.freebsd.org/cgi/man.cgi?iwi)

regards pylox
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 03, 2018, 03:11:59 pm
Hi pylox,

Having read the relevant man pages it seems I was indeed grossly misinformed. I was told to add those lines when I first started using pfSense (and *BSD), as I was using various Intel Pro and I-series ethernet NICs. I don't run wifi on my gateway (I use Unifi APs) so I was obviously given duff information.

My apologies for repeating it here, and thanks for the lesson.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pylox on August 03, 2018, 03:25:22 pm
Hi pylox,

Having read the relevant man pages it seems I was indeed grossly misinformed. I was told to add those lines when I first started using pfSense (and *BSD), as I was using various Intel Pro and I-series ethernet NICs. I don't run wifi on my gateway (I use Unifi APs) so I was obviously given duff information.

My apologies for repeating it here, and thanks for the lesson.

Hi rainmaker,

no problem... Some time ago I made the same entry in loader.conf.local... ;D After some research I realized it's bullshit...
I think the OP's problem is something specifically PPPoE-related. Normally there should be no performance problems on an APU2.

regards pylox
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 06, 2018, 02:18:30 pm
Hello pylox, all

just to be clear: I am testing through a plain IP+NAT connection (PPPoE was mentioned as a possible bottleneck, but is not tested YET), and even that simple test setup reaches only approx. 40-50% of the maximum possible throughput. If I add PPPoE, it will be even slower. That's the point of this thread: trying to find at least 1 credible person who is currently using an APU2 with OPNsense and can confirm their speed reaches at least 85-90% of gigabit. Even over PPPoE!
Then the next round will be to see what needs to be fine-tuned to get the same performance at my ISP.

All I can see is that single-flow iperf performance constantly maxes out at around 450 Mbit/sec (direction: FROM WAN TO LAN). FROM LAN TO WAN seems slightly higher, at approx. 600-650 Mbit/sec.

Multi-flow iperf: now here is where it gets interesting. The result varies from run to run: e.g. I run a 2-flow iperf session for 60 seconds, it finishes, I immediately restart it with the same command, and I get a totally different result. After another 60 seconds, repeat, and I get yet another completely different throughput.

With 2-flow iperf I can sometimes reach 850-900 Mbit, other times as low as 250 Mbit. Yep, a gigantic difference, even though all relevant test parameters are unchanged.

When I get 850-900 Mbit throughput, the 2 flows are evenly distributed (450 Mbit + 450 Mbit = 900 Mbit total) and CPU interrupt usage is around 270-280% (for reference: total CPU capacity is 400% = 100% per core times 4 cores).

When I get 600 Mbit, I usually see 1 flow at 580 Mbit and another at 1-2 or at most 10 Mbit, with interrupt load at approx. 170-180%. When I get 200-300 Mbit, it is sometimes 2x 150 Mbit, other times 1x 190 + 1x 2-3 Mbit flows, with only 100% interrupt usage (a single core). And these vary from run to run.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pylox on August 07, 2018, 07:55:27 pm
Hello pylox, all

just to be clear: I am testing through a plain IP+NAT connection (PPPoE was mentioned as a possible bottleneck, but is not tested YET), and even that simple test setup reaches only approx. 40-50% of the maximum possible throughput. If I add PPPoE, it will be even slower. That's the point of this thread: trying to find at least 1 credible person who is currently using an APU2 with OPNsense and can confirm their speed reaches at least 85-90% of gigabit. Even over PPPoE!
Then the next round will be to see what needs to be fine-tuned to get the same performance at my ISP.
......

Hi ricsip,

this is very hard to find. Unfortunately I do not have a test setup with an APU2 (and not much time).
But you can try a few different things:

1. Change these tunables and measure...
vm.pmap.pti="0"  #(disable meltdown patch - this is an AMD processor)
hw.ibrs_disable="1" #(disable spectre patch temporarily)

2. Try to disable igb flow control for each interface and measure
hw.igb.<x>.fc=0  #(x = number of interface)

3. Change the network interface interrupt rate and measure
hw.igb.max_interrupt_rate="16000" #(start with 16000, can increased up to 64000)

4. Disable Energy Efficient Ethernet for each interface and measure
dev.igb.<x>.eee_disabled="1" #(x = number of interface)
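Taken together, the four suggestions above might look like this as a /boot/loader.conf.local fragment plus a couple of runtime sysctls (a sketch only; the tunable names are as given in this thread, so verify them against igb(4) on your FreeBSD 11.1 base before relying on them, and substitute the interface numbers for your box):

```shell
# /boot/loader.conf.local -- sketch combining pylox's suggestions
vm.pmap.pti="0"                     # disable Meltdown mitigation (AMD CPU anyway)
hw.ibrs_disable="1"                 # disable Spectre v2 mitigation for the test
hw.igb.max_interrupt_rate="16000"   # per-queue interrupt rate; try up to 64000

# runtime sysctls, one per igb interface (igb0..igb2 on an APU2):
# sysctl dev.igb.0.fc=0             # disable flow control
# sysctl dev.igb.0.eee_disabled=1   # disable Energy Efficient Ethernet
```

Change one setting at a time and re-run the same iperf3 test after each, otherwise it is impossible to tell which knob made the difference.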

Should be enough for the first time...;-)

regards pylox
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 09, 2018, 02:32:27 pm
I will try to see if any of these makes a difference. But in general I am very skeptical that it will, and nobody from the forum owners has replied anything meaningful since this thread started :(
(apart from basically saying it's not practical to compare BSD and Linux)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 09, 2018, 04:08:24 pm
Maybe the forum owners don't use an APU?

Have you followed the interrupt stuff from:
https://wiki.freebsd.org/NetworkPerformanceTuning


How many queues does your NIC have? Perhaps you can lower the number of queues on the NIC if single-stream performance is so important for you, but then I'd guess all other traffic will starve...
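On the legacy igb(4) driver in FreeBSD 11, the queue count is a boot-time loader tunable; a sketch of what lowering it could look like (the value is an example for experimentation, not a recommendation):

```shell
# /boot/loader.conf.local -- example only; requires a reboot to take effect
hw.igb.num_queues="2"   # 0 = one queue per CPU core (driver default);
                        # fewer queues concentrates interrupt work, which can
                        # help or hurt depending on the traffic mix
```

Check the resulting layout after reboot with `vmstat -i | grep igb` to see how many `igbX:que N` interrupt lines remain.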
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pylox on August 09, 2018, 04:12:45 pm
I will try to see if any of these makes a difference. But in general I am very skeptical that it will, and nobody from the forum owners has replied anything meaningful since this thread started :(
(apart from basically saying it's not practical to compare BSD and Linux)

Hi ricsip,

be aware that there are a lot of circumstances (especially with hardware, or your test setup) where things will not work in an optimal way... There is no "silver bullet", so complaining will not help. It is also possible that other users of OPNsense & APU2 simply do not need one near-full 1Gbit flow. From my perspective you have three choices: try stronger hardware, use other software, or do some more testing and let the community participate...

regards pylox

 
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 09, 2018, 04:16:00 pm
https://calomel.org/freebsd_network_tuning.html

# Disable Hyper Threading (HT), also known as Intel's proprietary simultaneous
# multithreading (SMT) because implementations typically share TLBs and L1
# caches between threads which is a security concern. SMT is likely to slow
# down workloads not specifically optimized for SMT if you have a CPU with more
# than two(2) real CPU cores. Secondly, multi-queue network cards are as much
# as 20% slower when network queues are bound to real CPU cores as well as SMT
# virtual cores due to interrupt processing inefficiencies.
machdep.hyperthreading_allowed="0"  # (default 1, allow Hyper Threading (HT))

# Intel igb(4): The Intel i350-T2 dual port NIC supports up to eight(8)
# input/output queues per network port, the card has two(2) network ports.
#
# Multiple transmit and receive queues in network hardware allow network
# traffic streams to be distributed into queues. Queues can be mapped by the
# FreeBSD network card driver to specific processor cores leading to reduced
# CPU cache misses. Queues also distribute the workload over multiple CPU
# cores, process network traffic in parallel and prevent network traffic or
# interrupt processing from overwhelming a single CPU core.
#
# http://www.intel.com/content/dam/doc/white-paper/improving-network-performance-in-multi-core-systems-paper.pdf
#
# For a firewall under heavy CPU load we recommend setting the number of
# network queues equal to the total number of real CPU cores in the machine
# divided by the number of active network ports. For example, a firewall with
# four(4) real CPU cores and an i350-T2 dual port NIC should use two(2) queues
# per network port (hw.igb.num_queues=2). This equals a total of four(4)
# network queues over two(2) network ports which map to four(4) real CPU
# cores. A FreeBSD server with four(4) real CPU cores and a single network port
# should use four(4) network queues (hw.igb.num_queues=4). Or, set
# hw.igb.num_queues to zero(0) to allow the FreeBSD driver to automatically set
# the number of network queues to the number of CPU cores. It is not recommended
# to allow more network queues than real CPU cores per network port.
#
# Query total interrupts per queue with "vmstat -i" and use "top -CHIPS" to
# watch CPU usage per igb0:que. Multiple network queues will trigger more total
# interrupts compared to a single network queue, but the processing of each of
# those queues will be spread over multiple CPU cores allowing the system to
# handle increased network traffic loads.
hw.igb.num_queues="2"  # (default 0 , queues equal the number of CPU real cores)

# Intel igb(4): FreeBSD puts an upper limit on the number of received
# packets a network card can process to 100 packets per interrupt cycle. This
# limit is in place because of inefficiencies in IRQ sharing when the network
# card is using the same IRQ as another device. When the Intel network card is
# assigned a unique IRQ (dmesg) and MSI-X is enabled through the driver
# (hw.igb.enable_msix=1) then interrupt scheduling is significantly more
# efficient and the NIC can be allowed to process packets as fast as they are
# received. A value of "-1" means unlimited packet processing and sets the same
# value to dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit. A
# process limit of "-1" is around one(1%) percent faster than "100" on a
# saturated network connection.
hw.igb.rx_process_limit="-1"  # (default 100 packets to process concurrently)
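If you apply these, a quick sanity check after the reboot (a sketch using the OIDs named above) would be:

```shell
# Confirm the loader tunables actually took effect
sysctl machdep.hyperthreading_allowed hw.igb.num_queues hw.igb.rx_process_limit
# The per-port runtime limits should mirror hw.igb.rx_process_limit
sysctl dev.igb.0.rx_processing_limit dev.igb.1.rx_processing_limit
```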
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 13, 2018, 01:44:44 am
If these suggestions improve performance, I'd love to hear about it.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 13, 2018, 10:57:06 am
https://calomel.org/freebsd_network_tuning.html

(quoting the tuning settings from mimugmail's post above)

Testing is in progress, but at the moment I am overloaded with my other tasks. Just wanted to let you know I didn't abandon the thread. My goal is to get this fixed, so I will post the results here in the next couple of days anyway.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 15, 2018, 11:01:04 am
Hello pylox, all

just to be clear: I am testing through a plain IP+NAT connection (PPPoE was mentioned as a possible bottleneck, but is not tested YET), and even that simple test setup reaches only approx. 40-50% of the maximum possible throughput. If I add PPPoE, it will be even slower. That's the point of this thread: trying to find at least one credible person who is currently using an APU2 with OPNsense and who confirms their speed can reach at least 85-90% of gigabit, even over PPPoE!
Then the next round will be to see what needs to be fine-tuned to get the same performance at my ISP.
......

Hi ricsip,

this is very hard to find. Unfortunately I do not have a test setup with an APU2 (and not much time).
But you can try different things:

1. Change these tunables and measure...
vm.pmap.pti="0"  #(disable meltdown patch - this is an AMD processor)
hw.ibrs_disable="1" #(disable spectre patch temporarily)

2. Try to disable igb flow control for each interface and measure
hw.igb.<x>.fc=0  #(x = number of interface)

3. Change the network interface interrupt rate and measure
hw.igb.max_interrupt_rate="16000" #(start with 16000, can be increased up to 64000)

4. Disable Energy Efficient Ethernet for each interface and measure
dev.igb.<x>.eee_disabled="1" #(x = number of interface)

Should be enough for the first time...;-)

regards pylox

Ok, I did all the steps above. No improvement, still wildly sporadic measurements/results after each test execution.

The only difference is that the CPU load characteristics went from 99% SYS + 60-70% IRQ to 100% + 60-70% IRQ (SYS dropped to 1-2%).

Note1: I only tried raising hw.igb.max_interrupt_rate from "8000" to "16000", not any higher.
Note2: in step 2 ("Try to disable igb flow control for each interface and measure"),
hw.igb.<x>.fc=0  #(x = number of interface)  --> TYPO, it's actually dev.igb.<x>.fc=0
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 15, 2018, 11:55:19 am
https://calomel.org/freebsd_network_tuning.html

(quoting the tuning settings from mimugmail's post above)

I have also gone through this. No measurable improvement in throughput.

machdep.hyperthreading_allowed="0"  # (default 1, allow Hyper Threading (HT)) --> NOT APPLICABLE to my case. This AMD CPU has 4 physical cores, and sysctl hw.ncpu --> 4, so HT (even if supported, which I am not sure about) is not active currently.

hw.igb.num_queues="2"  # (default 0, queues equal the number of real CPU cores)
--> I have 4 cores and 2 active NICs, and each NIC supports up to 4 queues. I used the default
hw.igb.num_queues="0", but tried hw.igb.num_queues="2" as well.
No improvement in throughput (for a single flow).
But! It seems to have degraded the multi-flow performance heavily.

hw.igb.enable_msix=1 was set like that since the beginning.
hw.igb.rx_process_limit="-1" --> was set, but no real improvement in throughput.
dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit are both set to "-1", as the previous entry already described.

I am very sad that this won't be solvable under OPNsense without switching to a competitor or replacing the hardware itself.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 15, 2018, 05:16:08 pm
Sorry .. we are no magicians.  ::)

You can go for a commercial vendor like Cisco, where you are limited to 85 Mbit and have to purchase an extra license.

Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 15, 2018, 06:14:49 pm
Well, that's disappointing. OPNsense is a great piece of software. Maybe I'll check back in when FreeBSD 12 is released, as I think this is overall a better solution for my needs than IPFire.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 15, 2018, 06:21:03 pm
When you send me such a device I can do some testing. No other idea how to help.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 15, 2018, 06:46:26 pm
I'm willing to chip in to buy the OPNsense project an APU2.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 15, 2018, 08:28:34 pm
Can also be a used one .. I dont need it for long.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 21, 2018, 04:37:21 pm
Looks like I'm the only one willing to chip in?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 04, 2018, 01:18:04 pm
Looks like I'm the only one willing to chip in?

@KantFreeze:
Let's be reasonable: nobody will send equipment for free to unknown people on the internet. At least that is my view.

@mimugmail: how about a donation towards you, so you can buy a brand-new APU2 for yourself, spend some valuable time checking its maximum performance capabilities, and document your findings? No need to return the device at the end; you should keep it for future OPNsense release benchmarks / regression tests.

I bought my APU2 from a local reseller (motherboard + black case + external PSU + a 16 GB mSATA SSD); the sum was approx. 200 EUR. If there are 10 real volunteers, I am willing to spend a 20 EUR (non-refundable) "donation" on this project.

DM me for the details, if you are interested.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 04, 2018, 01:51:02 pm
local = German? I can ask my boss if the company is willing to test such a device ..
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 04, 2018, 02:33:13 pm
local = German? I can ask my boss if the company is willing to test such a device ..

I am not from Germany; I live in Eastern Europe and just converted my local currency to EUR for an approximate estimate. But your local PC shop may sell these devices even cheaper:

http://pcengines.ch/order.htm
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: marjohn56 on September 04, 2018, 03:02:04 pm
local = German? I can ask my boss if the company is willing to test such a device ..


@mimugmail: I have a spare APU2 I no longer use; if you send me your bank account details, pass-codes etc., that will do as security.


PM me, we'll work something out. :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 04, 2018, 03:09:47 pm
You want to send it to me AND want my bank details?? :P
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 04, 2018, 03:11:54 pm
I ordered this via the company, no tax, so only 160 EUR

https://www.amazon.de/PC-Engines-APU-2C4-Netzteil-schwarzes/dp/B01GEIEI7M
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: marjohn56 on September 04, 2018, 03:30:56 pm
Cool... OK.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 04, 2018, 04:02:28 pm
I ordered this via the company, no tax, so only 160 EUR

https://www.amazon.de/PC-Engines-APU-2C4-Netzteil-schwarzes/dp/B01GEIEI7M

I really meant it about supporting this evaluation effort, so if there is still something needed, let us know!
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 12:44:24 pm
(quoting my earlier post with the calomel.org tuning settings and my results, see above)

Some small addendum:
I recently noticed (maybe when I upgraded to 18.7.1_3, but TBH I am not sure) that sometimes (depending on the actual throughput / interrupt load shared among the cores) the serial console hangs during iperf. As soon as the iperf session finishes, or I interrupt it manually, the serial console becomes live again. I noticed it while running "top" on the console: the refresh stopped / froze during the iperf session, and the keyboard wasn't working either while the iperf traffic was flowing. As soon as the iperf session finished, "top" continued to produce output and the console responded to keystrokes again.

It seems to have something to do with the throughput alternating randomly between those 2-3 discrete levels across iperf sessions.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 07, 2018, 12:49:14 pm
Do you run iperf on the Firewall itself?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 12:59:54 pm
No, never!

The 2 iperf endpoints run on a PC connected to the LAN (igb1) and another PC connected to the WAN (igb0); the APU is always just a transit device (packet forwarding / packet filtering / NAT translation between igb1 and igb0 and vice versa), never terminating any iperf traffic directly.
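For completeness, the test boils down to something like this (the addresses are made up for illustration):

```shell
# WAN-side PC (e.g. 203.0.113.10), acting as server:
iperf3 -s

# LAN-side PC: one TCP flow (-P 1) for 30 s; -R reverses the direction so
# the data flows WAN-->LAN through the APU2 (routing/filtering/NAT in between)
iperf3 -c 203.0.113.10 -P 1 -t 30 -R
```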
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 07, 2018, 02:40:33 pm
Next week I should get my device and will put it in my lab. Let's see ..
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 05:56:51 pm
Yet another small addendum:

I finally managed to test throughput over PPPoE, under real-life conditions.

The results are quite weak:
approx. 250-270 Mbit/sec (WAN-->LAN direction) was achieved with the APU2. Not iperf this time, but a torrent download (so nobody can say I was pushing unrealistic expectations over 1 single flow).
Again, the router was only a transit device; the torrent client ran on a PC behind the APU. The SSD wasn't the bottleneck during the download.

As a comparison, using a different vendor's router I was able to reach 580-600 Mbit/sec easily downloading the same test torrent. I didn't investigate whether it could go higher with that router, but that is still more than double the performance.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 07, 2018, 05:59:15 pm
You mean IPFire on the same hardware?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 06:11:21 pm
You mean IPFire on the same hardware?

No, not IPFire. Sorry if I was unclear :)

I installed a completely different device (an Asus AC66U B1 router), just for comparison, to see if that router can reach wire-speed gigabit.

On the APU I could not test IPFire today due to lack of time, but maybe in the coming days I will do another round of tests with IPFire.

I need to find a timeslot when no users are on the internet :(
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 07, 2018, 06:20:39 pm
If I remember correctly you said this on the FreeBSD Net List regarding OPN and IPFire. I'll check next week.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 06:53:12 pm
If I remember correctly you said this on the FreeBSD Net List regarding OPN and IPFire. I'll check next week.

Yes, you are right! Some weeks ago I ran the IPFire distribution on the APU, but only in an isolated LAN, without access to PPPoE or the internet, so I could run my iperf benchmarks without breaking the production internet.

Today, unfortunately, I wasted a lot of time getting OPNsense to work on my production PPPoE internet connection. Basically, the default gateway was not activated properly after the PPPoE session came up, so all internet traffic failed with a TTL-expired error.

My existing OPNsense config used a static IP for the WAN (remember, I used an isolated LAN earlier for iperf testing). Today I changed the WAN config from static IP to PPPoE, but some previous static LAN default-gateway config was stuck and wasn't deleted properly (the dmesg log actually complained about 2 gateways failing to be removed). I logged into the console and tried a couple of times to reset the interface assignment and redo the IP addressing, then logged into the GUI and switched the WAN from static IP to PPPoE (the CLI console does not allow advanced config like PPPoE setup, so I had to do that from the GUI).
But it was still broken: the PPPoE session came up and I received a public IP from my ISP, but the default gateway was still the LAN IP from my old config.

That is when I decided to log in to the console again, select option 4) factory reset, and redo the initial setup wizard from scratch in the GUI. This time I selected WAN type PPPoE, and that way I succeeded. But it wasted half of my day.

https://github.com/opnsense/core/issues/2186
I found this bug report about the PPPoE default gateway not updating after the PPPoE session activates, but it looks like that bug was fixed in 18.1.9 or so. It seems I was hitting something similar; I don't really know.

So basically I did not have time to switch the operating system, boot IPFire, and repeat the same tests under a Linux OS. I plan to do that in the coming days.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 08, 2018, 04:56:14 pm
Well, I tested IPFire as well on the APU2 (the latest ipfire-2.21-core123).

I could only achieve the same 250-290 Mbit/sec for the same torrent as yesterday with OPNsense. Because I was suspicious, I also connected my laptop directly to my ISP (I set up the PPPoE profile directly on the PC) without any router in the middle: the speed was the same 250-280 Mbit/sec this time. So I think there is a problem with my bloody ISP today (yesterday I managed to get 600 Mbit, so something must be going on). There is no point continuing this testing until I can figure out what the hell is happening.

If anyone can share the simplest PPPoE simulator config, based on FreeBSD or Linux, I will try it on a powerful PC connected to my APU and completely rule out the unreliable ISP from the equation (I would run iperf on the PPPoE-simulator PC itself, it being the WAN endpoint for iperf). A torrent would be difficult to simulate in such a topology, so I have to revert to iperf first.
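For reference, the sort of thing I have in mind is rp-pppoe's pppoe-server on Linux; a minimal, unauthenticated sketch (the interface name and addresses are just placeholders, and /etc/ppp/pppoe-server-options would need at least "noauth" for a lab like this):

```shell
# Lab-only PPPoE access concentrator on a PC cabled to the APU2 WAN port.
# -I listening interface, -L server-side PPP address,
# -R first address handed to clients, -N max concurrent sessions
pppoe-server -I eth0 -L 192.168.50.1 -R 192.168.50.10 -N 4

# The same PC then serves as the WAN-side iperf endpoint:
iperf3 -s
```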
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 13, 2018, 01:15:38 pm
Me again.

I did some further testing. No PPPoE involved (I don't have access to the internet line at the moment; I only performed pure IP<-->IP tests in my lab, with OPNsense only in transit, not running iperf itself).

I found the option in the menu where I can literally turn off the firewall (disable all packet filtering), which also disables NAT and turns OPNsense into a plain routing box.

Results (iperf -P 1 == single flow):
1)->firewall disabled, NAT disabled: can easily transmit 890-930 Mbit from WAN-->LAN, and vice versa, CPU load is approx 1x core 65% INT , another 1x core 10-30% in INT, the rest is idle. Throughput is stable, very minimal variation.
2)->firewall enabled, NAT disabled: this time its peak at 740-760 Mbit from WAN-->LAN, and vice versa, CPU load 1x 100% INT + 1x 20% INT, rest is idle. Occasionally, I get these strange drops to around 560 Mbit or to around 630 Mbit.
3)->firewall enabled, NAT enabled: LAN -->WAN: approx 650-720 Mbit, WAN-->LAN: around 460 Mbit constantly (100%+20% INT)

Results for 2) and 3) are not really consistent and vary greatly between iperf sessions. So do the CPU load characteristics (sometimes less INT load results in higher throughput; other times double the INT load results in much lower throughput).

Running iperf -P 4 also gives very variable results:
- sometimes 1, 2 or even 3 sessions sit at 0 Kbit/sec, while the 4th session achieves the maximum throughput that was measured with a single flow (-P 1)
- other times 1 flow has double the throughput of the other 3 (unbalanced)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 27, 2018, 12:38:16 pm
Next week I should get my device and will put it in my lab. Let's see ..

Hello mimugmail,
did you have a chance to look at the performance of the box?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 27, 2018, 12:52:23 pm
It's here on my table and installed, but I didn't find the time yet, sorry.
Hopefully next week :/
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 27, 2018, 01:00:07 pm
It's here on my table and installed, but I didn't find the time yet, sorry.
Hopefully next week :/

No problem, take your time and have fun! I hope you can find some clever solution; I have been mostly stuck for some time now.

Note: be careful which BIOS version you flash! Check these links to get the full picture:
https://pcengines.github.io
https://github.com/pcengines/coreboot/issues/196
http://www.pcengines.info/forums/?page=post&id=4C472C95-E846-42BF-BC41-43D1C54DFBEA&fid=6D8DBBA4-9D40-4C87-B471-80CB5D9BD945
http://pcengines.ch/howto.htm#bios

Yes, it's kind of a mess how disorganized this company's docs are.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 07, 2018, 01:50:01 pm
It's here on my table and installed, but I didn't find the time yet, sorry.
Hopefully next week :/

Hello, did you manage to check it?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 07, 2018, 02:58:53 pm
My apprentice set it up last week, did some BIOS Updates, will start tomorrow :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 07, 2018, 03:58:25 pm
My apprentice set it up last week, did some BIOS Updates, will start tomorrow :)

Thanks, I'm really curious to see your results!
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 09, 2018, 01:35:15 pm

Results (iperf -P 1 == single flow):
1)->firewall disabled, NAT disabled: can easily transmit 890-930 Mbit from WAN-->LAN, and vice versa, CPU load is approx 1x core 65% INT , another 1x core 10-30% in INT, the rest is idle. Throughput is stable, very minimal variation.
2)->firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN, and vice versa; CPU load is 1x 100% INT + 1x 20% INT, the rest is idle. Occasionally I get strange drops to around 560 Mbit or 630 Mbit.
3)->firewall enabled, NAT enabled: LAN -->WAN: approx 650-720 Mbit, WAN-->LAN: around 460 Mbit constantly (100%+20% INT)

Results for 2) and 3) are not really consistent and vary greatly between iperf sessions. So do the CPU load characteristics (sometimes less INT load results in higher throughput; other times double the INT load results in much lower throughput).



I got exactly the same results. After this I tried enabling hw offloading on the NIC, but the system doesn't boot anymore, even after a reinstall. Have to dig through it later this week.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 09, 2018, 08:04:36 pm
Ok, tried all available tuning stuff, single-stream download in a NAT environment is only 440 Mbit. I'll try a vanilla FreeBSD on Thursday ...
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 11, 2018, 12:49:00 pm
I'm not able to install FreeBSD 11.1 since it always hangs on boot at some ACPI stuff. This also happened on OPNsense 18.7, and after around 20 restarts, a new install and reverting the config, it worked again.
11.2 is also not possible to install .. don't have the time now.

I have no idea if my device is bricked or something, but it's far from stable .. and serial-only access is a mess  ::)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 11, 2018, 01:50:37 pm
Quick question: can you tell me
1) what BIOS is running on the board (should be the first thing visible on the serial output if powered on)
2) What storage have you added to the board? Are you trying to boot from SD card or from internal mSATA or something else?

Ps. I managed to run Freebsd 11.2 from a USB drive in Live mode, did not install it to the internal mSATA hard drive.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 12, 2018, 10:43:22 am
I'm on 4.0.19. Live CD is a good idea .. I can try this next week.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 12, 2018, 10:47:03 am
I'm on 4.0.19. Live CD is a good idea .. I can try this next week.

Ok.
By the way, better to use firmware 4.0.18, because 4.0.19 has a new boot issue that was found recently, and it's a big mystery when PC Engines will fix it in 4.0.20.

Update: actually they have already released it:
https://pcengines.github.io/#lr-12
There seems to be a related fix: "pfSense 2.4.x fails to boot when no USB stick is plugged"
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 17, 2018, 10:47:39 am

Results (iperf -P 1 == single flow):
1)->firewall disabled, NAT disabled: can easily transmit 890-930 Mbit from WAN-->LAN, and vice versa, CPU load is approx 1x core 65% INT , another 1x core 10-30% in INT, the rest is idle. Throughput is stable, very minimal variation.
2)->firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN, and vice versa; CPU load is 1x 100% INT + 1x 20% INT, the rest is idle. Occasionally I get strange drops to around 560 Mbit or 630 Mbit.
3)->firewall enabled, NAT enabled: LAN -->WAN: approx 650-720 Mbit, WAN-->LAN: around 460 Mbit constantly (100%+20% INT)

Results for 2) and 3) are not really consistent and vary greatly between iperf sessions. So do the CPU load characteristics (sometimes less INT load results in higher throughput; other times double the INT load results in much lower throughput).



I got exactly the same results. After this I tried enabling hw offloading on the NIC, but the system doesn't boot anymore, even after a reinstall. Have to dig through it later this week.

Similar results with vanilla 11.1, now upgrading to 11.2
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: miroco on October 17, 2018, 10:08:31 pm
ECC is fixed on the APU platform effective 2018-10-04 with the BIOS v4.8.0.5 mainline release.

https://pcengines.github.io

https://3mdeb.com/firmware/enabling-ecc-on-pc-engines-platforms/#.W8eUoKeHKuM
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: miroco on October 18, 2018, 10:40:21 am
An unintentional double post.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 18, 2018, 11:21:37 am

Results (iperf -P 1 == single flow):
1)->firewall disabled, NAT disabled: can easily transmit 890-930 Mbit from WAN-->LAN, and vice versa, CPU load is approx 1x core 65% INT , another 1x core 10-30% in INT, the rest is idle. Throughput is stable, very minimal variation.
2)->firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN, and vice versa; CPU load is 1x 100% INT + 1x 20% INT, the rest is idle. Occasionally I get strange drops to around 560 Mbit or 630 Mbit.
3)->firewall enabled, NAT enabled: LAN -->WAN: approx 650-720 Mbit, WAN-->LAN: around 460 Mbit constantly (100%+20% INT)

Results for 2) and 3) are not really consistent and vary greatly between iperf sessions. So do the CPU load characteristics (sometimes less INT load results in higher throughput; other times double the INT load results in much lower throughput).



I got exactly the same results. After this I tried enabling hw offloading on the NIC, but the system doesn't boot anymore, even after a reinstall. Have to dig through it later this week.

Similar results with vanilla 11.1, now upgrading to 11.2

Same with 11.2. I'll now install OPNsense on a similar hardware to see if it's related to the hardware ..
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 18, 2018, 12:44:11 pm
Thanks for the constant status updates :) Eagerly waiting for your results.

By the way: please don't forget that there is a currently known issue in coreboot 4.8.x regarding CPU downclocking:
https://github.com/pcengines/coreboot/issues/196

so make sure the poor performance is not because the APU lowers its clock rate to 600 MHz instead of 1 GHz after a couple of minutes of uptime :)

Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 18, 2018, 01:46:15 pm
But I'm running 4.0.18?


I tested an old Sophos UTM with an Atom N540 processor and got only 500-600 Mbit in all directions with 1 or 10 streams. I'm searching for a device quite comparable to the APU :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 19, 2018, 07:37:07 pm
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232451
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: ruffy91 on October 20, 2018, 07:19:18 am
The i210 has software-configurable flow control. Maybe the configuration is not that good?
The registers are:
The following registers are defined for the implementation of flow control:
• CTRL.RFCE field is used to enable reception of legacy flow control packets and reaction to them
• CTRL.TFCE field is used to enable transmission of legacy flow control packets
• Flow Control Address Low, High (FCAL/H) - 6-byte flow control multicast address
• Flow Control Type (FCT) - 16-bit field to indicate flow control type
• Flow Control bits in Device Control (CTRL) register - enables flow control modes
• Discard PAUSE Frames (DPF) and Pass MAC Control Frames (PMCF) in RCTL - control the forwarding of control packets to the host
• Flow Control Receive Threshold High (FCRTH0) - a 13-bit high watermark indicating receive buffer fullness. A single watermark is used in link FC mode.
• DMA Coalescing Receive Threshold High (FCRTC) - a 13-bit high watermark indicating receive buffer fullness when in DMA coalescing and the Tx buffer is empty. The value in this register can be higher than the value placed in the FCRTH0 register, since the watermark needs to be set to allow for receiving only a maximum-sized Rx packet before XOFF flow control takes effect and reception is stopped (refer to Table 3-28 for information on flow control threshold calculation).
• Flow Control Receive Threshold Low (FCRTL0) - a 13-bit low watermark indicating receive buffer emptiness. A single watermark is used in link FC mode.
• Flow Control Transmit Timer Value (FCTTV) - a set of 16-bit timer values to include in the transmitted PAUSE frame. A single timer is used in link FC mode.
• Flow Control Refresh Threshold Value (FCRTV) - 16-bit PAUSE refresh threshold value
• RXPBSIZE.Rxpbsize field is used to control the size of the receive packet buffer

The datasheet has very detailed descriptions on how flow control works: https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/i210-ethernet-controller-datasheet.pdf
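On FreeBSD the igb(4) driver exposes the flow-control mode as a per-device sysctl, so it can be checked or switched off for a test without touching those registers directly (a sketch; the interface number 0 is an assumption):

```shell
# Current flow-control mode of the first i210 port
# (0 = none, 1 = rx pause, 2 = tx pause, 3 = full, per igb(4))
sysctl dev.igb.0.fc

# Disable flow control entirely for a test run
sysctl dev.igb.0.fc=0

# Make the setting persistent across reboots
echo 'dev.igb.0.fc=0' >> /etc/sysctl.conf
```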
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 20, 2018, 11:35:01 am
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232451

Do you think it's a flow-control related bug?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 20, 2018, 12:03:34 pm
No idea, it sounds familiar ...
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 22, 2018, 08:41:40 am
Played with FC, tried again mixing setups with TSO, LRO, XCSUM .. always the same result.
Found this one:

https://elatov.github.io/2017/04/pfsense-on-netgate-apu4-1gb-testing/

Don't have any other ideas now ..
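For completeness, the offload mixing boils down to ifconfig flags; a sketch, assuming the WAN port is igb0:

```shell
# Disable TSO, LRO and checksum offloads on the WAN NIC
ifconfig igb0 -tso -lro -txcsum -rxcsum

# Re-enable them all
ifconfig igb0 tso lro txcsum rxcsum

# Verify which options are currently active
ifconfig igb0 | grep options
```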
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 22, 2018, 11:07:09 am
I tried a test kernel from franco which might ship with 19.1 and gained a slightly better rate, from 480 Mbit to 510 Mbit .. ok, last test for today :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 22, 2018, 01:30:33 pm
I think such a small difference can easily be random variation between test runs. I saw similar variations myself running on the same OS.

Anyway, thanks for your support; at least I know it's not just me. Practically all PC Engines APU2 owners should consider something different for 1 Gbit WAN. If OPNsense will be installed on the board, of course. :-)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 22, 2018, 01:34:43 pm

Anyway, thanks for your support; at least I know it's not just me. Practically all PC Engines APU2 owners should consider something different for 1 Gbit WAN. If OPNsense will be installed on the board, of course. :-)

Why? It achieves 1 Gbit with multiple streams easily .. why would someone need 1 Gbit on 1 stream?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 22, 2018, 02:02:23 pm
Do you have any chance to access a PPPoE-based WAN / PPPoE-based WAN simulator? I also have issues reaching 1 Gbit even multi-stream if PPPoE is used for the WAN connection. I have already given up hope for 1 Gbit single-flow performance, but even the multi-flow performance is quite low. When connecting a PC to the same PPPoE WAN directly (no OPNsense router/firewall in front of the PC), I can achieve much higher speeds.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: patcsy88 on December 27, 2018, 04:25:21 am
Hi, I have been following this thread and other related forums re: achieving 1 Gbit via PPPoE with PC Engines' APU2. net.isr.dispatch = "deferred" yielded only a small speed improvement - from 400 Mbps to 450 Mbps. Using the ISP-provided DIR-842, I can hit up to 800+ Mbps. I am on the latest OPNsense with the stock kernel. pfSense on the same APU2 with net.isr.dispatch = "deferred" yielded 520-550 Mbps.
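For anyone who wants to reproduce this: the tunable goes into a loader config file and takes effect after a reboot (a sketch; the two extra net.isr knobs are common FreeBSD tuning-guide suggestions, not something verified here):

```shell
# /boot/loader.conf.local -- read at boot, reboot to apply
net.isr.dispatch="deferred"  # defer packet processing to netisr threads instead of the NIC interrupt
net.isr.maxthreads="4"       # allow one netisr thread per core (the GX-412TC has 4)
net.isr.bindthreads="1"      # pin each netisr thread to its own core
```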
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: andbaum on January 21, 2019, 03:04:48 pm
I have an APU2 board with OPNsense as well. My board only achieves about 120 MBit/s per NIC in iPerf  >:(
I posted the problem here: https://forum.opnsense.org/index.php?topic=11228.0
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: monstermania on February 25, 2019, 09:52:57 am
Hi,
I've just found this blog entry: https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/
So the APU2 series should be able to achieve 1 gbit with pfsense.  ::)

best regards
Dirk
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on February 25, 2019, 09:55:15 am
Hi,
I've just found this blog entry: https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/
So the APU2 series should be able to achieve 1 gbit with pfsense.  ::)

best regards
Dirk

IF(!!!) the WAN type is NOT PPPoE! That fact is not revealed in that blog. PPPoE can cause a giant speed decrease, thanks to a FreeBSD PPPoE handling defect.
I can only achieve 160-200 Mbit, and that fluctuates heavily between test runs. A cheap Asus RT-AC66U B1 can easily reach 800+ Mbit on the very same modem/subscription.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on December 18, 2019, 03:12:52 pm
This topic hasn't received much love in the last few months, but I can attest the issue is still present: OPNsense 19.7 on an APU2 cannot reach 1 Gbps from WAN to LAN with the default setup on a single traffic flow.

So I dug around, found a few threads here and there about this, and finally found this topic, to which I am replying.  I saw many did some tests, saw the proposed solution at TekLager, etc., but they don't really address the single-flow issue.

I've read about the mono-thread vs multi-thread behavior of the *BSDs vs Linux, but single-flow traffic will only use 1 thread anyway, so I had to discard that too as a probable cause.

I then decided to make my own tests and see if this was related to a single APU2 or all of them.  I tested 3 x APU2 with different firewalls, and this is the speed I get with https://speedtest.net (with NAT enabled, of course):

OPNsense   down: ~500 Mbps    up: ~500 Mbps
pfSense      down: ~700 Mbps    up: ~700 Mbps
OpenWRT   down: ~910 Mbps    up: ~910 Mbps
IPFire         down: ~910 Mbps    up: ~910 Mbps

pfSense on Netgate 3100   down: ~910 Mbps    up:~910 Mbps

My gaming PC (8700k) connected directly into the ISP's modem     down: ~915 Mbps  up:~915 Mbps

I also did some tests by virtualizing all these firewalls (except OpenWRT) on my workstation (AMD 3950X) with VirtualBox (a Type 2 hypervisor - not the best, I know; I didn't have the time to set something up on the ESXi cluster), and you can subtract ~200 Mbps from all the speeds above.  So that means that, even virtualized, IPFire is faster than both OPNsense and pfSense running on the APU2.  I also saw that all of them use only ONE thread and almost the same amount of CPU% while the transfer is going on.

My conclusions so far are these:
-The PC Engines APU2 is not the issue - probably a driver issue for OPNsense/pfSense
-Single-threaded processing of a single traffic flow is not the issue either, since some firewalls are able to max the speed on 1 thread
-pfSense is still based on FreeBSD, which has one of the best network stacks in the world, but it might not use the proper drivers for the NICs on the APU - that's my feeling, but I can't check this.
-OPNsense is now based on HardenedBSD (a fork of FreeBSD), which adds lots of exploit mitigations directly into the code. Those security enhancements might be the issue with the APU2's slow transfer speed.  OPNsense installed on premise with a ten-year-old Xeon X5650 (2.66 GHz) can run at 1 Gbps without breaking a sweat.  So maybe a few MHz more are required for OPNsense to max that 1 Gbps pipe.
-OpenWRT and IPFire are Linux-based and benefit from a much broader 'workforce' optimizing everything around them.  The NICs are probably detected properly and the proper drivers are used, plus the nature of how Linux works could help speed everything up a little more. And the Linux kernel is a dragster vs the FreeBSD kernel (sorry FreeBSD, but I still love you since I am wearing your t-shirt today!!).

My next step, if I have time, would be to do direct speed tests internally with iperf3 in order to have another speed chart I can refer to.

Edit: FreeBSD vs HardenedBSD Features Comparison https://hardenedbsd.org/content/easy-feature-comparison

Edit 2: Another thing that came to mind is the ability of the running OS (in our case OPNsense) to 'turbo' the cores up to 1.4 GHz on the AMD GX-412TC CPU that the APU2 uses.  The base frequency is 1 GHz, but with turbo it can reach 1.4 GHz.  I am running the latest 4.10 firmware, but I can't (don't know how to) validate what frequency is being used during a transfer.  That would really explain the difference in transfer speed and why OPNsense can't max a 1 Gbps link while others can.  Link on how to upgrade the BIOS on the APU2: https://teklager.se/en/knowledge-base/apu-bios-upgrade/
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on December 18, 2019, 03:20:53 pm
Greatly appreciate your effort. I gave up on this topic a long time ago, but if you have the energy to go and find the resolution, you have all my support :) !
One thing I would like to ask: could you check your results if you emulate PPPoE on the WAN interface, instead of plain IP on the WAN interface? I expect your results will be much, much worse under OPNsense than what you achieved in this test.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on December 18, 2019, 04:45:45 pm
My APU2 is connected via a CAT6a Ethernet cable to the ISP's modem, which in turn is connected via another CAT6a Ethernet cable to the fiber-optic transceiver. The connection between the ISP's modem and the ISP is then done via PPPoE (which I don't manage - it's done automatically and set up by the ISP).

So the APU2 isn't doing the PPPoE connectivity (as it would have in the typical scenario 15 years ago via DSL, for example), and that is a good thing.  Now if your setup requires the APU2 to perform the PPPoE connectivity, that doesn't really impact the transmission speed.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on December 18, 2019, 05:10:20 pm
"Now if your setup requires the APU2 to perform the PPPoE connectivity, that doesn't really impact the transmission speed."

There is a very high chance that the PPPoE session handling and the single-threaded MPD daemon are the biggest bottleneck preventing the APU2 from reaching 1 Gigabit.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on December 19, 2019, 06:58:58 am
I've set up another test lab (under VirtualBox) to test iperf3 speed between 2 Ubuntu servers, each behind an OPNsense 19.7.8 (fresh update from tonight!).  All VMs are using 4 vCPUs and 4 GB of RAM.

-First iperf3 test (60 seconds, 1 traffic flow):
The virtual switch performance between SVR1 and SVR2 connected together yields ~2.4Gbps of bandwidth

-Second iperf3 test (60 seconds, 1 traffic flow):
This time, SVR1 is behind FW1 and SVR2 is behind FW2.  Both FW1 and FW2 are connected directly on the same virtual switch. Minimum rules are set to allow connectivity between SVR1 and SVR2 for iperf3.  Both FW1 and FW2 are NATing outbound connectivity. The performance result yields ~380Mbps.

-Third iperf3 test with PPPoE (60 seconds, 1 traffic flow):
FW1 has the PPPoE Server plugin installed and configured.  FW2 is the PPPoE client that will initiate the connection. The performance result yields ~380Mbps.

-Fourth iperf3 test with PPPoE (60 seconds, 2 traffic flows): ~380Mbps

-Fifth iperf3 test with PPPoE (60 seconds, 4 traffic flows): ~390Mbps

So unless I missed something, PPPoE connectivity doesn't affect network speed, as I mentioned earlier.

I will try to replicate the same setup but with 2 x APU2 and post back the performance I get.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on December 19, 2019, 03:30:29 pm
Thanks for your effort, this one was a really interesting test series.
The reason I suspect the PPPoE encapsulation is a serious limiting factor is that the internet is full of articles that all say the same thing: PPPoE is unsuitable for receive-queue distribution. The result is that only 1 CPU core can effectively process the entire PPPoE flow, which means the other cores sit idle while 1 core runs at 100% load. Because the APU2 has very weak single-core CPU processing power, if multi-queue receive is deactivated for PPPoE, that is a big warning against using this product for 1 Gbit networks.
But anyway, I am really curious to see the next test results.

As far as I can recall, I could only do 600 Mbit/s in the LAN --> WAN direction (i.e. upload from a LAN client to an internet server); the WAN --> LAN direction (i.e. download from an internet server to a LAN client) was much slower. And all these results were using pure IP between 2 directly connected test PCs. When I installed the firewall into my production system, I reconfigured the WAN interface to PPPoE, and the real-world results were lower than the testbench results.
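One way to see whether everything really lands on a single queue is to watch the per-queue interrupt counters during a test run (a sketch; the igb queue naming is the FreeBSD default):

```shell
# Each igb port gets one interrupt line per receive queue.
# If only one queue's counter climbs during an iperf/PPPoE run,
# the traffic is not being spread across the cores.
vmstat -i | grep igb
```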
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on December 19, 2019, 04:45:47 pm
Be cautious about what you read on single-threaded processing being a limiting factor.

When a single traffic flow enters a device, an ASIC performs the heavy lifting most of the time.  The power required afterward to analyze, route, NAT, etc. that traffic is provided by a CPU core (or 1 thread) somewhere up the process stack.
But that processing cannot be well distributed (or parallelized) across many threads (cores) for a single traffic flow - it would be inefficient in the end, since the destination is the same for all the threads and they would have to 'wait' for each other, slowing other traffic flows that require processing.

When multiple traffic flows are entering the same device, of course the other cpu cores will be used to handle the load appropriately.

The only ways to optimize or accelerate a single traffic flow on a CPU core are:
-good and optimized network code
-the appropriate network drivers that 'talk' to the NIC
-a speedier CPU core (i.e. higher frequency, GHz/MHz)

This behavior is subject to the same kind of (wrong) thinking people apply to link aggregation: if we bundle 4 x 1 Gbps links together, people think their new speed for a single traffic flow is 4 Gbps, and they are surprised to see that their max speed is still only 1 Gbps, because a single link is still only 1 Gbps.  With multiple traffic flows, the compound traffic will reach the 4 Gbps speed, because now each of the 1 Gbps links is being used.

I hope that clears up some confusion.

But in the end, there is definitely something not running properly on both OPNsense and pfSense on those APU boards.
The APU's hardware is ok - many others and I have shown that.
So what remains are:
a) bad drivers for the Intel 210/211 NICs
b) bad code optimization (the code itself, or the inability to make the CPU core reach its 1.4 GHz turbo speed),
c) both a & b

The Netgate SG-3100 that I have has an ARM Cortex-A9, a dual-core CPU running at 1.6 GHz, and it's able to keep that 1 Gbps speed.  And we saw above that pfSense is somewhat faster on the APU compared to OPNsense.  IMO, I really think we are facing a FreeBSD NIC driver issue for the Intel i210/i211 chipset.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on December 20, 2019, 11:40:56 pm
Haven't had the time to setup the APU, but I re-did the same test under ESXi because I was curious of the performance I could reach.

The ESXi 6.7 host is a Ryzen 2700X processor with 32 GB and its storage hooked up to a networked FreeNAS.  All four VMs were running on it with 2 vCPUs and 4 GB RAM each.

The virtual switch bandwidth from svr1 to svr2 direct iperf3 bandwidth was ~24 Gbps.
Then the same flow but svr1 having to pass through fw1 (NAT+Rules), then fw2 (NAT+Rules) then reaching svr2 gave an iperf3 bandwidth of ~4Gbps.

That's a far cry from what I've achieved on faster hardware under VirtualBox lol.

On another subject: I had an issue with this setup under ESXi, as the automatic NAT rules weren't generated for some reason on both firewalls (they were under VirtualBox, though). I find that odd, but I recall that a few weeks ago, while I was giving a class at the college and using OPNsense to set up an OpenVPN VPN with my students, I was seeing internal network addresses reaching my firewall's WAN port.  The day before, I wasn't seeing this, and I hadn't changed the setup, so I blamed VirtualBox for the problem... but now I see the same behavior under ESXi, and I am wondering if there is an issue with the automatic outbound NAT rule generation somehow.  What is causing this behavior?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on December 21, 2019, 09:44:45 am
NAT always reduces throughput, since the traffic has to traverse the CPU. Auto NAT can cause problems when you have multiple devices in this network to reach. Then you have to remove the upstream gateway in the interface config and add manual NAT rules. :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on December 21, 2019, 05:06:25 pm
I wouldn't say that NAT always reduces throughput, as it depends on what devices are used.
APUs and lots of other cheap, low-powered devices do have issues with NAT, yes - it was the main reason I ditched many consumer-grade routers when I got 1 Gbps fiber at home 4 years ago.  Back then, only the Linksys 3200ACM was able to keep up the speed with NAT active... until mysteriously - like hundreds of other people who posted on the Linksys forums - connections started to drop randomly and Internet connectivity became a nightmare.

That's when I started looking for something better, and I ended up with pfSense on an SG-3100 two years ago.  All my problems were solved and still are to this day.

Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: franco on December 22, 2019, 09:50:16 am
Can we please quit the others-are-so-great talk now? I don't think mentioning it in every other of your posts really helps this community in any substantial way.


Cheers,
Franco
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on December 22, 2019, 11:54:31 am
You're totally right and fixed it.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: franco on December 23, 2019, 02:38:39 pm
Not what I expected when I saw your response and compared it to the edit knowing what you wrote before, but, hey, fair enough that the theme is still the same. All hail the better sense. I guess it's ok to use this opportunity to show a community its shortcomings in particular areas while not being able to throw a bit of money towards capable hardware at least. ;)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: monty11ez on January 16, 2020, 01:32:13 am
I plan on getting an APU2D4 soon since it has superseded the C4. I was wondering if anyone could check what

Code:
sysctl dev.cpu.0.freq_levels

outputs, and then post what

Code:
sysctl dev.cpu.0.freq

outputs under load?

I know this is an AMD-based CPU, but from what I understand the CPU will not turbo unless you are running the powerd daemon. For a console-based output you can run powerd with

Code:
sudo powerd -v

assuming sudo is installed on OPNsense.

I hope this helps.
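If powerd turns out to help, it can also be enabled persistently from the shell (this is the plain-FreeBSD way; I assume OPNsense exposes the same switch somewhere in its GUI):

```shell
# Enable powerd at boot and start it right away
sysrc powerd_enable="YES"
service powerd start

# Then watch the core frequency while a transfer is running
sysctl dev.cpu.0.freq
```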
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on January 21, 2020, 03:05:17 am
I ran the commands you wrote on the console, and here is what I got from an APU4B4:

root@OPNsense02:~ # sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 1000/1008 800/831 600/628


Idle:
root@OPNsense02:~ # sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1000


Under load:
root@OPNsense02:~ # sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1000


So the frequency didn't really change.  Now with powerd running, here is the output, where you will see the max frequency still being 1000 MHz:
root@OPNsense02:~ # sudo powerd -v
powerd: unable to determine AC line status
load   4%, current freq 1000 MHz ( 0), wanted freq  968 MHz
load   7%, current freq 1000 MHz ( 0), wanted freq  937 MHz
load   0%, current freq 1000 MHz ( 0), wanted freq  907 MHz
load   0%, current freq 1000 MHz ( 0), wanted freq  878 MHz
load   7%, current freq 1000 MHz ( 0), wanted freq  850 MHz
load   6%, current freq 1000 MHz ( 0), wanted freq  823 MHz
load   0%, current freq 1000 MHz ( 0), wanted freq  797 MHz
changing clock speed from 1000 MHz to 800 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  772 MHz
load   4%, current freq  800 MHz ( 1), wanted freq  747 MHz
load   6%, current freq  800 MHz ( 1), wanted freq  723 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  700 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  678 MHz
load   3%, current freq  800 MHz ( 1), wanted freq  656 MHz
load   5%, current freq  800 MHz ( 1), wanted freq  635 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  615 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  600 MHz
changing clock speed from 800 MHz to 600 MHz
load  10%, current freq  600 MHz ( 2), wanted freq  600 MHz
load   5%, current freq  600 MHz ( 2), wanted freq  600 MHz
load   0%, current freq  600 MHz ( 2), wanted freq  600 MHz
load   0%, current freq  600 MHz ( 2), wanted freq  600 MHz
load   8%, current freq  600 MHz ( 2), wanted freq  600 MHz
load   7%, current freq  600 MHz ( 2), wanted freq  600 MHz
load   0%, current freq  600 MHz ( 2), wanted freq  600 MHz
load   3%, current freq  600 MHz ( 2), wanted freq  600 MHz
load   5%, current freq  600 MHz ( 2), wanted freq  600 MHz
load  11%, current freq  600 MHz ( 2), wanted freq  600 MHz
load 143%, current freq  600 MHz ( 2), wanted freq 2000 MHz
changing clock speed from 600 MHz to 1000 MHz
load 130%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load  85%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 107%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 101%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 100%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 106%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 100%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load   0%, current freq 1000 MHz ( 0), wanted freq 1937 MHz
[... ~27 similar low-load samples omitted while the wanted frequency decays toward 789 MHz ...]
load   0%, current freq 1000 MHz ( 0), wanted freq  789 MHz
changing clock speed from 1000 MHz to 800 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  764 MHz
load   6%, current freq  800 MHz ( 1), wanted freq  740 MHz
load   6%, current freq  800 MHz ( 1), wanted freq  716 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  693 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  671 MHz
load   6%, current freq  800 MHz ( 1), wanted freq  650 MHz
load   4%, current freq  800 MHz ( 1), wanted freq  629 MHz
load   5%, current freq  800 MHz ( 1), wanted freq  609 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  600 MHz
changing clock speed from 800 MHz to 600 MHz
load   6%, current freq  600 MHz ( 2), wanted freq  600 MHz
[... ~54 similar idle samples at 600 MHz omitted ...]
load   4%, current freq  600 MHz ( 2), wanted freq  600 MHz
load  75%, current freq  600 MHz ( 2), wanted freq 1200 MHz
changing clock speed from 600 MHz to 1000 MHz
load 293%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 364%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
[... ~55 similar samples omitted, load 238-382%, frequency pinned at 1000 MHz while 2000 MHz is wanted ...]
load 251%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load  85%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load   4%, current freq 1000 MHz ( 0), wanted freq 1937 MHz
load 138%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 316%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
[... ~23 similar samples omitted, load 307-342%, still capped at 1000 MHz ...]
load 337%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
[... ~31 similar samples omitted at or near 400% load, still capped at 1000 MHz ...]
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 319%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 100%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
[... ~20 similar samples omitted, load 96-210%, still capped at 1000 MHz ...]
load  96%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load  13%, current freq 1000 MHz ( 0), wanted freq 1937 MHz
[... ~27 similar low-load samples omitted while the wanted frequency decays toward 789 MHz ...]
load   0%, current freq 1000 MHz ( 0), wanted freq  789 MHz
changing clock speed from 1000 MHz to 800 MHz
load   6%, current freq  800 MHz ( 1), wanted freq  764 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  740 MHz
load   5%, current freq  800 MHz ( 1), wanted freq  716 MHz
load   3%, current freq  800 MHz ( 1), wanted freq  693 MHz
load   4%, current freq  800 MHz ( 1), wanted freq  671 MHz
load   0%, current freq  800 MHz ( 1), wanted freq  650 MHz
load   4%, current freq  800 MHz ( 1), wanted freq  629 MHz
load   3%, current freq  800 MHz ( 1), wanted freq  609 MHz
load   9%, current freq  800 MHz ( 1), wanted freq  600 MHz
changing clock speed from 800 MHz to 600 MHz
load   4%, current freq  600 MHz ( 2), wanted freq  600 MHz
[... ~16 similar idle samples at 600 MHz omitted ...]
load  25%, current freq  600 MHz ( 2), wanted freq  600 MHz
^Ctotal joules used: 73.271


I will post back with the APU4D4 and see the difference.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on January 21, 2020, 03:23:08 am
Same results for the APU4D4.

Both PC Engines boards run the latest BIOS, but the CPU frequency seems capped at 1 GHz, which would explain why we can only get around ~650 Mbps at best on gigabit links. The AMD GX-412TC can boost up to 1.4 GHz.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: newsense on January 21, 2020, 04:37:07 am
Can you please post the output of dmidecode -t BIOS from each board, for future reference?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on January 21, 2020, 05:10:09 am
APU4B4 info (bought in June 2018):
root@OPNsense02:~ # dmidecode -t BIOS
# dmidecode 3.2
Scanning /dev/mem for entry point.
SMBIOS 2.8 present.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
        Vendor: coreboot
        Version: v4.11.0.1
        Release Date: 12/09/2019
        ROM Size: 8192 kB
        Characteristics:
                PCI is supported
                PC Card (PCMCIA) is supported
                BIOS is upgradeable
                Selectable boot is supported
                ACPI is supported
                Targeted content distribution is supported
        BIOS Revision: 4.11
        Firmware Revision: 0.0


APU4D4 info (bought in November 2019):
root@OPNsense:~ # dmidecode -t BIOS
# dmidecode 3.2
Scanning /dev/mem for entry point.
SMBIOS 2.8 present.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
        Vendor: coreboot
        Version: v4.11.0.1
        Release Date: 12/09/2019
        ROM Size: 8192 kB
        Characteristics:
                PCI is supported
                PC Card (PCMCIA) is supported
                BIOS is upgradeable
                Selectable boot is supported
                ACPI is supported
                Targeted content distribution is supported
        BIOS Revision: 4.11
        Firmware Revision: 0.0


Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Gary7 on January 21, 2020, 06:08:23 am
I have an APU2D4 for my home firewall, and I'm trying to get maximum performance since the APU2 has limited "horsepower".
I've tried to configure my system to get the most benefit from the i210 NICs (thank you calomel.org, among others).
I recently found that for OPNsense, the default value for kern.random.harvest.mask is 2047:

root@OPNsense:~ # sysctl kern.random.harvest
kern.random.harvest.mask_symbolic: UMA,FS_ATIME,SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
kern.random.harvest.mask_bin: 000000000011111111111
kern.random.harvest.mask: 2047

Based on some recommendations for FreeBSD, I set kern.random.harvest.mask = 351 for maximum throughput.

root@OPNsense:~ # sysctl kern.random.harvest
kern.random.harvest.mask_symbolic: [UMA],[FS_ATIME],SWI,[INTERRUPT],NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
kern.random.harvest.mask_bin: 000000000000101011111
kern.random.harvest.mask: 351

UMA (the universal memory allocator, also called the zone allocator) is one of the sources the 351 mask disables.
According to the FreeBSD documentation for random(4): "obtain entropy from the zone allocator.  This is potentially very high rate, and if so will be of questionable use. If this is the case, use of this option is not recommended."

Default values
FreeBSD:      kern.random.harvest.mask      511
pfSense:      is it the same as the FreeBSD default?
OpnSense:   kern.random.harvest.mask      2047

For max throughput:   kern.random.harvest.mask   351

For an APU2, kern.random.harvest.mask 511 --> 351 gives about 3% better throughput for FreeBSD.
Has anyone documented the throughput difference from kern.random.harvest.mask 2047 --> 511 ?
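As an aside, the mask is literally a bitmask: each bit enables one of the entropy sources named in mask_symbolic. A small POSIX-sh sketch (decode_mask is a hypothetical helper for illustration, not a system command) shows which bit positions a given value enables:

```shell
# decode_mask: print the bit positions set in a kern.random.harvest.mask value.
# Each set bit enables one entropy source (see random(4)).
decode_mask() {
  mask=$1
  bit=0
  out=""
  while [ "$mask" -gt 0 ]; do
    if [ $(( mask & 1 )) -eq 1 ]; then
      out="$out $bit"
    fi
    mask=$(( mask >> 1 ))
    bit=$(( bit + 1 ))
  done
  echo $out   # intentionally unquoted, to trim the leading space
}

decode_mask 351    # prints: 0 1 2 3 4 6 8  (matches mask_bin 000000000000101011111)
decode_mask 2047   # prints: 0 1 2 3 4 5 6 7 8 9 10  (all eleven sources enabled)
```

Comparing the two outputs shows that 351 drops bits 5, 7, 9 and 10 relative to 2047, i.e. exactly the bracketed (disabled) sources NET_ETHER, INTERRUPT, FS_ATIME and UMA in the mask_symbolic output above.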

I know that my firewall will have a little less entropy, but for my purposes, that's OK.
To set kern.random.harvest.mask, I have to use the GUI: System -> Settings -> Tunables, and add kern.random.harvest.mask there.
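For reference, the GUI tunable amounts to a one-line sysctl setting; the equivalent entry would look like this (illustration only - on OPNsense the Tunables page is the supported mechanism):

```
# equivalent sysctl entry (sketch; use System -> Settings -> Tunables on OPNsense)
kern.random.harvest.mask=351
```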

I have 200 Mbps Internet; I get the same tested speed (236 Mbps) as with my consumer-grade router, which I know is capable of gigabit speed.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on January 21, 2020, 03:02:23 pm
If you had followed the thread, you would know that the APU2 can easily do 500+ Mbps with no tweaking.  The issue is handling 1 Gbps links when OPNsense is on it, whereas IPFire or OpenWRT on the same hardware achieve 900+ Mbps.

When I have time I will try your settings just to see what happens.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: monty11ez on January 21, 2020, 07:24:21 pm
I ran the commands you wrote on the console and here is what I've got from an APU4B4:

root@OPNsense02:~ # sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 1000/1008 800/831 600/628


Idle:
root@OPNsense02:~ # sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1000


Under load:
root@OPNsense02:~ # sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1000


So the frequency didn't really change.  Now with powerd running, here is the output, where you will see the maximum frequency still being 1000 MHz:
root@OPNsense02:~ # sudo powerd -v
powerd: unable to determine AC line status
load   4%, current freq 1000 MHz ( 0), wanted freq  968 MHz
load   7%, current freq 1000 MHz ( 0), wanted freq  937 MHz
load   0%, current freq 1000 MHz ( 0), wanted freq  907 MHz
load   0%, current freq 1000 MHz ( 0), wanted freq  878 MHz
.
.
.

I will post back with the APU4D4 and see the difference.

Thank you for posting that info. So from what I understand, OPNsense is not getting the turbo clock speed info from the BIOS. I'm not exactly sure why that is the case, though.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: newsense on January 21, 2020, 08:38:55 pm
The limitation is likely on the hardware side; miczyg would probably be one of the best people to address the why of it, if he sees this thread.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on January 21, 2020, 11:28:58 pm
The same hardware running Linux-based OSes (IPFire and OpenWRT) is able to max out the 1 Gbps NICs without problems (see the post on the previous page).
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: monty11ez on February 03, 2020, 10:36:44 pm
Based on this document from PC Engines, the frequency is not reported correctly in sysctl, so you have to set these hints in loader.conf to get proper readings.
Code:
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
hint.acpi_perf.0.disabled=1

I don't quite understand if this affects the actual speed of the device though. If someone could confirm that would be amazing.

https://github.com/pcengines/apu2-documentation/blob/master/docs/apu_CPU_boost.md
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: loredo on February 04, 2020, 09:42:13 am
Unfortunately this is only a display issue.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: arneboeses on February 13, 2020, 10:59:53 am
I just installed the latest pfSense 2.5.0 beta on my APU4D4, and there I get 660 Mbps instead of the 340 Mbps I get with the latest OPNsense 20.1. So the underlying BSD version seems to handle the drivers better, but that is still too little, as I have a 1 Gbit/s ISP link.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: franco on February 13, 2020, 11:57:30 am
I don't want to be unfriendly, but I'm definitely going to close this thread if people keep comparing apples and oranges.


Cheers,
Franco
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: cgone on February 13, 2020, 03:25:29 pm
I have been experimenting with my APU2 and OPNsense firewall, and my impression is that the most important setting is

Quote
net.isr.dispatch=deferred

I am able to saturate a 250 Mbit downlink from the German Telekom with one stream.
Of course IDS/ntopng is not possible if I want to saturate the connection.
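For anyone wanting to try this, the setting is added as a tunable (System -> Settings -> Tunables); as a sketch, the entry is just one line:

```
# defer packet processing from the NIC interrupt context to netisr threads,
# which can spread the work across the APU2's four cores (sketch; verify on your version)
net.isr.dispatch=deferred
```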
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on February 13, 2020, 04:13:42 pm
Quote from: franco
I don't want to be unfriendly, but I'm definitely going to close this thread if people keep comparing apples and oranges.

Hello Franco,

I disagree; this isn't an apples-to-oranges comparison. As this thread has been going on since July 2018 with still no resolution, comparing other firewalls with OPNsense running on the SAME hardware is the only thing we can do "on our side" to try to solve the issue.  And up to now, not a single dev has offered any help in this thread as to why we might be having the issue, or some path of resolution/explanation.

PC Engines hardware has been used by lots of people around the world (privately and commercially) for many years (since before OPNsense was forked), and it fills a segment of the market that other commercial brands can't even reach for the same price (reliability and low power usage).  So we want to maximize our investment AND also use OPNsense, because we like/prefer it over other firewalls.  Trying to muzzle or threaten us by closing the thread isn't the right direction imo, and isn't what I expect from the OPNsense forum - it is a reason many of us left "that other well known firewall" for OPNsense.  We are not bitching, but we are kind of fed up (in a way) with the lack of help or feedback from the guys who are making OPNsense.

So to get back to the thread itself: since other (Linux-based) firewalls are able to max out gigabit speed on any of the NICs of the PC Engines APU2, we are all puzzled as to why OPNsense isn't capable of doing it.  FreeBSD has the best TCP/IP stack of the *NIXes out there, so what is the problem?

We are not all operating system developers, and thus are not equipped to check what's going on while a transfer is occurring on the APU2's NICs.  Is there an issue with FreeBSD/HardenedBSD and the APU2's Intel NICs?  Is there some other issue with FreeBSD/HardenedBSD not being able to boost the AMD CPU to 1.4 GHz? Anything else?

We post on these forums to get (we hope) some answers from the devs themselves on some of the issues we encounter - like this one.  So please, don't turn into that other company, but instead maybe forward the questions to the dev team so they can take a look.

Thank you for your comprehension.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: franco on February 13, 2020, 04:17:14 pm
For the community, "X is faster than Y, I just checked" is a waste of time if you don't say how "Y" goes from slower to faster. Even if you post that OPNsense is faster than Z, I'm going to close this topic, because just like in real life:

You measure your progress from where you were to where you are; you must not compare yourself to others, because that is pointless and shallow.


Cheers,
Franco
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on February 13, 2020, 04:26:17 pm

Quote from: pjdouillard
Hello Franco, I disagree as this isn't an apples to oranges comparison [...] Thank you for your comprehension.


The reason why probably no dev answered is that maybe none of the devs has an APU or such high bandwidth. Keep in mind that this is a community project. I myself have only VDSL100 .. I have no idea how to help because I can't reproduce it.

Maybe you can start by installing a fresh pfSense, run sysctl -a, write the output to a file, do the same for OPNsense, and then diff them. Maybe pfSense has some other defaults.
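That comparison can be sketched like this (file names are illustrative; run the dump on each box, then diff the two files on one of them):

```shell
# Dump every sysctl key=value pair, sorted so the diff lines up.
sysctl -a | sort > /tmp/sysctl-opnsense.txt   # run this on the OPNsense box
sysctl -a | sort > /tmp/sysctl-pfsense.txt    # run this on the pfSense box
# Show differing defaults; '|| true' because diff exits non-zero when files differ.
diff -u /tmp/sysctl-opnsense.txt /tmp/sysctl-pfsense.txt || true
```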

Keep in mind that pfSense has a community about 100x bigger, so the chance that one guy has an APU and enough knowledge to solve this and report the fix (not just the problem) upstream is 100x higher.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on February 13, 2020, 05:35:43 pm
Quote from: mimugmail
The reason why probably no dev answered is that maybe none of the devs has an APU or such high bandwidth. [...]

Since you didn't read the whole thread, I will make it short for you:
-pfSense has the same problem on the same APU, and no one in that community has found a fix - anything posted elsewhere has been tested and doesn't provide any REAL single-thread / single-stream solution.
-You don't need 1+ Gbps ISP bandwidth to recreate the problem: a local network with CAT5E Ethernet cables between 2 physical PCs will do the job.
-If the devs don't have access to a PC Engines APU, I can send them one for free if they care to fix the problem.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on February 13, 2020, 05:54:31 pm
https://www.max-it.de/kontakt/
Michael Muenz
Address above ...
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pjdouillard on February 13, 2020, 06:16:41 pm
https://www.max-it.de/kontakt/
Michael Muenz
Address above ...

Will pm you.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: johnsmi on February 14, 2020, 05:21:24 am
First of all: I have absolutely no clue. Please ignore this if I'm completely wrong.

Is it perhaps HardenedBSD related?
Could it be trading performance away by using different defaults than other OSes?


e.g.
https://bsdrp.net/documentation/technical_docs/performance#entropy_harvest_impact
suggests reducing kern.random.harvest.mask from 511 to 351 for a performance gain.

The OPNsense default seems to be 2047.

Now I take a look and see:

# sysctl kern.random
kern.random.harvest.mask: 67583

2^16 + 2047 = 67583
So an additional bit (bit 16) is set.
Though I never tested 66047, 65887, or 351.
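The bit arithmetic can be checked directly in a shell (the mask values are the ones discussed in this thread; whether a reduced mask actually helps throughput is not tested here):

```shell
# OPNsense's observed mask (67583) is the 2047 default with bit 16 also set:
default_mask=2047
observed=$(( (1 << 16) + default_mask ))
echo "$observed"    # 67583
# Trying the bsdrp.net suggestion would be done on the firewall itself:
#   sysctl kern.random.harvest.mask=351
```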

And this thread from almost a year ago:
https://forum.opnsense.org/index.php?topic=12058.0

More recently:
https://forum.opnsense.org/index.php?topic=15686.msg71923#msg71923

Perhaps someone who understands this stuff can give advice on how to tune?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on June 19, 2020, 03:46:15 pm
First of all: I have absolutely no clue. Please ignore this if I'm completely wrong.

Is it perhaps HardenedBSD related?
Could it be trading performance away by using different defaults than other OSes?


e.g.
https://bsdrp.net/documentation/technical_docs/performance#entropy_harvest_impact
suggests reducing kern.random.harvest.mask from 511 to 351 for a performance gain.

The OPNsense default seems to be 2047.

Now I take a look and see:

# sysctl kern.random
kern.random.harvest.mask: 67583

2^16 + 2047 = 67583
So an additional bit (bit 16) is set.
Though I never tested 66047, 65887, or 351.

And this thread from almost a year ago:
https://forum.opnsense.org/index.php?topic=12058.0

More recently:
https://forum.opnsense.org/index.php?topic=15686.msg71923#msg71923

Perhaps someone who understands this stuff can give advice on how to tune?


I am confident that nobody has a 100% reliably working solution for this problem.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: iam on August 03, 2020, 05:16:20 pm
Hi, has someone tested this with 20.7? Before the upgrade, various speed tests showed nearly 270 MBit/s. After the upgrade it's only 200 MBit/s.

I have a 300 MBit/s FTTH PPPoE connection.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: banym on August 06, 2020, 08:34:02 am
Are you using IPS/IDS?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: iam on August 06, 2020, 09:57:31 am
No, I've disabled it. But I use VLANs and PPPoE.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: telefonmann on September 23, 2020, 05:07:30 pm
Recommended reading:
APU2 performance is insufficient for Gigabit if WAN is PPPoE
http://www.pcengines.info/forums/?page=post&id=E801CA38-8CD5-4854-95A7-99B67B5DB281&fid=DF5ACB70-99C4-4C61-AFA6-4C0E0DB05B2A&pageindex=1
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: iam on September 23, 2020, 11:39:55 pm
Thanks. Does anyone have experience with this?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856#c11
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 24, 2020, 11:03:54 am
Thanks. Does anyone have experience with this?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856#c11

In the last post in that topic, the FreeBSD maintainer Eugene Grosbein closed the bug with "Works As Intended", so they don't acknowledge this as a bug, and it seems it will never be fixed. You have to forget the PC Engines APU2 for PPPoE WAN at 1 Gbit, unless PC Engines releases a new PCB design with a much stronger (2 GHz+) embedded APU.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: telefonmann on September 24, 2020, 11:08:35 am
Yes, I've set "net.isr.maxthreads" and "net.isr.numthreads" to the number of cores (4) and net.isr.dispatch to "deferred". This led to a slight performance increase (~10%). I will now try to offload the PPPoE stuff from the firewall to the modem (my modem has this option) and see what happens.
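As a sketch, those settings look roughly like this (in OPNsense they are normally entered under System > Settings > Tunables; the 4-thread values assume a 4-core APU2):

```shell
# Boot-time tunables: on FreeBSD these belong in /boot/loader.conf.local
# (or the OPNsense Tunables GUI) and need a reboot to take effect:
#   net.isr.maxthreads=4      # one netisr worker thread per core
#   net.isr.numthreads=4
# net.isr.dispatch is writable at runtime: queue packets to the netisr
# threads instead of processing them in the receiving interrupt context.
sysctl net.isr.dispatch=deferred
```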
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: telefonmann on September 25, 2020, 02:55:53 pm
Just for testing, today I deactivated my PPPoE interface in OPNsense and - guess what - the performance on the other two (non-PPPoE) interfaces DOUBLED instantly after a reboot (sorry to say that after reactivating PPPoE the "boost" went away - even without rebooting).
So it looks like you can achieve 1 Gbit speed only if you don't use PPPoE - regardless of the firewall OS used (forget about the others, they will have the same problem).
I have now bought a DrayTek 165 (VDSL2 35b) modem, which is capable of handling the whole PPPoE stuff on its own. This way OPNsense will only get IP traffic and it should finally work.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: iam on September 26, 2020, 12:22:38 am
Yes, I've set "net.isr.maxthreads" and "net.isr.numthreads" to the number of cores (4) and net.isr.dispatch to "deferred". This led to a slight performance increase (~10%). I will now try to offload the PPPoE stuff from the firewall to the modem (my modem has this option) and see what happens.

I can recommend these settings. Before the upgrade to 20.7 I always measured values between 250 and 280 MBit/s. After the update I measured only 180 to 220 MBit/s. Now I measure values up to 313 MBit/s. Our contract says 300 MBit/s, so that's really nice :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: dave on September 26, 2020, 09:54:19 pm
I can recommend these settings too.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rcmcronny on September 28, 2020, 02:48:36 pm
Hi,

Just for testing, today I deactivated my PPPoE interface in OPNsense and - guess what - the performance on the other two (non-PPPoE)
[...]
I have now bought a DrayTek 165 (VDSL2 35b) modem, which is capable of handling the whole PPPoE stuff on its own. This way OPNsense will only get IP traffic and it should finally work.

I have a Vigor 130; this should also work with that. Could you please give a link on what to change in the config? Currently I only do VLAN tagging on the Vigor. Doing the PPPoE stuff at the modem along with VLAN tagging is perhaps a better method; I would like to dive in if it is a better setup for me.

Thanks,
Ronny
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: dave on September 30, 2020, 06:44:21 pm
I wanted to try the same thing but it looks like the EU and US version of the Vigor 130 are actually different.
The US version can act as a router (enabling it to handle PPPoE auth and encapsulation itself), whereas the EU version doesn't have this functionality.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: telefonmann on October 01, 2020, 10:29:50 am
@rcmcronny, http://www.draytektr.com/documents/product/619F295B-B506-2CF1-D9115EA3B629181F.pdf
First set operation mode to "router mode", after this the device will reboot. Then just use the wizard (pages 7 ff. in the manual)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: FrenchFries on November 18, 2020, 10:46:48 am
I am using an older APU1C model with only two cores.
I understand only one core is being used for routing.

Just a remark: testing network bandwidth with the Ikoula speedtest is not the right methodology, as it is very inaccurate.
Using the Ikoula speedtest, my result was 440 Mbit/s downstream.

But the accurate speed measured with iperf3 is 571 Mbit/s:
Code: [Select]
iperf3 -p 9222 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 9222
[  5] local 10.90.20.1 port 60560 connected to 89.84.1.222 port 9222
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  69.1 MBytes   579 Mbits/sec  268    462 KBytes       
[  5]   1.00-2.00   sec  68.8 MBytes   577 Mbits/sec    0    567 KBytes       
[  5]   2.00-3.00   sec  67.5 MBytes   566 Mbits/sec    2    461 KBytes       
[  5]   3.00-4.00   sec  68.8 MBytes   577 Mbits/sec    0    563 KBytes       
[  5]   4.00-5.00   sec  67.5 MBytes   566 Mbits/sec    0    648 KBytes       
[  5]   5.00-6.00   sec  67.5 MBytes   566 Mbits/sec    2    544 KBytes       
[  5]   6.00-7.00   sec  70.0 MBytes   587 Mbits/sec    0    632 KBytes       
[  5]   7.00-8.00   sec  67.5 MBytes   566 Mbits/sec    2    533 KBytes       
[  5]   8.00-9.00   sec  67.5 MBytes   566 Mbits/sec    0    621 KBytes       
[  5]   9.00-10.00  sec  68.8 MBytes   577 Mbits/sec    2    510 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   683 MBytes   573 Mbits/sec  276             sender
[  5]   0.00-10.00  sec   680 MBytes   571 Mbits/sec                  receiver

So I can confirm that the APU1, with an older CPU and older NICs, can achieve 571 Mbit/s downstream.
This is the latest OPNsense version, 20.7.
I am connecting from a GNU/Linux laptop using an RJ-45 cable and IPv4.
OPNsense is connected to the fiber router with an RJ-45 cable and IPv4 with NAT.

iperf3 also has an option to use multiple connection streams: -P 2 for two cores:
Code: [Select]
iperf3 -P 2 -p 9222 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 9222
[  5] local 10.90.20.1 port 38612 connected to 89.84.1.222 port 9222
[  7] local 10.90.20.1 port 38614 connected to 89.84.1.222 port 9222
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  25.4 MBytes   213 Mbits/sec   12    229 KBytes       
[  7]   0.00-1.00   sec  46.0 MBytes   386 Mbits/sec   64    318 KBytes       
[SUM]   0.00-1.00   sec  71.3 MBytes   598 Mbits/sec   76             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  31.0 MBytes   260 Mbits/sec    0    314 KBytes       
[  7]   1.00-2.00   sec  37.2 MBytes   312 Mbits/sec    2    279 KBytes       
[SUM]   1.00-2.00   sec  68.2 MBytes   572 Mbits/sec    2             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  37.0 MBytes   311 Mbits/sec    1    290 KBytes       
[  7]   2.00-3.00   sec  32.3 MBytes   271 Mbits/sec    1    263 KBytes       
[SUM]   2.00-3.00   sec  69.3 MBytes   582 Mbits/sec    2             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  34.8 MBytes   292 Mbits/sec    1    263 KBytes       
[  7]   3.00-4.00   sec  31.9 MBytes   268 Mbits/sec    1    245 KBytes       
[SUM]   3.00-4.00   sec  66.7 MBytes   560 Mbits/sec    2             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  34.2 MBytes   287 Mbits/sec    0    348 KBytes       
[  7]   4.00-5.00   sec  33.4 MBytes   280 Mbits/sec    1    239 KBytes       
[SUM]   4.00-5.00   sec  67.6 MBytes   567 Mbits/sec    1             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  39.6 MBytes   333 Mbits/sec    1    307 KBytes       
[  7]   5.00-6.00   sec  28.5 MBytes   239 Mbits/sec    2    226 KBytes       
[SUM]   5.00-6.00   sec  68.1 MBytes   571 Mbits/sec    3             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  39.3 MBytes   330 Mbits/sec    0    389 KBytes       
[  7]   6.00-7.00   sec  30.0 MBytes   251 Mbits/sec    0    311 KBytes       
[SUM]   6.00-7.00   sec  69.3 MBytes   581 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  36.5 MBytes   306 Mbits/sec    1    355 KBytes       
[  7]   7.00-8.00   sec  30.9 MBytes   259 Mbits/sec    1    305 KBytes       
[SUM]   7.00-8.00   sec  67.4 MBytes   565 Mbits/sec    2             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  36.7 MBytes   308 Mbits/sec    1    329 KBytes       
[  7]   8.00-9.00   sec  32.0 MBytes   268 Mbits/sec    1    293 KBytes       
[SUM]   8.00-9.00   sec  68.7 MBytes   577 Mbits/sec    2             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  35.2 MBytes   295 Mbits/sec    1    305 KBytes       
[  7]   9.00-10.00  sec  31.9 MBytes   268 Mbits/sec    1    279 KBytes       
[SUM]   9.00-10.00  sec  67.1 MBytes   563 Mbits/sec    2             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   350 MBytes   293 Mbits/sec   18             sender
[  5]   0.00-10.01  sec   348 MBytes   292 Mbits/sec                  receiver
[  7]   0.00-10.00  sec   334 MBytes   280 Mbits/sec   74             sender
[  7]   0.00-10.01  sec   331 MBytes   278 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec   684 MBytes   574 Mbits/sec   92             sender
[SUM]   0.00-10.01  sec   679 MBytes   569 Mbits/sec                  receiver
But this gave me the same results (probably a limitation of my hardware NICs?).

IMHO, you should run your tests with iperf3 for accurate results.
iperf3 should run on a client, not directly on OPNsense, of course.

Edit: iperf3 should be used with the -R option to ask the server to send data, otherwise you are testing upload speed. My upload speed is around 600 Mbit/s, so I need to retest downloading with the -R option.
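A download test through the firewall from a LAN client would then look like this (server name as used in this thread; durations and stream counts are arbitrary choices for illustration):

```shell
# -R reverses the direction: the remote server sends, so we measure download.
iperf3 -R -t 10 -c bouygues.iperf.fr          # single stream
iperf3 -R -t 10 -P 2 -c bouygues.iperf.fr     # two parallel streams
```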
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: chemlud on November 18, 2020, 11:01:41 am
I wanted to try the same thing but it looks like the EU and US version of the Vigor 130 are actually different.
The US version can act as a router (enabling it to handle PPPoE auth and encapsulation itself), whereas the EU version doesn't have this functionality.

Nope, the "EU version" of the Vigor 130 (if there is such a thing) can act as a router or a modem (bridged mode), but as it is most often used as a modem, it comes pre-configured in modem mode. Letting the 'sense do the PPPoE and VLAN is the preferred configuration.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: FrenchFries on November 18, 2020, 11:15:52 am
Here is a more accurate download speed test with iperf3 and one stream.
I used the -R option to test download speed, not upload:

Code: [Select]
iperf3 -R -P 1 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 5201
Reverse mode, remote host bouygues.iperf.fr is sending
[  5] local 10.90.20.1 port 39286 connected to 89.84.1.222 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  53.7 MBytes   450 Mbits/sec                 
[  5]   1.00-2.00   sec  52.8 MBytes   443 Mbits/sec                 
[  5]   2.00-3.00   sec  55.3 MBytes   464 Mbits/sec                 
[  5]   3.00-4.00   sec  61.4 MBytes   515 Mbits/sec                 
[  5]   4.00-5.00   sec  54.3 MBytes   456 Mbits/sec                 
[  5]   5.00-6.00   sec  53.0 MBytes   445 Mbits/sec                 
[  5]   6.00-7.00   sec  53.8 MBytes   451 Mbits/sec                 
[  5]   7.00-8.00   sec  48.3 MBytes   405 Mbits/sec                 
[  5]   8.00-9.00   sec  54.6 MBytes   458 Mbits/sec                 
[  5]   9.00-10.00  sec  54.4 MBytes   457 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   550 MBytes   461 Mbits/sec  31826             sender
[  5]   0.00-10.00  sec   542 MBytes   455 Mbits/sec                  receiver

However, with two streams, I get the same results:
Quote
iperf3 -R -P 2 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 5201
Reverse mode, remote host bouygues.iperf.fr is sending
[  5] local 10.90.20.1 port 40064 connected to 89.84.1.222 port 5201
[  7] local 10.90.20.1 port 40066 connected to 89.84.1.222 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  28.8 MBytes   241 Mbits/sec                 
[  7]   0.00-1.00   sec  25.7 MBytes   216 Mbits/sec                 
[SUM]   0.00-1.00   sec  54.5 MBytes   457 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  27.5 MBytes   231 Mbits/sec                 
[  7]   1.00-2.00   sec  27.6 MBytes   232 Mbits/sec                 
[SUM]   1.00-2.00   sec  55.1 MBytes   462 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  23.6 MBytes   198 Mbits/sec                 
[  7]   2.00-3.00   sec  29.4 MBytes   246 Mbits/sec                 
[SUM]   2.00-3.00   sec  53.0 MBytes   444 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  23.8 MBytes   200 Mbits/sec                 
[  7]   3.00-4.00   sec  26.9 MBytes   226 Mbits/sec                 
[SUM]   3.00-4.00   sec  50.7 MBytes   426 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  27.3 MBytes   229 Mbits/sec                 
[  7]   4.00-5.00   sec  23.7 MBytes   199 Mbits/sec                 
[SUM]   4.00-5.00   sec  51.0 MBytes   428 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  19.8 MBytes   166 Mbits/sec                 
[  7]   5.00-6.00   sec  30.2 MBytes   253 Mbits/sec                 
[SUM]   5.00-6.00   sec  50.0 MBytes   419 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  16.9 MBytes   142 Mbits/sec                 
[  7]   6.00-7.00   sec  34.6 MBytes   290 Mbits/sec                 
[SUM]   6.00-7.00   sec  51.5 MBytes   432 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  16.9 MBytes   142 Mbits/sec                 
[  7]   7.00-8.00   sec  34.0 MBytes   285 Mbits/sec                 
[SUM]   7.00-8.00   sec  50.8 MBytes   426 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  15.8 MBytes   133 Mbits/sec                 
[  7]   8.00-9.00   sec  38.1 MBytes   320 Mbits/sec                 
[SUM]   8.00-9.00   sec  53.9 MBytes   452 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  14.2 MBytes   119 Mbits/sec                 
[  7]   9.00-10.00  sec  40.1 MBytes   336 Mbits/sec                 
[SUM]   9.00-10.00  sec  54.3 MBytes   455 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   219 MBytes   183 Mbits/sec  21640             sender
[  5]   0.00-10.00  sec   215 MBytes   180 Mbits/sec                  receiver
[  7]   0.00-10.01  sec   315 MBytes   264 Mbits/sec  29816             sender
[  7]   0.00-10.00  sec   310 MBytes   260 Mbits/sec                  receiver
[SUM]   0.00-10.01  sec   534 MBytes   447 Mbits/sec  51456             sender
[SUM]   0.00-10.00  sec   525 MBytes   440 Mbits/sec                  receiver

I also tested against a local server (connected on a different VLAN with a different subnet, so OPNsense is acting as NAT):
Quote
iperf3 -R -c 10.90.70.250
Connecting to host 10.90.70.250, port 5201
Reverse mode, remote host 10.90.70.250 is sending
[  5] local 10.90.20.1 port 54348 connected to 10.90.70.250 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  85.7 MBytes   719 Mbits/sec                 
[  5]   1.00-2.00   sec  86.8 MBytes   728 Mbits/sec                 
[  5]   2.00-3.00   sec  86.3 MBytes   724 Mbits/sec                 
[  5]   3.00-4.00   sec  85.8 MBytes   720 Mbits/sec                 
[  5]   4.00-5.00   sec  84.9 MBytes   712 Mbits/sec                 
[  5]   5.00-6.00   sec  81.9 MBytes   687 Mbits/sec                 
[  5]   6.00-7.00   sec  88.6 MBytes   744 Mbits/sec                 
[  5]   7.00-8.00   sec  87.6 MBytes   735 Mbits/sec                 
[  5]   8.00-9.00   sec  85.7 MBytes   719 Mbits/sec                 
[  5]   9.00-10.00  sec  87.3 MBytes   732 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   864 MBytes   724 Mbits/sec   19             sender
[  5]   0.00-10.00  sec   861 MBytes   722 Mbits/sec                  receiver

Here I can achieve 722 Mbit/s, which is pretty good for the older APU1C platform.
Same results with 2 streams.

Two remarks:

1) I cannot explain why iperf3 is so much faster against a local iperf3 server with NAT.

2) OPNsense does not seem to support multi-core routing, as speed is not higher with two streams.
I even tested with two clients and got roughly the same speed.

Am I missing something in my OPNsense settings?
I would expect speed to be higher with two iperf3 streams.
Or is pf single-threaded on OPNsense?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: dave on November 19, 2020, 07:18:18 pm
Nope, the "EU version" of the Vigor 130 (if there is such a thing) can act as a router or a modem (bridged mode), but as it is most often used as a modem, it comes pre-configured in modem mode. Letting the 'sense do the PPPoE and VLAN is the preferred configuration.

I emailed Draytek:

Quote
Regarding the authentication, the DrayTek UK Vigor 130 was designed to support bridge mode out of the box. You can consider the Vigor 2762 series that can handle PPP authentication.

They didn't outright say no I guess.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: hushcoden on November 20, 2020, 10:21:40 am
I emailed Draytek:

Quote
Regarding the authentication, the DrayTek UK Vigor 130 was designed to support bridge mode out of the box. You can consider the Vigor 2762 series that can handle PPP authentication.

They didn't outright say no I guess.
Have you tried not to use the BT firmware but the alternative one?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: dave on November 21, 2020, 02:14:38 pm
Nope.  Have you?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: FrenchFries on November 21, 2020, 05:49:54 pm
Any idea why speed is higher with a local iperf3 server on a different subnet?
I tested with IPv6 (no NAT) to make sure no NAT was used; the same difference applies:

Client : Linux laptop
Server : bouygues.iperf.fr
Firewall : APU1c OPNsense 20.7 latest.
WAN connected to Gig fiber.
Same results for IPv4 and IPv6, so NAT is not the issue.

Quote
iperf3 -6 -R -p 5206 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 5206
Reverse mode, remote host bouygues.iperf.fr is sending
[  5] local 2a01:e0a:2ed:6231:b11b:ac7c:1c41:b3f7 port 53940 connected to 2001:860:deff:1000::2 port 5206
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  54.9 MBytes   461 Mbits/sec                 
[  5]   1.00-2.00   sec  59.2 MBytes   497 Mbits/sec                 
[  5]   2.00-3.00   sec  55.5 MBytes   466 Mbits/sec                 
[  5]   3.00-4.00   sec  53.1 MBytes   446 Mbits/sec                 
[  5]   4.00-5.00   sec  52.6 MBytes   442 Mbits/sec                 
[  5]   5.00-6.00   sec  55.4 MBytes   465 Mbits/sec                 
[  5]   6.00-7.00   sec  53.0 MBytes   445 Mbits/sec                 
[  5]   7.00-8.00   sec  51.9 MBytes   435 Mbits/sec                 
[  5]   8.00-9.00   sec  49.0 MBytes   411 Mbits/sec                 
[  5]   9.00-10.00  sec  58.2 MBytes   488 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   549 MBytes   460 Mbits/sec  40855             sender
[  5]   0.00-10.00  sec   543 MBytes   455 Mbits/sec                  receiver

Client : Linux laptop
Server : Another APU1c running Debian Linux on a separate isolated VLAN (firewall is routing).
Firewall : APU1c OPNsense 20.7 latest routing between VLANs.

Code: [Select]
iperf3 -R -c 10.90.70.250
Connecting to host 10.90.70.250, port 5201
Reverse mode, remote host 10.90.70.250 is sending
[  5] local 10.90.20.1 port 56430 connected to 10.90.70.250 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  87.1 MBytes   731 Mbits/sec                 
[  5]   1.00-2.00   sec  87.9 MBytes   737 Mbits/sec                 
[  5]   2.00-3.00   sec  82.2 MBytes   689 Mbits/sec                 
[  5]   3.00-4.00   sec  83.5 MBytes   701 Mbits/sec                 
[  5]   4.00-5.00   sec  88.2 MBytes   740 Mbits/sec                 
[  5]   5.00-6.00   sec  87.2 MBytes   731 Mbits/sec                 
[  5]   6.00-7.00   sec  87.7 MBytes   736 Mbits/sec                 
[  5]   7.00-8.00   sec  82.8 MBytes   695 Mbits/sec                 
[  5]   8.00-9.00   sec  88.4 MBytes   741 Mbits/sec                 
[  5]   9.00-10.00  sec  90.4 MBytes   758 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   869 MBytes   728 Mbits/sec  577             sender
[  5]   0.00-10.00  sec   865 MBytes   726 Mbits/sec                  receiver

It is not clear to me why there is such a difference.
Why is routing between VLANs with the firewall so much faster than routing over IPv6 and gigabit fiber?
To confirm: my VLANs are not communicating directly on the switch.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: FrenchFries on November 21, 2020, 06:14:22 pm
When testing on the same VLAN (so OPNsense does nothing):
Code: [Select]
iperf3 -R -c 10.90.70.250
Connecting to host 10.90.70.250, port 5201
Reverse mode, remote host 10.90.70.250 is sending
[  5] local 10.90.70.110 port 42160 connected to 10.90.70.250 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  93.5 MBytes   784 Mbits/sec                 
[  5]   1.00-2.00   sec  93.6 MBytes   785 Mbits/sec                 
[  5]   2.00-3.00   sec  93.6 MBytes   786 Mbits/sec                 
[  5]   3.00-4.00   sec  94.2 MBytes   790 Mbits/sec                 
[  5]   4.00-5.00   sec  95.8 MBytes   803 Mbits/sec                 
[  5]   5.00-6.00   sec  95.1 MBytes   798 Mbits/sec                 
[  5]   6.00-7.00   sec  95.8 MBytes   803 Mbits/sec                 
[  5]   7.00-8.00   sec  96.1 MBytes   806 Mbits/sec                 
[  5]   8.00-9.00   sec  95.9 MBytes   805 Mbits/sec                 
[  5]   9.00-10.00  sec  96.1 MBytes   806 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   950 MBytes   797 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   950 MBytes   797 Mbits/sec                  receiver

This is close to the speed of inter-VLAN routing with OPNsense.
So OPNsense is very efficient in inter-VLAN routing.

And just to confirm, speed with a direct link is close to 1 Gb/s:
Code: [Select]
iperf3 -R -p 5206 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 5206
Reverse mode, remote host bouygues.iperf.fr is sending
[  5] local 192.168.1.158 port 58658 connected to 89.84.1.222 port 5206
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   111 MBytes   930 Mbits/sec                 
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec                 
[  5]   2.00-3.00   sec   112 MBytes   942 Mbits/sec                 
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec                 
[  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec                 
[  5]   5.00-6.00   sec   112 MBytes   941 Mbits/sec                 
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec                 
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec                 
[  5]   8.00-9.00   sec   112 MBytes   942 Mbits/sec                 
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   946 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: FrenchFries on November 21, 2020, 06:26:01 pm
OK, I get it. When connecting to the Internet, data goes through the WAN on a different network interface. However, it is not clear why it is SO much slower than inter-VLAN routing with the OPNsense firewall.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: realex on December 15, 2020, 06:21:00 pm
Good evening!

I have an APU2 with 4 NICs and 1000 MBit/s from my ISP (cable modem - bridge mode - OPNsense/APU2). I get ~650 MBit/s, which leaves a gap of ~350 MBit/s that I pay for but don't get...

My thoughts on getting APU2 + OPNsense to 1000 MBit/s:


Would this solve the problem, or do I have an error in my reasoning?

Maybe someone has tested in this direction?



Alex
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: spi39492 on February 07, 2021, 12:04:21 pm
I am struggling with the same issue. My setup is OPNsense 21.1 virtualized as a Xen HVM. I've got virtual switches for different VLANs. OPNsense just uses plain interfaces; the only VLAN is for the PPPoE Internet uplink.

iperf gives me something like 19 Gbps from domU (= virtual machine) to domU in the same VLAN. Through OPNsense this goes down to ~700 Mbps. A LAN device on a physical switch to a domU is somewhere around ~300 Mbps. Internet with one single session is somehow limited to around ~250 Mbps.

I checked some of the suggestions from https://bsdrp.net/documentation/technical_docs/performance, namely:
net.inet6.ip6.redirect = 0
net.inet.ip.redirect was already 0
hw.igb.rx_process_limit = -1
hw.igb.tx_process_limit = -1
kern.random.harvest.mask = 351

These seem to tune the performance a bit - but I still need to investigate further.
Internet goes up to ~400 Mbps.
LAN - domU ~510 Mbps
domU - domU stays around ~700 Mbps.

Found some other BSD-related optimization stuff; need to look into it.
https://calomel.org/network_performance.html
https://calomel.org/freebsd_network_tuning.html
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: spi39492 on February 07, 2021, 12:32:43 pm
Found some other BSD-related optimization stuff; need to look into it.
https://calomel.org/network_performance.html
https://calomel.org/freebsd_network_tuning.html

Playing around with
Code: [Select]
net.inet.ip.ifq.maxlen
kern.ipc.maxsockbuf
net.inet.tcp.recvbuf_inc
net.inet.tcp.recvbuf_max
net.inet.tcp.recvspace
net.inet.tcp.sendbuf_inc
net.inet.tcp.sendbuf_max
net.inet.tcp.sendspace
net.inet.tcp.tso

doesn't seem to change anything. I'm not sure whether all settings get applied correctly; sysctl -a doesn't show them all.
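One way to check whether a given setting actually took effect (the OID name is one of those listed above; this is a diagnostic sketch, not a tuning recommendation):

```shell
# Read the value back after setting it:
sysctl net.inet.tcp.sendspace
# -d prints the OID's description; an "unknown oid" error means the name
# doesn't exist on this kernel (e.g. renamed, or loader-only):
sysctl -d net.inet.tcp.sendspace
# Loader-only tunables must go into /boot/loader.conf.local plus a reboot;
# setting them with sysctl at runtime fails or silently has no effect.
```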
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: spi39492 on February 07, 2021, 02:13:28 pm
A quick check with OpenWrt 19.07 as a domU on the same server gives LAN - domU 890 Mbps, so almost Gbit wire speed.

With a slight modification - e1000 NICs as the virtualized network adapters - I get
- LAN - domU wire speed of 940 Mbps
- domU - domU 7700 Mbps (7.7 Gbps)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: thowe on February 07, 2021, 04:11:43 pm
I think you are mixing two things in this thread:

This thread is about the optimization of APU-based hardware devices, which can only do 1GBit/s when specifically optimized on FreeBSD.

The other issue is, at best, a performance problem of 21.1 on Xen-based virtualization. There are already more participants here in the forum with this observation.

I would rather not discuss the Xen issue in this APU thread; in a dedicated thread you are more likely to meet users who are also affected.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on February 07, 2021, 04:19:26 pm
I think you are mixing two things in this thread:

This thread is about the optimization of APU-based hardware devices, which can only do 1GBit/s when specifically optimized on FreeBSD.

The other issue could be performance problems of 21.1 on XEN based virtualization at best. There are already more participants here in the forum with this observation.

I would rather not discuss the XEN issue in this APU thread, as you are more likely to meet users who are also concerned.

Thanks, I was about to ask the same thing (dont mix 2 different things in 1 thread)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: spi39492 on February 08, 2021, 02:54:37 pm
I think you are mixing two things in this thread:

This thread is about the optimization of APU-based hardware devices, which can only do 1GBit/s when specifically optimized on FreeBSD.

The other issue could be performance problems of 21.1 on XEN based virtualization at best. There are already more participants here in the forum with this observation.

I would rather not discuss the XEN issue in this APU thread, as you are more likely to meet users who are also concerned.
Understood that this is specifically about APU-based boards. I also observe performance issues and couldn't find anything related for Xen. That's why I am interested in your observations - I'd give the performance-tuning tips a try.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: thowe on February 08, 2021, 02:58:22 pm
Start with e.g. these (from this thread):

net.inet6.ip6.redirect = 0
net.inet.ip.redirect = 0
hw.igb.rx_process_limit = -1 (these are hardware dependent and will probably not match your NIC in the VM)
hw.igb.tx_process_limit = -1 (these are hardware dependent and will probably not match your NIC in the VM)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: spi39492 on February 08, 2021, 04:40:35 pm
Start with e.g. these (from this thread):

net.inet6.ip6.redirect = 0
net.inet.ip.redirect = 0
hw.igb.rx_process_limit = -1 (these are hardware dependent and will probably not match your NIC in the VM)
hw.igb.tx_process_limit = -1 (these are hardware dependent and will probably not match your NIC in the VM)

Thx - have these. They helped me increase the speed (as mentioned in one of my posts), but I'm still far from Gbit.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mater on February 09, 2021, 06:41:06 am
Switch to an Odroid H2+; it achieves Gigabit with no issue.
I'm going to sell my APU board.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pmhausen on February 09, 2021, 07:39:46 am
Switch to a Odroid H2+, it achieves Gigabit with no issue.
Realtek LAN? Sorry, I'll pass. Yes, I see the problem with the apu, but Odroid is not the solution, IMHO.
Protectli looks good ... a bit difficult to find in Europe, though.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on February 09, 2021, 09:38:45 am
Switch to a Odroid H2+, it achieves Gigabit with no issue.
Realtek LAN? Sorry, I'll pass. Yes, I see the problem with the apu, but Odroid is not the solution, IMHO.
Protectli looks good ... a bit difficult to find in Europe, though.

I am no fan of Realtek either (the company's bad reputation persists in 2021 but really dates from the '90s and mid-'00s), but a shiny Intel logo on a NIC doesn't guarantee superb performance. Just look at the i225-V fiasco: the B1 stepping was a broken design, the promised B2 was still faulty, and only the newer B3 really fixed it. Its 2.5 Gbit mode was literally broken, and the workaround was to fall back to 1 Gbit max. Meanwhile the black-sheep Realtek RTL8125BG just works.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rmayr on April 13, 2021, 10:34:05 am
I have been struggling with performance on the APU4. While in initial testing I was able to get around 700 MBit/s with 2 iperf3 streams, with my fully configured firewall rule set (but minimal rules on the actual path I am testing) I am now down to around 250 MBit/s and can't get it higher.

Settings from this thread, from https://www.reddit.com/r/homelab/comments/fciqid/hardware_for_my_home_pfsense_router/fjfl8ic/, and from https://teklager.se/en/knowledge-base/opnsense-performance-optimization/ have all been applied, and I am not sure when the performance drop occurred.

What is the best way to debug what's going on here? This is quite frustrating, as I know the hardware to be capable of full GBit/s routing.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: thowe on April 13, 2021, 11:44:01 am
Just to be sure:
- You are testing with iperf THROUGH the firewall, i.e. iperf is not running on the firewall itself but on separate hosts "on both sides" of it?
- You have only pf rules set and no other services like IDS/IPS, Sensei, etc.?
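The checklist above boils down to a two-host setup; a sketch (host addresses are placeholders):

```shell
# On a host behind the LAN interface (iperf3 server):
iperf3 -s
# On a host on the WAN side (single TCP stream, 30 s, crossing the firewall):
iperf3 -c <lan-host-ip> -t 30 -P 1
```

Neither command runs on the firewall itself; the firewall only forwards the traffic being measured.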
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: thowe on April 13, 2021, 11:45:49 am
Something you can check: configure powerd to use "Maximum" instead of "Hiadaptive".

As discussed here: https://forum.opnsense.org/index.php?topic=21194.msg99228#msg99228
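At the OS level, the GUI setting corresponds roughly to powerd's mode flags; a sketch of the plain-FreeBSD equivalent (on OPNsense itself, use System > Settings > Miscellaneous rather than editing rc.conf):

```shell
# /etc/rc.conf sketch - run powerd with "maximum" instead of "hiadaptive"
# for the AC (-a), battery (-b), and unknown (-n) power states:
powerd_enable="YES"
powerd_flags="-a maximum -b maximum -n maximum"
```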
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pmhausen on April 13, 2021, 12:24:48 pm
If you use "Maximum", anyway, just disable powerd.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on April 13, 2021, 03:25:35 pm
Are there any penalties from not using powerd (i.e. from not enabling the powerd service)? I assume some hidden bugs may surface if a subsystem strongly assumes powerd is always present. At the least, I would be careful about claiming too quickly that powerd is unnecessary and causes no issues.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pmhausen on April 13, 2021, 03:43:20 pm
Code: [Select]
NAME
     powerd – system power control utility

SYNOPSIS
     powerd [-a mode] [-b mode] [-i percent] [-m freq] [-M freq] [-N]
            [-n mode] [-p ival] [-P pidfile] [-r percent] [-s source] [-v]

DESCRIPTION
     The powerd utility monitors the system state and sets various power
     control options accordingly.  It offers power-saving modes that can be
     individually selected for operation on AC power or batteries.
[...]

Powerd's only function in FreeBSD is to set the CPU to power saving modes when idle. There is nothing else that depends on powerd. You do not need to run it.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on April 13, 2021, 04:06:49 pm
Somebody complained that their APU2 CPU got stuck at 600 MHz and had to actually enable powerd to force the CPU clock speed above 600 MHz.
As I said, the behavior of a system may not be easy to predict: even if the theory says one thing, reality may be different.
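Whether a box is affected is easy to check from the console; a diagnostic sketch (FreeBSD-specific sysctls, output values vary per board):

```shell
# Current CPU frequency in MHz - if this sits at 600 under load, you are hit:
sysctl dev.cpu.0.freq
# Frequency/power levels the cpufreq driver can select:
sysctl dev.cpu.0.freq_levels
```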
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: dave on April 13, 2021, 05:01:08 pm
Are there any penalties from not using powerd (i.e. from not enabling the powerd service)? I assume some hidden bugs may surface if a subsystem strongly assumes powerd is always present. At the least, I would be careful about claiming too quickly that powerd is unnecessary and causes no issues.

Disabling powerd and enabling core performance boost in the BIOS will lock the cores at 1.4 GHz.
The APUs are only ~10 W devices, so you don't need to worry about power savings or heat.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rmayr on April 15, 2021, 12:32:08 am
Update: I found the culprit for the drop of more than a third in throughput: just enabling IPsec (with configured tunnels up and running) drops locally routed performance from 750-800 Mbps to 500 Mbps, even for traffic that doesn't go through the tunnel. This is with policy-based IPsec, not a virtual tunnel interface.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rmayr on April 15, 2021, 12:38:57 am
And just to confirm: yes, there are two hosts on different sides of the box, one iperf3 server, one client.

coreboot has been updated to the latest available version. PowerD is running and normally set to Hiadaptive as I actually want to save some power for most of the time when there is little traffic. A quick comparison doesn't seem to show a measurable difference between Hiadaptive and Maximum, though performance drops when I disable PowerD altogether (probably confirming the suspicion that the CPU is stuck at 600MHz without it running).
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rmayr on April 15, 2021, 12:50:04 am
Further datapoints: having flowd_aggregate running (with all local VLAN interfaces monitored) costs around 50 Mbps of throughput when samplicate is stopped, and about 250 Mbps when both are running. But this part is, if not good, then at least explainable, as it certainly adds CPU load. The IPsec-related throughput drop for streams not hitting IPsec tunnels (which stacks with the netflow drop, i.e. with both enabled I only average around 250 Mbps total throughput) is what puzzles me.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on April 15, 2021, 03:29:45 pm
check this forum for the "APU2 stuck at 600Mhz" issue:

https://github.com/pcengines/coreboot/issues/457

That enabling policy-based IPsec immediately halves throughput, even for traffic bypassing the VPN tunnel, is very concerning. I also have some policy-based VPN tunnels, so this may further limit my WAN speed even when traffic is not routed into the tunnel. A big mess, I have to say, and years can pass without resolution :(
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pmhausen on April 15, 2021, 04:59:40 pm
Somebody complained that their APU2 CPU got stuck at 600Mhz, and had to actually enable the powerd to force CPU clockspeed go over that 600Mhz.
As I said, the behavior of a system may not be easy to predict, even if the theory says something, reality may be something different.
I was involved in that discussion and IMHO we came to the conclusion that you needed to disable powerd to get beyond 600 MHz. That's precisely why I recommend that.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: thowe on April 15, 2021, 05:42:35 pm
I have opened a ticket with the coreboot maintainers for the PC Engines boards. So far, however, nothing further has come of it.

I was able to solve the problem by keeping PowerD enabled but setting the mode to maximum. Since then I didn't have the problem anymore.

And when the CPU was limited to 600 MHz, it cost a lot of performance. In my setup I just got away with it, but had zero headroom. When the CPU runs normally, utilization is rarely more than 50%.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rmayr on April 15, 2021, 05:48:44 pm
That enabling policy-based IPsec immediately halves throughput, even for traffic bypassing the VPN tunnel, is very concerning. I also have some policy-based VPN tunnels, so this may further limit my WAN speed even when traffic is not routed into the tunnel. A big mess, I have to say, and years can pass without resolution :(

Indeed. This happens not only for LAN->WAN traffic, but also for traffic between two different internal segments (e.g. LAN and DMZ) with no NAT involved and only directly connected routes in use. I have not yet tried VTI instead of policy-based IPsec, but this issue may make OPNsense a non-starter for the intended production use at our university institute (which is the reason I am now spending far too much time putting OPNsense through such tests).
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on April 15, 2021, 08:51:22 pm
That enabling policy-based IPsec immediately halves throughput, even for traffic bypassing the VPN tunnel, is very concerning. I also have some policy-based VPN tunnels, so this may further limit my WAN speed even when traffic is not routed into the tunnel. A big mess, I have to say, and years can pass without resolution :(

Indeed. This happens not only for LAN->WAN traffic, but also for traffic between two different internal (e.g. LAN and DMZ) segments with no NAT involved and only directly connected routes in use. I have not yet tried with VTI instead of policy based IPsec, but this issue may make OpnSense a non-starter for the intended production use at our university institute (that is the reason why I am now spending far too much time putting OpnSense through such tests).

You really want to run a university institute in production on an APU device??  :o
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rmayr on April 15, 2021, 09:32:33 pm
Indeed. This happens not only for LAN->WAN traffic, but also for traffic between two different internal (e.g. LAN and DMZ) segments with no NAT involved and only directly connected routes in use. I have not yet tried with VTI instead of policy based IPsec, but this issue may make OpnSense a non-starter for the intended production use at our university institute (that is the reason why I am now spending far too much time putting OpnSense through such tests).

You really want to run a university institute in production with a APU device??  :o

No, not on an APU - this is my test device for finding some of the issues in parallel to a VM installation (which seems to have the same performance issues, actually). We would only put it into production on faster hardware, but we don't expect such bottlenecks to necessarily change. We are aiming for at least 2-3, better 5 Gbps throughput between some of the segments, definitely need IPsec and flow analysis, and would like (but don't necessarily require) IDS/IPS. Given our current experience, I am not sure how likely that is.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pmhausen on April 15, 2021, 10:30:01 pm
Get a Deciso DEC38xx and you will definitely be able to match that requirement.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rmayr on April 16, 2021, 03:57:13 pm
Further tests on an 8-core Proxmox VM server with 4 cores assigned to an OPNsense test instance show a 1.6 Gbps throughput limit with the CPU not fully loaded (only 2 of the 4 cores in the VM being used). Putting traffic-flow analysis and Suricata into the mix, I am not sure how hardware like that sold by Deciso would reach 5 Gbps with the current OPNsense version. What is the big difference we are missing?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on April 16, 2021, 06:49:51 pm
The best solution is to get a written(!) assurance from Deciso of what traffic their hardware can handle. That way you can demand the promised performance for your money if it turns out their hardware underperforms. Otherwise any vendor on the planet can claim literally anything; you can't depend on generic marketing PDFs.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rmayr on April 16, 2021, 09:34:14 pm
The concerning bit is the heavy side effect of having IPsec enabled for completely unrelated traffic. It points to a general performance bottleneck in the kernel.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: rmayr on April 18, 2021, 08:38:24 pm
Now with more detailed measurements: https://www.mayrhofer.eu.org/post/firewall-throughput-opnsense-openwrt/