PC Engines APU2 1Gbit traffic not achievable

Started by Ricardo, July 27, 2018, 12:24:54 PM

Dear Opnsense team,

I am facing a significant performance issue using OPNsense 18.1.x

Hardware: PC Engines APU2C4, 3x i210AT NIC / AMD GX-412TC CPU / 4 GB DRAM

Issue:
This HW cannot handle 1 Gigabit wire speed with single-flow network traffic when running OPNsense. The maximum I could get is approx. 450 Mbit (WAN --> LAN direction). There are no custom firewall rules / IDS / IPS / etc., just the factory default state after a clean install (I used the serial installer of 18.1.6rev2, then upgraded to 18.1.13, if that counts).

However:
the exact same HW can easily do 850-900+ Mbit/sec of single-flow traffic when running a Linux firewall distribution (I used the latest IPFire 2.19 - Core Update 120), and it shows much less load during the same traffic than I observe under OPNsense.

iperf3 single-flow throughput was measured over plain IP with NAT. No IMIX stress test, before you ask; on the contrary, the largest possible MTU (1500) and MSS (1460) were used.

My real concern is the performance drop once I enable PPPoE (my ISP connects through PPPoE): Google turned up many "single-thread PPPoE speed penalty" topics, which is what started my whole descent into this subject. But since I already get poor routing performance with an ideal, plain-IP setup, I expect PPPoE to be even worse (by definition, it can only be worse).

Asking on the FreeBSD net mailing list about possible solutions/workarounds quickly revealed that OPNsense is not running FreeBSD but a fork of it (HardenedBSD), so FreeBSD support for OPNsense is practically non-existent, or at least everybody keeps pointing fingers at the other side. I have seen several times on this forum that people are referred to the FreeBSD forums when they hit a bug that is considered a bug of the underlying OS rather than an OPNsense bug. Having read that the relationship between the FreeBSD and HardenedBSD teams is far from friendly, I wonder what kind of help one can expect when an issue with the OS is found.

The thread started here:
Bug 203856 - [igb] PPPoE RX traffic is limited to one queue -->
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856

then continued here:
https://lists.freebsd.org/pipermail/freebsd-net/2018-July/051197.html

And that is the point, where I am stuck.

In short:
- I tried all the valid settings / tuning seen here:
https://bsdrp.net/documentation/technical_docs/performance#nic_drivers_tuning --> specifics for APU2+igb
- tried "net.isr.maxthreads" and "net.isr.numthreads" greater than 1 and switch net.isr.dispatch to "deferred" --> no measurable improvement in performance, but the load nearly doubled on the APU2

I have collected various performance data during traffic, if that helps to troubleshoot where the bottleneck is in this opnsense system.

-------------------------------------------------------------------------------------------------------------------------------

Opnsense 18.1.13
OS: FreeBSD 11.1-RELEASE-p11 FreeBSD 11.1-RELEASE-p11  116e406d37f(stable/18.1)  amd64

kldstat:
Id Refs Address            Size     Name
1   91 0xffffffff80200000 213bb20  kernel
2    1 0xffffffff8233d000 6e18     if_gre.ko
3    1 0xffffffff82344000 7570     if_tap.ko
4    3 0xffffffff8234c000 54e78    pf.ko
5    1 0xffffffff823a1000 e480     carp.ko
6    1 0xffffffff823b0000 e3e0     if_bridge.ko
7    2 0xffffffff823bf000 6fd0     bridgestp.ko
8    1 0xffffffff823c6000 126a8    if_lagg.ko
9    1 0xffffffff823d9000 1610     ng_UI.ko
10   31 0xffffffff823db000 173e0    netgraph.ko
11    1 0xffffffff823f3000 3620     ng_async.ko
12    1 0xffffffff823f7000 4fb8     ng_bpf.ko
13    1 0xffffffff823fc000 4e98     ng_bridge.ko
14    1 0xffffffff82401000 31e0     ng_cisco.ko
15    1 0xffffffff82405000 f20      ng_echo.ko
16    1 0xffffffff82406000 38b8     ng_eiface.ko
17    1 0xffffffff8240a000 4870     ng_ether.ko
18    1 0xffffffff8240f000 1db0     ng_frame_relay.ko
19    1 0xffffffff82411000 17e8     ng_hole.ko
20    1 0xffffffff82413000 4250     ng_iface.ko
21    1 0xffffffff82418000 6250     ng_ksocket.ko
22    1 0xffffffff8241f000 7d88     ng_l2tp.ko
23    1 0xffffffff82427000 3fe0     ng_lmi.ko
24    1 0xffffffff8242b000 65c8     ng_mppc.ko
25    2 0xffffffff82432000 b48      rc4.ko
26    1 0xffffffff82433000 2ad8     ng_one2many.ko
27    1 0xffffffff82436000 a3e0     ng_ppp.ko
28    1 0xffffffff82441000 8f08     ng_pppoe.ko
29    1 0xffffffff8244a000 5f68     ng_pptpgre.ko
30    1 0xffffffff82450000 2570     ng_rfc1490.ko
31    1 0xffffffff82453000 6288     ng_socket.ko
32    1 0xffffffff8245a000 21a0     ng_tee.ko
33    1 0xffffffff8245d000 2ec0     ng_tty.ko
34    1 0xffffffff82460000 45b8     ng_vjc.ko
35    1 0xffffffff82465000 2f20     ng_vlan.ko
36    1 0xffffffff82468000 31f0     if_enc.ko
37    1 0xffffffff8246c000 28b0     pflog.ko
38    1 0xffffffff8246f000 d578     pfsync.ko
39    1 0xffffffff8247d000 3370     ng_car.ko
40    1 0xffffffff82481000 36a8     ng_deflate.ko
41    1 0xffffffff82485000 4ef8     ng_pipe.ko
42    1 0xffffffff8248a000 3658     ng_pred1.ko
43    1 0xffffffff8248e000 2058     ng_tcpmss.ko
44    1 0xffffffff82621000 7130     aesni.ko
45    1 0xffffffff82629000 1055     amdtemp.ko


The 2 PCs I use to generate traffic are Win7 boxes:
PC-A connects directly to igb0 (WAN endpoint), IP addr. 192.168.1.2
PC-B connects directly to igb1 (LAN endpoint), IP addr. 10.0.0.100

I run:

(on PC-A) iperf3 -s
(on PC-B) iperf3 -c 192.168.1.2 -t 300 -P 1 -R   (-R reverses the direction, i.e. traffic flows FROM Wan TO Lan after PC-B makes the initial connection TO PC-A)
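
The other results quoted further down in the thread come from the same command line with small variations, roughly:

(on PC-B) iperf3 -c 192.168.1.2 -t 300 -P 1       (same test without -R, i.e. traffic FROM Lan TO Wan)
(on PC-B) iperf3 -c 192.168.1.2 -t 60 -P 2 -R     (the 60-second 2-flow runs described later)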
---------------------------------------------------------------------------------------------------------------------------------------------

loader.conf:

##############################################################
# This file was auto-generated using the rc.loader facility. #
# In order to deploy a custom change to this installation,   #
# please use /boot/loader.conf.local as it is not rewritten. #
##############################################################

loader_brand="opnsense"
loader_logo="hourglass"
loader_menu_title=""

autoboot_delay="3"
hw.usb.no_pf="1"
# see https://forum.opnsense.org/index.php?topic=6366.0
hint.ahci.0.msi="0"
hint.ahci.1.msi="0"

# Vital modules that are not in FreeBSD's GENERIC
# configuration will be loaded on boot, which makes
# races with individual module's settings impossible.
carp_load="YES"
if_bridge_load="YES"
if_enc_load="YES"
if_gif_load="YES"
if_gre_load="YES"
if_lagg_load="YES"
if_tap_load="YES"
if_tun_load="YES"
if_vlan_load="YES"
pf_load="YES"
pflog_load="YES"
pfsync_load="YES"

# The netgraph(4) framework is loaded here
# for backwards compat for when the kernel
# had these compiled in, not as modules. This
# list needs further pruning and probing.
netgraph_load="YES"
ng_UI_load="YES"
ng_async_load="YES"
ng_bpf_load="YES"
ng_bridge_load="YES"
ng_car_load="YES"
ng_cisco_load="YES"
ng_deflate_load="YES"
ng_echo_load="YES"
ng_eiface_load="YES"
ng_ether_load="YES"
ng_frame_relay_load="YES"
ng_hole_load="YES"
ng_iface_load="YES"
ng_ksocket_load="YES"
ng_l2tp_load="YES"
ng_lmi_load="YES"
ng_mppc_load="YES"
ng_one2many_load="YES"
ng_pipe_load="YES"
ng_ppp_load="YES"
ng_pppoe_load="YES"
ng_pptpgre_load="YES"
ng_pred1_load="YES"
ng_rfc1490_load="YES"
ng_socket_load="YES"
ng_tcpmss_load="YES"
ng_tee_load="YES"
ng_tty_load="YES"
ng_vjc_load="YES"
ng_vlan_load="YES"

# dynamically generated tunables settings follow
net.enc.in.ipsec_bpf_mask="2"
net.enc.in.ipsec_filter_mask="2"
net.enc.out.ipsec_bpf_mask="1"
net.enc.out.ipsec_filter_mask="1"
debug.pfftpproxy="0"
vfs.read_max="32"
net.inet.ip.portrange.first="1024"
net.inet.tcp.blackhole="2"
net.inet.udp.blackhole="1"
net.inet.ip.random_id="1"
net.inet.ip.sourceroute="0"
net.inet.ip.accept_sourceroute="0"
net.inet.icmp.drop_redirect="0"
net.inet.icmp.log_redirect="0"
net.inet.tcp.drop_synfin="1"
net.inet.ip.redirect="1"
net.inet6.ip6.redirect="1"
net.inet6.ip6.use_tempaddr="0"
net.inet6.ip6.prefer_tempaddr="0"
net.inet.tcp.syncookies="1"
net.inet.tcp.recvspace="65228"
net.inet.tcp.sendspace="65228"
net.inet.tcp.delayed_ack="0"
net.inet.udp.maxdgram="57344"
net.link.bridge.pfil_onlyip="0"
net.link.bridge.pfil_local_phys="0"
net.link.bridge.pfil_member="1"
net.link.bridge.pfil_bridge="0"
net.link.tap.user_open="1"
kern.randompid="347"
net.inet.ip.intr_queue_maxlen="1000"
hw.syscons.kbd_reboot="0"
net.inet.tcp.log_debug="0"
net.inet.icmp.icmplim="0"
net.inet.tcp.tso="1"
net.inet.udp.checksum="1"
kern.ipc.maxsockbuf="4262144"
vm.pmap.pti="1"
hw.ibrs_disable="0"

# dynamically generated console settings follow
comconsole_speed="115200"
#boot_multicons
boot_serial="YES"
#kern.vty
console="comconsole"

---------------------------------------------
loader.conf.local

# I have commented everything out (and rebooted to apply) to start performance tuning from scratch

#kern.random.harvest.mask=351
#hw.igb.rx_process_limit=-1
#net.link.ifqmaxlen=2048
#net.isr.numthreads=4
#net.isr.maxthreads=4
#net.isr.dispatch=deferred
#net.isr.bindthreads=1
------------------------------------------------

sysctl.conf is practically empty

------------------------------------------------

ifconfig:

Note: igb0 is "WAN", igb1 is "LAN"
Note2: no PPPoE so far!

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
        ether 00:0d:b9:4b:0b:5c
        hwaddr 00:0d:b9:4b:0b:5c
        inet6 fe80::20d:b9ff:fe4b:b5c%igb0 prefixlen 64 scopeid 0x1
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
        ether 00:0d:b9:4b:0b:5d
        hwaddr 00:0d:b9:4b:0b:5d
        inet6 fe80::20d:b9ff:fe4b:b5d%igb1 prefixlen 64 scopeid 0x2
        inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:0d:b9:4b:0b:5e
        hwaddr 00:0d:b9:4b:0b:5e
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
enc0: flags=0<> metric 0 mtu 1536
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: enc
pflog0: flags=100<PROMISC> metric 0 mtu 33160
        groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
        groups: pfsync
        syncpeer: 0.0.0.0 maxupd: 128 defer: off

--------------------------------------------------------------

top -SHPI

last pid: 90572;  load averages:  2.13,  1.48,  1.01    up 0+15:54:28  08:58:36
136 processes: 8 running, 99 sleeping, 29 waiting
CPU 0:  0.0% user,  0.0% nice, 99.1% system,  0.0% interrupt,  0.9% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system, 67.1% interrupt, 32.9% idle
CPU 2:  0.3% user,  0.0% nice,  0.8% system,  0.2% interrupt, 98.7% idle
CPU 3:  0.2% user,  0.0% nice,  1.9% system,  6.8% interrupt, 91.2% idle
Mem: 36M Active, 179M Inact, 610M Wired, 387M Buf, 3102M Free
Swap:

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    0 root       -92    -     0K   448K CPU0    0   1:32  99.37% kernel{igb0 qu
   11 root       155 ki31     0K    64K CPU2    2 904:01  98.85% idle{idle: cpu
   11 root       155 ki31     0K    64K RUN     3 909:09  93.95% idle{idle: cpu
   12 root       -92    -     0K   496K CPU1    1   1:54  50.64% intr{irq262: i
   11 root       155 ki31     0K    64K CPU1    1 906:22  39.25% idle{idle: cpu
   12 root       -92    -     0K   496K WAIT    1   0:26  10.09% intr{irq257: i
   12 root       -92    -     0K   496K WAIT    3   0:03   3.19% intr{irq264: i
   17 root       -16    -     0K    16K -       3   0:08   1.12% rand_harvestq
39298 unbound     20    0 72916K 31596K kqread  3   0:01   1.09% unbound{unboun
   12 root       -92    -     0K   496K WAIT    3   0:02   0.61% intr{irq259: i
   11 root       155 ki31     0K    64K RUN     0 912:29   0.52% idle{idle: cpu
   12 root       -72    -     0K   496K WAIT    2   0:02   0.35% intr{swi1: pfs
    0 root       -92    -     0K   448K -       2   0:00   0.24% kernel{igb1 qu
   12 root       -76    -     0K   496K WAIT    3   0:03   0.15% intr{swi0: uar




-----------------------------

systat -vm 3


    1 users    Load  2.58  1.69  1.11                  Jul 27 08:59
   Mem usage:  21%Phy  1%Kmem
Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  129892   36820 12632092    39224 3175880  count
All  133660   40504 13715660    67628          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt 35953 total
             32       52k    2 5198  32k  926             cow       4 uart0 4
                                                          zfod      1 ehci0 18
25.9%Sys  18.7%Intr  2.1%User  0.0%Nice 53.3%Idle         ozfod       ahci0 19
|    |    |    |    |    |    |    |    |    |           %ozfod  1123 cpu0:timer
=============+++++++++>                                   daefr  1126 cpu1:timer
                                        29 dtbuf          prcfr  1127 cpu3:timer
Namei     Name-cache   Dir-cache    145989 desvn          totfr    84 cpu2:timer
   Calls    hits   %    hits   %     36007 numvn          react     1 igb0:que 0
      19      19 100                 14872 frevn          pdwak 13759 igb0:que 1
                                                       15 pdpgs     1 igb0:que 2
Disks  ada0 pass0                                         intrn     3 igb0:que 3
KB/t   0.00  0.00                                  624712 wire        igb0:link
tps       0     0                                   36984 act       1 igb1:que 0
MB/s   0.00  0.00                                  183780 inact 13514 igb1:que 1
%busy     0     0                                         laund     3 igb1:que 2
                                                  3175880 free   5206 igb1:que 3



-----------------------------

systat -ifstat 3


                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   ||||||||||||||

      Interface           Traffic               Peak                Total
            lo0  in      0.089 KB/s          0.982 KB/s            3.729 MB
                 out     0.089 KB/s          0.982 KB/s            3.729 MB

           igb1  in      1.184 MB/s          1.194 MB/s          603.486 MB
                 out    56.019 MB/s         56.498 MB/s           27.880 GB

           igb0  in     55.994 MB/s         56.525 MB/s           27.880 GB
                 out     1.183 MB/s          1.194 MB/s          603.794 MB



--------------------------------------------

vmstat -i 5

irq4: uart0                           60         12
irq18: ehci0                           4          1
irq19: ahci0                           0          0
cpu0:timer                          4949        989
cpu1:timer                          5623       1124
cpu3:timer                          5623       1124
cpu2:timer                          3845        769
irq256: igb0:que 0                     5          1
irq257: igb0:que 1                 70255      14045
irq258: igb0:que 2                     8          2
irq259: igb0:que 3                    19          4
irq260: igb0:link                      0          0
irq261: igb1:que 0                    10          2
irq262: igb1:que 1                 68832      13761
irq263: igb1:que 2                     5          1
irq264: igb1:que 3                 25967       5191
irq265: igb1:link                      0          0
Total                             185205      37026

---------------------------------------------------------------------------------------

Thanks for your help in advance

Regards,
Richard

Just to further add to the topic:

It seems that when 2 parallel iperf streams are running, I sometimes get great results (approx. 800-850 Mbit/sec) and sometimes mediocre results (anything from 300 to 500 or 600 Mbit/sec).

From "top -CHIPS" its clearly visible, that in bad scenarios, only 1 core is getting 100% utilized (100% interrupt), while the other 3 cores are idle at 99%.  During middle-performance case, 2 cores are 100% interrupt, 2 cores are 100% idle. If the best-case scenario happens (800-850 Mbit), 3 cores are nearly all 100% while 1 core is idle 100%. So there must be something happening in the proper load balancing of NIC queues--> CPU cores.

Just by re-running the same iperf command line, I get all these different results. The values are quite stable within a single session, but after the session completes, re-running the exact same command between the exact same 2 endpoint PCs produces these large variations. The interrupt load in top clearly confirms this.
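
For completeness, the queue distribution can be checked on the box itself during a run with plain FreeBSD tools (nothing OPNsense-specific):

vmstat -i | grep igb      # interrupt counters per igb queue: a single busy queue ends up on a single core
netstat -Q                # netisr dispatch policy and per-CPU workstream statistics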

Can anyone reproduce the same test cases, or confirm similar results?
Of course that still does not help the weak single-core performance.

August 02, 2018, 12:06:23 AM #2 Last Edit: August 02, 2018, 12:11:24 AM by Rainmaker
I used to have an APU2C4, and realised from looking around the web that others had the same problem. For example, see this article here. They too seem to blame single-core routing, but you have found that at times the cores are more evenly used. I have read that later versions of FreeBSD got better at SMP/multi-core routing, but apparently it is not all the way there yet? Perhaps by using several iperf3 sessions you are tying each session to a core, and thus getting better (parallel) throughput that way?
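
One thing worth keeping in mind is that iperf3 handles all of its -P streams in a single thread, so for truly independent sessions you could start two separate server/client pairs on different ports (the port numbers below are just examples):

(on PC-A) iperf3 -s -p 5201        (first server, in one window)
(on PC-A) iperf3 -s -p 5202        (second server, in another window)
(on PC-B) iperf3 -c 192.168.1.2 -p 5201 -t 60 -R
(on PC-B) iperf3 -c 192.168.1.2 -p 5202 -t 60 -R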

Edit: You may also wish to try these settings/tweaks. I didn't see them before I sold my APU2 and got a G4560 based box instead, but they could help. Report back your findings please.

I'm thinking of switching from ipfire to OPNsense because I think it has a better overall feature set, but this is my major hangup. If people are able to get similar performance out of OPNsense, I'd love to hear about it.

Quote from: KantFreeze on August 02, 2018, 05:32:05 AM
I'm thinking of switching from ipfire to OPNsense because I think it has a better overall feature set, but this is my major hangup. If people are able to get similar performance out of OPNsense, I'd love to hear about it.

What's your hardware? The APU2 is a particular case, as it has a low single core speed (1GHz) and is an embedded low power SoC. For normal x86 hardware you'll be fine - I run 380Mbps down on a small form factor Pentium G4560 and it doesn't break a sweat. Gigabit is fine too.

I don't think it's practical to compare Linux and FreeBSD throughput and expect them to match. The latter will be lower.


Cheers,
Franco

My hardware is an APU2C4 :).

As to Linux vs FreeBSD performance, obviously they are different kernels and aren't going to do everything the same way. But in this particular case the benchmarks show FreeBSD getting roughly half the throughput of Linux.

Quote from: KantFreeze on August 02, 2018, 04:11:00 PM
My hardware is an APU2C4 :).

As to Linux vs FreeBSD performance, obviously they are different kernels and aren't going to do everything the same way. But in this particular case the benchmarks show FreeBSD getting roughly half the throughput of Linux.

Yes, of course, but think of it another way. The APU2 is 'only' 1 GHz per core. If OPNsense is only using a single core for routing, you've got 1 GHz of processing power to try to max out your connection. Linux, on the other hand, is multi-core aware, so you're using 4x 1 GHz to route your connection. No wonder the throughput is higher. As I said earlier, FreeBSD is now getting much better at spreading load across cores, though that doesn't yet apply to every part of the 'networking' process. FreeBSD has probably the best networking stack in the world, or certainly one of them; it can route 10 Gbps, 40 Gbps, even 100 Gbps on suitable hardware. Unfortunately, the APU2 isn't the most suitable hardware for high throughput on *BSD.

If you need >500Mbps stick to Linux and you won't have an issue. If you want <500Mbps then *sense will be fine on your APU.

Rainmaker,

I think I'm not communicating well. I'm not saying that FreeBSD has poor network performance. I'm saying that with the particular piece of hardware I happen to own, FreeBSD has roughly half the throughput of Linux and struggles to use the hardware efficiently. The FreeBSD development thread linked earlier suggests that it's not the SMP performance of pf that's the issue, but some oddness in the embedded Intel NIC.

But most of these benchmarks are almost two years old. I'm wondering whether by now the problem with this particular hardware might be fixed, and whether people might be able to get similar performance under FreeBSD with some tweaks.

Quote from: KantFreeze on August 02, 2018, 04:27:59 PM
Rainmaker,

I think I'm not communicating well. I'm not saying that FreeBSD has poor network performance. I'm saying that with the particular piece of hardware I happen to own, FreeBSD has roughly half the throughput of Linux and struggles to use the hardware efficiently. The FreeBSD development thread linked earlier suggests that it's not the SMP performance of pf that's the issue, but some oddness in the embedded Intel NIC.

But most of these benchmarks are almost two years old. I'm wondering whether by now the problem with this particular hardware might be fixed, and whether people might be able to get similar performance under FreeBSD with some tweaks.

Ah, you (respectfully) are a lot more knowledgeable than I catered for in my response. Apologies; it's difficult to pitch your responses on the Internet, especially when you and the other person don't know each other yet (as I'm sure you know).

Yes, FreeBSD's pf is indeed much more SMP capable. Last week I took both OpenBSD and FreeBSD installs and 'made' routers out of them, before comparing them side-by-side. Even on an 8700k at 5GHz per core OpenBSD was less performant than FreeBSD. However there are many other factors, as we both touched upon in previous posts.

NIC queues are one factor, as you state. I'm not sure if OPNsense utilises multiple queues (i.e. per core) or whether it just uses one. For your APU specifically, did you accept the Intel proprietary licence? I can't quite recall whether the APU2 uses igb drivers? I don't even know if that applies to OPNsense, but I know on pfSense I was advised to create the following for an APU2:

/boot/loader.conf.local
 
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1


Then reboot. This apparently 'unlocks' some extra functionality in the NIC, which may improve your throughput. If you're running off an SSD don't forget to enable TRIM.

Quote from: Rainmaker on August 02, 2018, 04:39:44 PM

NIC queues are one factor, as you state. I'm not sure if OPNsense utilises multiple queues (i.e. per core) or whether it just uses one. For your APU specifically, did you accept the Intel proprietary licence? I can't quite recall whether the APU2 uses igb drivers? I don't even know if that applies to OPNsense, but I know on pfSense I was advised to create the following for an APU2:

/boot/loader.conf.local
 
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1


Then reboot. This apparently 'unlocks' some extra functionality in the NIC, which may improve your throughput. If you're running off an SSD don't forget to enable TRIM.

Hi rainmaker,

The license ack has nothing to do with the igb driver (imho). It is related to the Intel PRO/Wireless adapters.
(https://www.freebsd.org/cgi/man.cgi?iwi)

regards pylox

Hi pylox,

Having read the relevant man pages it seems I was indeed grossly misinformed. I was told to add those lines when I first started using pfSense (and *BSD), as I was using various Intel Pro and I-series ethernet NICs. I don't run wifi on my gateway (I use Unifi APs) so I was obviously given duff information.

My apologies for repeating it here, and thanks for the lesson.

Quote from: Rainmaker on August 03, 2018, 03:11:59 PM
Hi pylox,

Having read the relevant man pages it seems I was indeed grossly misinformed. I was told to add those lines when I first started using pfSense (and *BSD), as I was using various Intel Pro and I-series ethernet NICs. I don't run wifi on my gateway (I use Unifi APs) so I was obviously given duff information.

My apologies for repeating it here, and thanks for the lesson.

Hi rainmaker,

no problem... Some time ago I made the same entry in loader.conf.local... ;D After some research I realized it's bullshit...
I think the OP's problem is something specifically PPPoE-related. Normally there should be no performance problems on an APU2.

regards pylox

August 06, 2018, 02:18:30 PM #13 Last Edit: August 06, 2018, 03:17:14 PM by ricsip
Hello pylox, all

just to be clear: I am testing over a plain IP+NAT connection (PPPoE was mentioned as a possible bottleneck, but is not tested YET), and even that simple test setup reaches only approx. 40-50% of the maximum possible throughput. If I add PPPoE, it will be even slower. That's the point of this thread: to find at least 1 credible person who is currently using an APU2 with OPNsense and can confirm their speed reaches at least 85-90% of gigabit. Even over PPPoE!
Then the next round will be to see what needs to be fine-tuned to get the same performance on my ISP connection.

All I can see is that single-flow iperf performance consistently maxes out at around 450 Mbit/sec (direction FROM wan TO lan). FROM lan TO wan seems slightly higher, approx. 600-650 Mbit/sec.

Multi-flow iperf: now it gets interesting. The result varies from run to run: e.g. I run a 2-flow iperf session that takes 60 seconds, it finishes, I immediately restart it with the same command, and I get a totally different result. After another 60 seconds I repeat it and get yet another completely different throughput.

With 2-flow iperf I can sometimes reach 850-900 Mbit, other times as low as 250 Mbit. Yep, quite a gigantic difference, even though all relevant test parameters are unchanged.

When I get 850-900 Mbit throughput, the 2 flows are evenly distributed (450Mbit+450Mbit flow = Total 900 Mbit), and CPU interrupt usage is around 270%-280% (explanation: total CPU processing power is 400% = 100% per CPU core times 4 cores).

When I get 600 Mbit, I usually see 1 flow at 580 Mbit and the other at 1-2, max 10 Mbit; interrupt load is approx. 170-180%. When I get 200-300 Mbit, I sometimes see 2x 150 Mbit, other times 1x 190 Mbit + 1x 2-3 Mbit, with only 100% interrupt usage (on a single core). And these vary from run to run.
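
Next I want to try pinning the iperf3 client source port (assuming the iperf3 builds on the Win7 boxes support --cport), so that the 5-tuple, and with it the igb RSS queue the flow hashes to, stays identical between runs. If the run-to-run variance disappears, the hash landing on different queues is the likely culprit (example port numbers):

(on PC-B) iperf3 -c 192.168.1.2 -t 60 -R --cport 50000     (fixed source port: same hash, same queue every run)
(on PC-B) iperf3 -c 192.168.1.2 -t 60 -R --cport 50001     (different source port: possibly a different queue)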

Quote from: ricsip on August 06, 2018, 02:18:30 PM
Hello pylox, all

just to be clear: I am testing over a plain IP+NAT connection (PPPoE was mentioned as a possible bottleneck, but is not tested YET), and even that simple test setup reaches only approx. 40-50% of the maximum possible throughput. If I add PPPoE, it will be even slower. That's the point of this thread: to find at least 1 credible person who is currently using an APU2 with OPNsense and can confirm their speed reaches at least 85-90% of gigabit. Even over PPPoE!
Then the next round will be to see what needs to be fine-tuned to get the same performance on my ISP connection.
......

Hi ricsip,

this is very hard to find. Unfortunately I do not have a test setup with an APU2 (and not much time).
But you can try a few different things:

1. Change these tunables and measure...
vm.pmap.pti="0"  #(disable the Meltdown mitigation - this is an AMD processor)
hw.ibrs_disable="1" #(disable the Spectre mitigation temporarily)

2. Try disabling igb flow control for each interface and measure
dev.igb.<x>.fc=0  #(x = number of the interface)

3. Change the network interface interrupt rate and measure
hw.igb.max_interrupt_rate="16000" #(start with 16000, can be increased up to 64000)

4. Disable Energy Efficient Ethernet (EEE) for each interface and measure
dev.igb.<x>.eee_disabled="1" #(x = number of interface)
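
Putting 1.-4. together, a minimal sketch (untested on an APU2, the values are only starting points, measure after each change):

/boot/loader.conf.local (takes effect after a reboot):

# testing only: disable the Meltdown/Spectre mitigations on the AMD GX-412TC
vm.pmap.pti="0"
hw.ibrs_disable="1"
# raise the igb interrupt rate limit
hw.igb.max_interrupt_rate="16000"

per interface, at runtime (igb0 shown, repeat for igb1):

sysctl dev.igb.0.fc=0              # disable flow control
sysctl dev.igb.0.eee_disabled=1    # disable Energy Efficient Ethernet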

Should be enough for a first round... ;-)

regards pylox