OPNsense Forum

English Forums => Hardware and Performance => Topic started by: Ricardo on July 27, 2018, 12:24:54 pm

Title: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on July 27, 2018, 12:24:54 pm
Dear Opnsense team,

I am facing a significant performance issue with OPNsense 18.1.x.

Hardware: PC Engines APU2C4, 3x i210AT NIC / AMD GX-412TC CPU / 4 GB DRAM

Issue:
This hardware cannot handle 1 Gbit wire speed with single-flow network traffic under OPNsense. The maximum I could get is approx. 450 Mbit/s (WAN --> LAN direction). There are no custom firewall rules / IDS / IPS / etc., just the factory-default state after a clean install (I used the serial installer of 18.1.6rev2, then upgraded to 18.1.13, if that counts).

However:
The exact same hardware can easily do 850-900+ Mbit/s of single-flow traffic with a Linux firewall distribution (I used the latest IPFire 2.19 - Core Update 120), and it shows much less load during this traffic than what I observe under OPNsense.

Throughput was measured with single-flow iperf3, over plain IP and with NAT. No IMIX stress test, before you ask; on the contrary, the largest possible MTU (1500) and MSS (1460) were used.

My real concern is the performance drop once I enable PPPoE (my ISP connects through PPPoE): Google turned up many "single-thread PPPoE speed penalty" topics, which is what started my whole descent into this subject. But since routing performance is already poor in an ideal, plain-IP setup, I expect PPPoE to be much worse (by definition, it can only be worse).

Asking on the FreeBSD net mailing list about possible solutions/workarounds quickly revealed that OPNsense does not run stock FreeBSD but a fork of it (HardenedBSD), so FreeBSD support for OPNsense is practically non-existent - or at least everybody keeps pointing fingers at the other side. I have seen several times on this forum that you refer people to the FreeBSD forums when someone hits a bug that is considered a bug of the underlying OS rather than an OPNsense bug. Having read that the relationship between the FreeBSD and HardenedBSD teams is far from friendly, I wonder what kind of help one can expect when an issue in the OS is found.

The thread started here:
Bug 203856 - [igb] PPPoE RX traffic is limited to one queue -->
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856
 
then continued here:
https://lists.freebsd.org/pipermail/freebsd-net/2018-July/051197.html

And that is the point, where I am stuck.

In short:
- I tried all the applicable settings / tuning from here:
https://bsdrp.net/documentation/technical_docs/performance#nic_drivers_tuning --> specifics for APU2 + igb
- I tried "net.isr.maxthreads" and "net.isr.numthreads" greater than 1 and switched net.isr.dispatch to "deferred" --> no measurable improvement in throughput, but the load on the APU2 nearly doubled

I have collected various performance data during the test traffic, in case it helps to pinpoint where the bottleneck is in this OPNsense system.

-------------------------------------------------------------------------------------------------------------------------------

Opnsense 18.1.13
OS: FreeBSD 11.1-RELEASE-p11 FreeBSD 11.1-RELEASE-p11  116e406d37f(stable/18.1)  amd64

kldstat:
Id Refs Address            Size     Name
 1   91 0xffffffff80200000 213bb20  kernel
 2    1 0xffffffff8233d000 6e18     if_gre.ko
 3    1 0xffffffff82344000 7570     if_tap.ko
 4    3 0xffffffff8234c000 54e78    pf.ko
 5    1 0xffffffff823a1000 e480     carp.ko
 6    1 0xffffffff823b0000 e3e0     if_bridge.ko
 7    2 0xffffffff823bf000 6fd0     bridgestp.ko
 8    1 0xffffffff823c6000 126a8    if_lagg.ko
 9    1 0xffffffff823d9000 1610     ng_UI.ko
10   31 0xffffffff823db000 173e0    netgraph.ko
11    1 0xffffffff823f3000 3620     ng_async.ko
12    1 0xffffffff823f7000 4fb8     ng_bpf.ko
13    1 0xffffffff823fc000 4e98     ng_bridge.ko
14    1 0xffffffff82401000 31e0     ng_cisco.ko
15    1 0xffffffff82405000 f20      ng_echo.ko
16    1 0xffffffff82406000 38b8     ng_eiface.ko
17    1 0xffffffff8240a000 4870     ng_ether.ko
18    1 0xffffffff8240f000 1db0     ng_frame_relay.ko
19    1 0xffffffff82411000 17e8     ng_hole.ko
20    1 0xffffffff82413000 4250     ng_iface.ko
21    1 0xffffffff82418000 6250     ng_ksocket.ko
22    1 0xffffffff8241f000 7d88     ng_l2tp.ko
23    1 0xffffffff82427000 3fe0     ng_lmi.ko
24    1 0xffffffff8242b000 65c8     ng_mppc.ko
25    2 0xffffffff82432000 b48      rc4.ko
26    1 0xffffffff82433000 2ad8     ng_one2many.ko
27    1 0xffffffff82436000 a3e0     ng_ppp.ko
28    1 0xffffffff82441000 8f08     ng_pppoe.ko
29    1 0xffffffff8244a000 5f68     ng_pptpgre.ko
30    1 0xffffffff82450000 2570     ng_rfc1490.ko
31    1 0xffffffff82453000 6288     ng_socket.ko
32    1 0xffffffff8245a000 21a0     ng_tee.ko
33    1 0xffffffff8245d000 2ec0     ng_tty.ko
34    1 0xffffffff82460000 45b8     ng_vjc.ko
35    1 0xffffffff82465000 2f20     ng_vlan.ko
36    1 0xffffffff82468000 31f0     if_enc.ko
37    1 0xffffffff8246c000 28b0     pflog.ko
38    1 0xffffffff8246f000 d578     pfsync.ko
39    1 0xffffffff8247d000 3370     ng_car.ko
40    1 0xffffffff82481000 36a8     ng_deflate.ko
41    1 0xffffffff82485000 4ef8     ng_pipe.ko
42    1 0xffffffff8248a000 3658     ng_pred1.ko
43    1 0xffffffff8248e000 2058     ng_tcpmss.ko
44    1 0xffffffff82621000 7130     aesni.ko
45    1 0xffffffff82629000 1055     amdtemp.ko


The two PCs I use to generate traffic are Win7 boxes:
PC-A connects directly to igb0 (WAN side), IP address 192.168.1.2
PC-B connects directly to igb1 (LAN side), IP address 10.0.0.100

I run:

(on PC-A) iperf3 -s
(on PC-B) iperf3 -c 192.168.1.2 -t 300 -P 1 -R  (-R to get traffic in the WAN --> LAN direction, after PC-B makes the initial connection TO PC-A)
---------------------------------------------------------------------------------------------------------------------------------------------

loader.conf:

##############################################################
# This file was auto-generated using the rc.loader facility. #
# In order to deploy a custom change to this installation,   #
# please use /boot/loader.conf.local as it is not rewritten. #
##############################################################

loader_brand="opnsense"
loader_logo="hourglass"
loader_menu_title=""

autoboot_delay="3"
hw.usb.no_pf="1"
# see https://forum.opnsense.org/index.php?topic=6366.0
hint.ahci.0.msi="0"
hint.ahci.1.msi="0"

# Vital modules that are not in FreeBSD's GENERIC
# configuration will be loaded on boot, which makes
# races with individual module's settings impossible.
carp_load="YES"
if_bridge_load="YES"
if_enc_load="YES"
if_gif_load="YES"
if_gre_load="YES"
if_lagg_load="YES"
if_tap_load="YES"
if_tun_load="YES"
if_vlan_load="YES"
pf_load="YES"
pflog_load="YES"
pfsync_load="YES"

# The netgraph(4) framework is loaded here
# for backwards compat for when the kernel
# had these compiled in, not as modules. This
# list needs further pruning and probing.
netgraph_load="YES"
ng_UI_load="YES"
ng_async_load="YES"
ng_bpf_load="YES"
ng_bridge_load="YES"
ng_car_load="YES"
ng_cisco_load="YES"
ng_deflate_load="YES"
ng_echo_load="YES"
ng_eiface_load="YES"
ng_ether_load="YES"
ng_frame_relay_load="YES"
ng_hole_load="YES"
ng_iface_load="YES"
ng_ksocket_load="YES"
ng_l2tp_load="YES"
ng_lmi_load="YES"
ng_mppc_load="YES"
ng_one2many_load="YES"
ng_pipe_load="YES"
ng_ppp_load="YES"
ng_pppoe_load="YES"
ng_pptpgre_load="YES"
ng_pred1_load="YES"
ng_rfc1490_load="YES"
ng_socket_load="YES"
ng_tcpmss_load="YES"
ng_tee_load="YES"
ng_tty_load="YES"
ng_vjc_load="YES"
ng_vlan_load="YES"

# dynamically generated tunables settings follow
net.enc.in.ipsec_bpf_mask="2"
net.enc.in.ipsec_filter_mask="2"
net.enc.out.ipsec_bpf_mask="1"
net.enc.out.ipsec_filter_mask="1"
debug.pfftpproxy="0"
vfs.read_max="32"
net.inet.ip.portrange.first="1024"
net.inet.tcp.blackhole="2"
net.inet.udp.blackhole="1"
net.inet.ip.random_id="1"
net.inet.ip.sourceroute="0"
net.inet.ip.accept_sourceroute="0"
net.inet.icmp.drop_redirect="0"
net.inet.icmp.log_redirect="0"
net.inet.tcp.drop_synfin="1"
net.inet.ip.redirect="1"
net.inet6.ip6.redirect="1"
net.inet6.ip6.use_tempaddr="0"
net.inet6.ip6.prefer_tempaddr="0"
net.inet.tcp.syncookies="1"
net.inet.tcp.recvspace="65228"
net.inet.tcp.sendspace="65228"
net.inet.tcp.delayed_ack="0"
net.inet.udp.maxdgram="57344"
net.link.bridge.pfil_onlyip="0"
net.link.bridge.pfil_local_phys="0"
net.link.bridge.pfil_member="1"
net.link.bridge.pfil_bridge="0"
net.link.tap.user_open="1"
kern.randompid="347"
net.inet.ip.intr_queue_maxlen="1000"
hw.syscons.kbd_reboot="0"
net.inet.tcp.log_debug="0"
net.inet.icmp.icmplim="0"
net.inet.tcp.tso="1"
net.inet.udp.checksum="1"
kern.ipc.maxsockbuf="4262144"
vm.pmap.pti="1"
hw.ibrs_disable="0"

# dynamically generated console settings follow
comconsole_speed="115200"
#boot_multicons
boot_serial="YES"
#kern.vty
console="comconsole"

---------------------------------------------
loader.conf.local

# I have commented everything out (and rebooted to apply) to start performance tuning from scratch

#kern.random.harvest.mask=351
#hw.igb.rx_process_limit=-1
#net.link.ifqmaxlen=2048
#net.isr.numthreads=4
#net.isr.maxthreads=4
#net.isr.dispatch=deferred
#net.isr.bindthreads=1
------------------------------------------------

sysctl.conf is practically empty

------------------------------------------------

ifconfig:

Note: igb0 is "WAN", igb1 is "LAN"
Note2: no PPPoE so far!

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
        ether 00:0d:b9:4b:0b:5c
        hwaddr 00:0d:b9:4b:0b:5c
        inet6 fe80::20d:b9ff:fe4b:b5c%igb0 prefixlen 64 scopeid 0x1
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
        ether 00:0d:b9:4b:0b:5d
        hwaddr 00:0d:b9:4b:0b:5d
        inet6 fe80::20d:b9ff:fe4b:b5d%igb1 prefixlen 64 scopeid 0x2
        inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:0d:b9:4b:0b:5e
        hwaddr 00:0d:b9:4b:0b:5e
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
enc0: flags=0<> metric 0 mtu 1536
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: enc
pflog0: flags=100<PROMISC> metric 0 mtu 33160
        groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
        groups: pfsync
        syncpeer: 0.0.0.0 maxupd: 128 defer: off

--------------------------------------------------------------

top -SHPI

last pid: 90572;  load averages:  2.13,  1.48,  1.01    up 0+15:54:28  08:58:36
136 processes: 8 running, 99 sleeping, 29 waiting
CPU 0:  0.0% user,  0.0% nice, 99.1% system,  0.0% interrupt,  0.9% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system, 67.1% interrupt, 32.9% idle
CPU 2:  0.3% user,  0.0% nice,  0.8% system,  0.2% interrupt, 98.7% idle
CPU 3:  0.2% user,  0.0% nice,  1.9% system,  6.8% interrupt, 91.2% idle
Mem: 36M Active, 179M Inact, 610M Wired, 387M Buf, 3102M Free
Swap:

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    0 root       -92    -     0K   448K CPU0    0   1:32  99.37% kernel{igb0 qu
   11 root       155 ki31     0K    64K CPU2    2 904:01  98.85% idle{idle: cpu
   11 root       155 ki31     0K    64K RUN     3 909:09  93.95% idle{idle: cpu
   12 root       -92    -     0K   496K CPU1    1   1:54  50.64% intr{irq262: i
   11 root       155 ki31     0K    64K CPU1    1 906:22  39.25% idle{idle: cpu
   12 root       -92    -     0K   496K WAIT    1   0:26  10.09% intr{irq257: i
   12 root       -92    -     0K   496K WAIT    3   0:03   3.19% intr{irq264: i
   17 root       -16    -     0K    16K -       3   0:08   1.12% rand_harvestq
39298 unbound     20    0 72916K 31596K kqread  3   0:01   1.09% unbound{unboun
   12 root       -92    -     0K   496K WAIT    3   0:02   0.61% intr{irq259: i
   11 root       155 ki31     0K    64K RUN     0 912:29   0.52% idle{idle: cpu
   12 root       -72    -     0K   496K WAIT    2   0:02   0.35% intr{swi1: pfs
    0 root       -92    -     0K   448K -       2   0:00   0.24% kernel{igb1 qu
   12 root       -76    -     0K   496K WAIT    3   0:03   0.15% intr{swi0: uar




-----------------------------

systat -vm 3


    1 users    Load  2.58  1.69  1.11                  Jul 27 08:59
   Mem usage:  21%Phy  1%Kmem
Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  129892   36820 12632092    39224 3175880  count
All  133660   40504 13715660    67628          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt 35953 total
             32       52k    2 5198  32k  926             cow       4 uart0 4
                                                          zfod      1 ehci0 18
25.9%Sys  18.7%Intr  2.1%User  0.0%Nice 53.3%Idle         ozfod       ahci0 19
|    |    |    |    |    |    |    |    |    |           %ozfod  1123 cpu0:timer
=============+++++++++>                                   daefr  1126 cpu1:timer
                                        29 dtbuf          prcfr  1127 cpu3:timer
Namei     Name-cache   Dir-cache    145989 desvn          totfr    84 cpu2:timer
   Calls    hits   %    hits   %     36007 numvn          react     1 igb0:que 0
      19      19 100                 14872 frevn          pdwak 13759 igb0:que 1
                                                       15 pdpgs     1 igb0:que 2
Disks  ada0 pass0                                         intrn     3 igb0:que 3
KB/t   0.00  0.00                                  624712 wire        igb0:link
tps       0     0                                   36984 act       1 igb1:que 0
MB/s   0.00  0.00                                  183780 inact 13514 igb1:que 1
%busy     0     0                                         laund     3 igb1:que 2
                                                  3175880 free   5206 igb1:que 3



-----------------------------

systat -ifstat 3


                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   ||||||||||||||

      Interface           Traffic               Peak                Total
            lo0  in      0.089 KB/s          0.982 KB/s            3.729 MB
                 out     0.089 KB/s          0.982 KB/s            3.729 MB

           igb1  in      1.184 MB/s          1.194 MB/s          603.486 MB
                 out    56.019 MB/s         56.498 MB/s           27.880 GB

           igb0  in     55.994 MB/s         56.525 MB/s           27.880 GB
                 out     1.183 MB/s          1.194 MB/s          603.794 MB



--------------------------------------------

vmstat -i 5

interrupt                          total       rate
irq4: uart0                           60         12
irq18: ehci0                           4          1
irq19: ahci0                           0          0
cpu0:timer                          4949        989
cpu1:timer                          5623       1124
cpu3:timer                          5623       1124
cpu2:timer                          3845        769
irq256: igb0:que 0                     5          1
irq257: igb0:que 1                 70255      14045
irq258: igb0:que 2                     8          2
irq259: igb0:que 3                    19          4
irq260: igb0:link                      0          0
irq261: igb1:que 0                    10          2
irq262: igb1:que 1                 68832      13761
irq263: igb1:que 2                     5          1
irq264: igb1:que 3                 25967       5191
irq265: igb1:link                      0          0
Total                             185205      37026

---------------------------------------------------------------------------------------

Thanks for your help in advance

Regards,
Richard
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 01, 2018, 01:55:15 pm
Just to further add to the topic:

It seems that when 2 parallel iperf streams are running, I sometimes get great results (approx. 800-850 Mbit/s) and sometimes mediocre ones (anything like 300, 500 or 600 Mbit/s can happen).

From "top -CHIPS" it is clearly visible that in the bad cases only 1 core gets 100% utilized (100% interrupt) while the other 3 cores are 99% idle. In the middle cases, 2 cores are at 100% interrupt and 2 cores are idle. When the best case happens (800-850 Mbit/s), 3 cores are close to 100% while 1 core is 100% idle. So something must be going wrong in the load balancing of NIC queues --> CPU cores.

Simply re-running the same iperf command line gives me all of these different results. The values are quite stable within a single session, but after a session completes, re-running the exact same command between the exact same two endpoint PCs produces these large variations. The interrupt load in top clearly confirms this.
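In case anyone wants to reproduce this: I simply watch these two outputs while the iperf run is in progress (the same tools whose output I posted in my first post):

Code: [Select]
vmstat -i 5    # per-IRQ counts and rates; shows which igbX:que N queues are actually busy
top -SHPI      # per-thread view; shows on which core the busy intr{irqNNN: igbX:que N} thread is running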

Can anyone reproduce the same test cases, or confirm similar results?
Of course that still does not explain the weak single-core performance.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 02, 2018, 12:06:23 am
I used to have an APU2C4, and realised from looking around the web that others had the same problem. For example, see this article here (https://teklager.se/en/knowledge-base/apu2c0-ipfire-throughput-test-much-faster-pfsense/). They too seem to blame single-core routing but you have found that at times the cores are more evenly used. I have read that later versions of FreeBSD got better at SMP/multi-core routing but apparently not all the way there yet? Perhaps using several iperf3 sessions you are tying one session to a core, and thus getting better (parallel) throughput that way?

Edit: You may also wish to try these settings/tweaks (https://forum.opnsense.org/index.php?topic=6590.0). I didn't see them before I sold my APU2 and got a G4560 based box instead, but they could help. Report back your findings please.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 02, 2018, 05:32:05 am
I'm thinking of switching from ipfire to OPNsense because I think it has a better overall feature set, but this is my major hangup. If people are able to get similar performance out of OPNsense, I'd love to hear about it.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 02, 2018, 08:49:27 am
I'm thinking of switching from ipfire to OPNsense because I think it has a better overall feature set, but this is my major hangup. If people are able to get similar performance out of OPNsense, I'd love to hear about it.

What's your hardware? The APU2 is a particular case, as it has a low single core speed (1GHz) and is an embedded low power SoC. For normal x86 hardware you'll be fine - I run 380Mbps down on a small form factor Pentium G4560 and it doesn't break a sweat. Gigabit is fine too.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: franco on August 02, 2018, 02:02:54 pm
I don't think it's practical to compare Linux and FreeBSD throughput and expect them to match. The latter will be lower.


Cheers,
Franco
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 02, 2018, 04:11:00 pm
My hardware is an APU2C4 :).

As to Linux vs. FreeBSD performance, obviously they are different kernels and aren't going to do everything the same way. But in this particular case the benchmarks show FreeBSD getting roughly half the throughput of Linux.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 02, 2018, 04:16:22 pm
My hardware is an APU2C4 :).

As to Linux vs. FreeBSD performance, obviously they are different kernels and aren't going to do everything the same way. But in this particular case the benchmarks show FreeBSD getting roughly half the throughput of Linux.

Yes of course, but think of it another way. The APU2 is 'only' 1GHz per core. If OPNsense is only using a single core for routing, you've got 1GHz of processing power to try to max out your connection. Linux on the other hand is multi-core aware, so there you're using 4x 1GHz for routing your connection. No wonder the throughput is higher. Actually, as I said earlier, FreeBSD is now getting much better at spreading load across cores, though that doesn't yet apply to every part of the 'networking' process. FreeBSD has probably the best networking stack in the world, or certainly one of them. It can route 10Gbps, 40Gbps, even 100Gbps on suitable hardware. Unfortunately, the APU2 isn't the most suitable hardware (for high throughput on *BSD).

If you need >500Mbps stick to Linux and you won't have an issue. If you want <500Mbps then *sense will be fine on your APU.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 02, 2018, 04:27:59 pm
Rainmaker,

I think I'm not communicating well. I'm not saying that FreeBSD has poor network performance. I'm saying that with the particular piece of hardware I happen to own, FreeBSD gets roughly half the throughput of Linux and struggles to use the hardware efficiently. The FreeBSD development thread linked earlier suggests that it's not the SMP performance of pf that's the issue, but some oddness in the embedded Intel NIC.

But most of these benchmarks are almost two years old. I'm wondering if at this point the problem with this particular hardware might be fixed, and whether people can now get similar performance under FreeBSD with tweaks.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 02, 2018, 04:39:44 pm
Rainmaker,

I think I'm not communicating well. I'm not saying that FreeBSD has poor network performance. I'm saying that with the particular piece of hardware I happen to own, FreeBSD gets roughly half the throughput of Linux and struggles to use the hardware efficiently. The FreeBSD development thread linked earlier suggests that it's not the SMP performance of pf that's the issue, but some oddness in the embedded Intel NIC.

But most of these benchmarks are almost two years old. I'm wondering if at this point the problem with this particular hardware might be fixed, and whether people can now get similar performance under FreeBSD with tweaks.

Ah, you are (respectfully) a lot more knowledgeable than I catered for in my response. Apologies; it's difficult to pitch responses on the Internet, especially when people don't know each other yet (as I'm sure you know).

Yes, FreeBSD's pf is indeed much more SMP capable. Last week I took both OpenBSD and FreeBSD installs and 'made' routers out of them, before comparing them side-by-side. Even on an 8700k at 5GHz per core OpenBSD was less performant than FreeBSD. However there are many other factors, as we both touched upon in previous posts.

NIC queues are one factor, as you state. I'm not sure if OPNsense utilises multiple queues (i.e. per core) or whether it just uses one. For your APU specifically, did you accept the Intel proprietary licence? I can't quite recall whether the APU2 uses igb drivers? I don't even know if that applies to OPNsense, but I know on pfSense I was advised to create the following for an APU2:

Code: [Select]
/boot/loader.conf.local
 
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1

Then reboot. This apparently 'unlocks' some extra functionality in the NIC, which may improve your throughput. If you're running off an SSD don't forget to enable TRIM.
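(For the TRIM part: on a UFS install that would be something along these lines - the device name below is just an example, check yours with 'mount' first.)

Code: [Select]
# from single-user mode, with the filesystem unmounted or mounted read-only
tunefs -t enable /dev/ada0p2   # example partition, adjust to your install
tunefs -p /dev/ada0p2          # verify that "trim: (-t)" now reports enabled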
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pylox on August 03, 2018, 03:07:06 pm

NIC queues are one factor, as you state. I'm not sure if OPNsense utilises multiple queues (i.e. per core) or whether it just uses one. For your APU specifically, did you accept the Intel proprietary licence? I can't quite recall whether the APU2 uses igb drivers? I don't even know if that applies to OPNsense, but I know on pfSense I was advised to create the following for an APU2:

Code: [Select]
/boot/loader.conf.local
 
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1

Then reboot. This apparently 'unlocks' some extra functionality in the NIC, which may improve your throughput. If you're running off an SSD don't forget to enable TRIM.

Hi rainmaker,

The license ack has nothing to do with the igb driver (imho). It is related to the Intel PRO/Wireless adapters.
(https://www.freebsd.org/cgi/man.cgi?iwi)

regards pylox
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Rainmaker on August 03, 2018, 03:11:59 pm
Hi pylox,

Having read the relevant man pages it seems I was indeed grossly misinformed. I was told to add those lines when I first started using pfSense (and *BSD), as I was using various Intel Pro and I-series ethernet NICs. I don't run wifi on my gateway (I use Unifi APs) so I was obviously given duff information.

My apologies for repeating it here, and thanks for the lesson.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pylox on August 03, 2018, 03:25:22 pm
Hi pylox,

Having read the relevant man pages it seems I was indeed grossly misinformed. I was told to add those lines when I first started using pfSense (and *BSD), as I was using various Intel Pro and I-series ethernet NICs. I don't run wifi on my gateway (I use Unifi APs) so I was obviously given duff information.

My apologies for repeating it here, and thanks for the lesson.

Hi rainmaker,

no problem... Some time ago I put the same entry in loader.conf.local... ;D After some research I realized it's bullshit...
I think the OP's problem is something specifically PPPoE-related. Normally there should be no performance problems on an APU2.

regards pylox
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 06, 2018, 02:18:30 pm
Hello pylox, all

just to be clear: I am testing over a plain IP + NAT connection (PPPoE was mentioned as a possible bottleneck, but has not been tested YET), and even that simple setup reaches only about 40-50% of the maximum possible throughput. If I add PPPoE it will be even slower. That's the point of this thread: to find at least one credible person who is currently using an APU2 with OPNsense and can confirm that their speed reaches at least 85-90% of gigabit - even over PPPoE!
Then the next round will be to see what needs to be fine-tuned to get the same performance at my ISP.

All I can see is that single-flow iperf performance consistently maxes out at around 450 Mbit/s (in the WAN --> LAN direction). LAN --> WAN seems slightly higher, roughly 600-650 Mbit/s.

Multi-flow iperf: now here it gets interesting. The result varies from run to run: e.g. I run a 2-flow iperf session for 60 seconds, it finishes, I immediately restart it with the same command, and I get a totally different result. After another 60 seconds, repeat, and yet another completely different throughput.

With 2-flow iperf I can sometimes reach 850-900 Mbit/s, other times as low as 250 Mbit/s. Yes, a gigantic difference, even though all relevant test parameters are unchanged.

When I get 850-900 Mbit/s, the 2 flows are evenly distributed (450 + 450 Mbit/s = 900 Mbit/s total), and CPU interrupt usage is around 270-280% (explanation: total CPU processing power is 400% = 100% per core times 4 cores).

When I get 600 Mbit/s, I usually see one flow at 580 Mbit/s and the other at 1-2 (max 10) Mbit/s; interrupt load is approx. 170-180%. When I get 200-300 Mbit/s, it is sometimes 2x 150 Mbit/s, other times 190 + 2-3 Mbit/s flows, with only 100% interrupt usage (a single core). And this varies from run to run.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pylox on August 07, 2018, 07:55:27 pm
Hello pylox, all

just to be clear: I am testing over a plain IP + NAT connection (PPPoE was mentioned as a possible bottleneck, but has not been tested YET), and even that simple setup reaches only about 40-50% of the maximum possible throughput. If I add PPPoE it will be even slower. That's the point of this thread: to find at least one credible person who is currently using an APU2 with OPNsense and can confirm that their speed reaches at least 85-90% of gigabit - even over PPPoE!
Then the next round will be to see what needs to be fine-tuned to get the same performance at my ISP.
......

Hi ricsip,

this is very hard to find. Unfortunately I do not have a test setup with an APU2 (and not much time).
But you can try a few different things:

1. Change these tunables and measure...
vm.pmap.pti="0"  # (disable the Meltdown patch - this is an AMD processor)
hw.ibrs_disable="1"  # (disable the Spectre patch temporarily)

2. Try to disable igb flow control for each interface and measure
hw.igb.<x>.fc=0  # (x = interface number)

3. Change the network interface interrupt rate and measure
hw.igb.max_interrupt_rate="16000"  # (start with 16000, can be increased up to 64000)

4. Disable Energy Efficient Ethernet (EEE) for each interface and measure
dev.igb.<x>.eee_disabled="1"  # (x = interface number)

That should be enough for a first pass... ;-)
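A rough sketch of where these would go (I cannot verify this on 18.1 myself, so take the OID names as given above and change one thing at a time):

Code: [Select]
# /boot/loader.conf.local -- boot-time tunables (points 1 and 3), reboot to apply
vm.pmap.pti="0"
hw.igb.max_interrupt_rate="16000"

# runtime sysctls (points 1 and 4), can also be added as tunables in the GUI; x = 0,1 for igb0/igb1
sysctl hw.ibrs_disable=1
sysctl dev.igb.0.eee_disabled=1
sysctl dev.igb.1.eee_disabled=1
# plus the per-interface flow control setting from point 2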

regards pylox
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 09, 2018, 02:32:27 pm
I will try these and see if any of them make a difference, but in general I am very skeptical that they will, and nobody from the forum owners has replied with anything meaningful since this thread started :(
(apart from basically saying it's not practical to compare BSD and Linux)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 09, 2018, 04:08:24 pm
Maybe the forum owners don't use an APU?

Have you followed the interrupt stuff from:
https://wiki.freebsd.org/NetworkPerformanceTuning


How many queues does your NIC have? Perhaps you can lower the number of queues on the NIC if single-stream performance is so important to you, but then I'd guess all other traffic will be starved ..
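You can count them from the interrupt table, something like:

Code: [Select]
# each igbX:que N line is one queue (MSI-X vector) on that port
vmstat -i | grep igb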
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: pylox on August 09, 2018, 04:12:45 pm
I will try these and see if any of them make a difference, but in general I am very skeptical that they will, and nobody from the forum owners has replied with anything meaningful since this thread started :(
(apart from basically saying it's not practical to compare BSD and Linux)

Hi ricsip,

be aware that there are a lot of circumstances (especially with the hardware, or with your test setup) under which things will not work optimally... There is no "silver bullet" - so complaining will not help. It is also possible that other OPNsense & APU2 users simply do not need a single near-full-gigabit flow. From my perspective you have three choices: use stronger hardware, use other software, or do some more testing and let the community participate...

regards pylox

 
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 09, 2018, 04:16:00 pm
https://calomel.org/freebsd_network_tuning.html

# Disable Hyper Threading (HT), also known as Intel's proprietary simultaneous
# multithreading (SMT) because implementations typically share TLBs and L1
# caches between threads which is a security concern. SMT is likely to slow
# down workloads not specifically optimized for SMT if you have a CPU with more
# than two(2) real CPU cores. Secondly, multi-queue network cards are as much
# as 20% slower when network queues are bound to real CPU cores as well as SMT
# virtual cores due to interrupt processing inefficiencies.
machdep.hyperthreading_allowed="0"  # (default 1, allow Hyper Threading (HT))

# Intel igb(4): The Intel i350-T2 dual port NIC supports up to eight(8)
# input/output queues per network port, the card has two(2) network ports.
#
# Multiple transmit and receive queues in network hardware allow network
# traffic streams to be distributed into queues. Queues can be mapped by the
# FreeBSD network card driver to specific processor cores leading to reduced
# CPU cache misses. Queues also distribute the workload over multiple CPU
# cores, process network traffic in parallel and prevent network traffic or
# interrupt processing from overwhelming a single CPU core.
#
# http://www.intel.com/content/dam/doc/white-paper/improving-network-performance-in-multi-core-systems-paper.pdf
#
# For a firewall under heavy CPU load we recommend setting the number of
# network queues equal to the total number of real CPU cores in the machine
# divided by the number of active network ports. For example, a firewall with
# four(4) real CPU cores and an i350-T2 dual port NIC should use two(2) queues
# per network port (hw.igb.num_queues=2). This equals a total of four(4)
# network queues over two(2) network ports which map to four(4) real CPU
# cores. A FreeBSD server with four(4) real CPU cores and a single network port
# should use four(4) network queues (hw.igb.num_queues=4). Or, set
# hw.igb.num_queues to zero(0) to allow the FreeBSD driver to automatically set
# the number of network queues to the number of CPU cores. It is not recommended
# to allow more network queues than real CPU cores per network port.
#
# Query total interrupts per queue with "vmstat -i" and use "top -CHIPS" to
# watch CPU usage per igb0:que. Multiple network queues will trigger more total
# interrupts compared to a single network queue, but the processing of each of
# those queues will be spread over multiple CPU cores allowing the system to
# handle increased network traffic loads.
hw.igb.num_queues="2"  # (default 0 , queues equal the number of CPU real cores)

# Intel igb(4): FreeBSD puts an upper limit on the number of received
# packets a network card can process to 100 packets per interrupt cycle. This
# limit is in place because of inefficiencies in IRQ sharing when the network
# card is using the same IRQ as another device. When the Intel network card is
# assigned a unique IRQ (dmesg) and MSI-X is enabled through the driver
# (hw.igb.enable_msix=1) then interrupt scheduling is significantly more
# efficient and the NIC can be allowed to process packets as fast as they are
# received. A value of "-1" means unlimited packet processing and sets the same
# value to dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit . A
# process limit of "-1" is around one(1%) percent faster than "100" on a
# saturated network connection.
hw.igb.rx_process_limit="-1"  # (default 100 packets to process concurrently)
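For an APU2 (4 real cores, 2 ports in use) that would boil down to something like this in /boot/loader.conf.local - untested on my side:

Code: [Select]
# /boot/loader.conf.local -- sketch based on the calomel notes above (4 cores / 2 active igb ports)
hw.igb.num_queues="2"
hw.igb.rx_process_limit="-1"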
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 13, 2018, 01:44:44 am
If these suggestions improve performance, I'd love to hear about it.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 13, 2018, 10:57:06 am
https://calomel.org/freebsd_network_tuning.html
......

Testing is in progress, but at the moment I am overloaded with other tasks. I just wanted to let you know that I haven't abandoned the thread. As my goal is to get this fixed, I will post the results here in the next couple of days.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 15, 2018, 11:01:04 am
1. Change these tunables and measure...
2. Try to disable igb flow control for each interface and measure...
3. Change the network interface interrupt rate and measure...
4. Disable Energy Efficient Ethernet (EEE) for each interface and measure...
......

OK, I did all the steps above. No improvement; the results are still wildly sporadic from one test execution to the next.

The only difference is that the CPU load characteristics changed from 99% SYS + 60-70% IRQ to 100% IRQ + 60-70% IRQ (SYS dropped to 1-2%).

Note 1: I only tried raising hw.igb.max_interrupt_rate from "8000" to "16000", not any higher.
Note 2: regarding "2. Try to disable igb flow control for each interface and measure
hw.igb.<x>.fc=0  # (x = interface number)" --> TYPO, it's actually dev.igb.<x>.fc=0
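So on this box the runtime commands are (assuming igb0 = WAN and igb1 = LAN, as in my setup):

Code: [Select]
sysctl dev.igb.0.fc=0
sysctl dev.igb.1.fc=0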
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on August 15, 2018, 11:55:19 am
https://calomel.org/freebsd_network_tuning.html
......

I have also gone through this. No measurable improvement in throughput.

machdep.hyperthreading_allowed="0"  # (default 1, allow Hyper Threading (HT)) --> NOT APPLICABLE in my case. This AMD CPU has 4 physical cores and sysctl hw.ncpu --> 4, so HT (even if it were supported, which I am not sure about) is not active anyway.

hw.igb.num_queues="2"  # (default 0, queues equal the number of real CPU cores)
--> I have 4 cores and 2 active NICs; each NIC supports up to 4 queues. By default I had
hw.igb.num_queues="0", but I tried hw.igb.num_queues="2" as well.
No improvement in throughput (for single flow).
But! It seems to have degraded multi-flow performance heavily.

hw.igb.enable_msix=1 has been set like that from the beginning.
hw.igb.rx_process_limit="-1" --> was set, but no real improvement in throughput.
dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit are both set to "-1" as a result of the previous entry.
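(For reference, I read the values back like this to confirm they were applied - assuming all of these OIDs are exposed on 18.1:)

Code: [Select]
sysctl hw.igb.num_queues hw.igb.rx_process_limit
sysctl dev.igb.0.rx_processing_limit dev.igb.1.rx_processing_limit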

I am very sad that this does not seem to be solvable under OPNsense without switching to a competing product or to different hardware.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 15, 2018, 05:16:08 pm
Sorry .. none of us are magicians.  ::)

You could go for a commercial vendor like Cisco, where you are limited to 85 Mbit and have to purchase an extra license.

Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 15, 2018, 06:14:49 pm
Well that's disappointing. OPNsense is a great piece of software. Maybe I'll check back in when FreeBSD 12 is released as I think this is overall a better solution for my needs than ipfire.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 15, 2018, 06:21:03 pm
If you send me such a device I can do some testing. No other idea how to help.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 15, 2018, 06:46:26 pm
I'm willing to chip in to buy the OPNsense project an APU2.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on August 15, 2018, 08:28:34 pm
It can also be a used one .. I don't need it for long.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: KantFreeze on August 21, 2018, 04:37:21 pm
Looks like I'm the only one willing to chip in?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 04, 2018, 01:18:04 pm
Looks like I'm the only one willing to chip in?

@KantFreeze:
Let's be reasonable: nobody will send equipment free of charge to unknown people on the internet. At least that is my view.

@mimugmail: how about a donation towards you, so you can buy a brand-new APU2 for yourself and spend some valuable time checking its maximum performance capabilities and documenting your findings? No need to return the device at the end; you should keep it for future OPNsense release benchmarks / regression tests.

I bought my APU2 from a local reseller (motherboard + black case + external PSU + a 16 GB mSATA SSD); the total was approx. 200 EUR. If there are 10 real volunteers, I am willing to contribute a 20 EUR (non-refundable) "donation" to this project.

DM me for the details if you are interested.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 04, 2018, 01:51:02 pm
local = German? I can ask my boss if the company is willing to test such a device ..
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 04, 2018, 02:33:13 pm
local = German? I can ask my boss if the company is willing to test such a device ..

I am not from Germany; I live in Eastern Europe and just converted my local currency to EUR for a rough estimate. But your local PC shop may sell these devices even cheaper:

http://pcengines.ch/order.htm
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: marjohn56 on September 04, 2018, 03:02:04 pm
local = German? I can ask my boss if the company is willing to test such a device ..


@mimugmail: I have a spare APU2 I no longer use; if you send me your bank account details, pass-codes etc., that will do as security.


PM me and we'll work something out. :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 04, 2018, 03:09:47 pm
You want to send it to me AND want my bank details?? :P
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 04, 2018, 03:11:54 pm
I ordered this via the company, no tax, so only 160 EUR:

https://www.amazon.de/PC-Engines-APU-2C4-Netzteil-schwarzes/dp/B01GEIEI7M
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: marjohn56 on September 04, 2018, 03:30:56 pm
Cool... OK.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 04, 2018, 04:02:28 pm
I ordered this via the company, no tax, so only 160 EUR:

https://www.amazon.de/PC-Engines-APU-2C4-Netzteil-schwarzes/dp/B01GEIEI7M

I really did mean to support this evaluation effort, so if anything is still needed, let us know!
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 12:44:24 pm
I have also gone through this (the calomel.org igb(4) tuning quoted in full above). No measurable improvement in throughput.
......

A small addendum:
I recently noticed (maybe since upgrading to 18.7.1_3, but TBH I am not sure) that sometimes (depending on the actual throughput / how the interrupt load is shared among the cores) the serial console hangs during iperf. As soon as the iperf session finishes, or I interrupt it manually, the serial console comes back to life. I noticed it while running "top" on the console: the refresh stopped / froze during the iperf session, and the keyboard did not work either while the iperf traffic was flowing. As soon as the iperf session finished, "top" continued to produce output and the console responded to keystrokes again.

It seems to have something to do with the way the throughput randomly alternates between those 2-3 discrete levels across iperf sessions.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 07, 2018, 12:49:14 pm
Do you run iperf on the Firewall itself?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 12:59:54 pm
No, never!

The two iperf endpoints run on a PC connected to the LAN (igb1) and another PC connected to the WAN (igb0); the APU is always just a transit device (packet forwarding / packet filtering / NAT translation between igb1 and igb0 and vice versa) and never terminates any iperf traffic itself.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 07, 2018, 02:40:33 pm
Next week I should get my device and will put it in my lab. Let's see ..
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 05:56:51 pm
Yet another small addendum:

I finally managed to test throughput over PPPoE, under real-life conditions.

The results are quite weak:
approx. 250-270 Mbit/s (WAN --> LAN direction) with the APU2. Not iperf this time, but a torrent download (so nobody can say I was pushing unrealistic expectations onto one single flow).
Again, the router was only a transit device; the torrent client was running on a PC behind the APU, and the SSD was not the bottleneck during the download.

As a comparison, with a router from a different vendor I was easily able to achieve 580-600 Mbit/s downloading the same test torrent. I didn't investigate whether it could go higher with that router, but that is still more than a 2x difference in performance.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 07, 2018, 05:59:15 pm
You mean IPFire on the same hardware?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 06:11:21 pm
You mean IPFire on the same hardware?

No, not IPFire. Sorry if I was unclear :)

I installed a completely different piece of equipment (an Asus AC66U B1 router), just for comparison, to see whether that router can reach wire-speed gigabit.

On the APU I could not test IPFire today due to lack of time, but maybe in the coming days I will do another round of tests with it.

I need to find a timeslot when no users are on the internet :(
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 07, 2018, 06:20:39 pm
If I remember correctly you said this on the FreeBSD Net List regarding OPN and IPFire. I'll check next week.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 07, 2018, 06:53:12 pm
If I remember correctly you said this on the FreeBSD Net List regarding OPN and IPFire. I'll check next week.

Yes, you are right! Some weeks ago I did run the IPFire distribution on the APU, but only in an isolated LAN, without access to PPPoE or to the internet, so I could run my iperf benchmarks without breaking the production internet.

Today, unfortunately, I wasted a lot of time getting OPNsense to work on my production PPPoE internet connection. Basically, the default gateway was not activated properly after the PPPoE session came up, so all internet traffic failed with a TTL expired error.

In my existing OPNsense config I was using a static IP for the WAN (remember, I used an isolated LAN earlier for the iperf testing). Today I changed the WAN config from static IP to PPPoE, but some of the previous static default-gateway config was stuck and was not deleted properly (the dmesg log actually complained about 2 gateways that failed to be removed). I logged into the console and tried a couple of times to reset the interface assignment and redo the IP addressing, then logged into the GUI and switched the WAN from static IP to PPPoE. The CLI console does not allow advanced configuration such as PPPoE setup, so I had to do that from the GUI.
But it was still broken: the PPPoE session came up and I received a public IP from my ISP, but the default gateway was still the LAN IP from my old config.

That is when I decided to log in to the console again, select option 4) factory reset, and redo the initial setup wizard from scratch in the GUI. I selected WAN type: PPPoE, and this way I succeeded. But it wasted half of my day.

https://github.com/opnsense/core/issues/2186
I found this bug report about the PPPoE default gateway not updating after the PPPoE session activates, but it looks like that bug was fixed in 18.1.9 or so. It seems I was hitting something similar, I don't really know.

So basically I did not have time to switch the operating system, boot IPFire, and repeat the same tests under Linux. I am planning to do it in the coming days.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 08, 2018, 04:56:14 pm
Well, I did test IPFire on the APU2 as well (I used the latest ipfire-2.21-core123).

I could only achieve the same 250-290 Mbit/sec for the same torrent as yesterday with OPNsense. Because I was suspicious, I also connected my laptop directly to my ISP (I set up the PPPoE profile directly on the PC) and tried it without any router in the middle: the speed was the same 250-280 Mbit/sec this time. So I think there is a problem with my bloody ISP today (yesterday I managed to get 600 Mbit, so something must be going on today). There is no point continuing this testing until I can figure out what the hell is happening.

If anyone can share the simplest PPPoE simulator config, based on FreeBSD or Linux, I am going to try it on a powerful PC connected to my APU and completely rule the uncertain ISP out of the equation for these tests (I would run iperf on the PPPoE-simulator PC itself, making it the WAN endpoint for iperf). A torrent would be difficult to simulate in such a topology, so I have to fall back to iperf first.
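
What I have in mind is something like the rp-pppoe userspace server on a Linux box facing the APU's WAN port. Only a rough sketch, I have not verified it end to end; the interface name and addresses are placeholders, and a matching account is needed in /etc/ppp/chap-secrets:

# /etc/ppp/pppoe-server-options (sketch)
require-chap
lcp-echo-interval 10
lcp-echo-failure 2
noipdefault

# start the PPPoE server on the NIC wired to the APU WAN port (placeholder: eth1)
pppoe-server -I eth1 -L 10.67.15.1 -R 10.67.15.10 -N 4

# then run "iperf3 -s" on the same box so it acts as the WAN endpoint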
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 13, 2018, 01:15:38 pm
Me again.

I did some further testing. No PPPoE involved (I don't have access to the internet line at the moment); only pure IP<-->IP in my lab, with OPNsense purely in transit, not running iperf itself.

I found the option in the menu to literally turn off the firewall (disable all packet filtering), which also disables NAT and turns OPNsense into a plain routing box.
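
For anyone repeating this: as far as I know the same thing can be toggled temporarily from the shell with pfctl (the GUI checkbox is the persistent way to do it):

pfctl -d   # disable the packet filter, and with it NAT
pfctl -e   # re-enable it afterwards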

Results (iperf -P 1 == single flow):
1) firewall disabled, NAT disabled: can easily push 890-930 Mbit from WAN-->LAN and vice versa; CPU load is approx. one core at 65% INT and another at 10-30% INT, the rest idle. Throughput is stable, with very minimal variation.
2) firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN and vice versa; CPU load is one core at 100% INT plus one at 20% INT, the rest idle. Occasionally I get strange drops to around 560 Mbit or 630 Mbit.
3) firewall enabled, NAT enabled: LAN-->WAN approx. 650-720 Mbit, WAN-->LAN around 460 Mbit constantly (100% + 20% INT).

Results for 2) and 3) are not really consistent and vary greatly between iperf sessions, as do the CPU load characteristics (sometimes less INT load yields higher throughput, other times double the INT load yields much lower throughput).

Running iperf -P 4 also gives very variable results:
- sometimes 1, 2 or even 3 sessions sit at 0 Kbit/sec while the 4th session achieves the maximum throughput measured with a single flow (-P 1)
- other times 1 flow gets double the throughput of the other 3 (unbalanced)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 27, 2018, 12:38:16 pm
Next week I should get my device and will put it in my lab. Let's see ..

Hello mimugmail,
did you have a chance to look at the performance of this box?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on September 27, 2018, 12:52:23 pm
It's here on my table and installed, but I didn't find the time yet, sorry.
Hopefully next week :/
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on September 27, 2018, 01:00:07 pm
It's here on my table and installed, but I didn't find the time yet, sorry.
Hopefully next week :/

No problem, take your time and have fun! I hope you can find some clever solution; I have been mostly stuck for a while now.

Note: be careful which BIOS version you flash! Check these links to get the full picture:
https://pcengines.github.io
https://github.com/pcengines/coreboot/issues/196
http://www.pcengines.info/forums/?page=post&id=4C472C95-E846-42BF-BC41-43D1C54DFBEA&fid=6D8DBBA4-9D40-4C87-B471-80CB5D9BD945
http://pcengines.ch/howto.htm#bios

Yes, it's kind of a mess how disorganized this company's docs are.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 07, 2018, 01:50:01 pm
It's here on my table and installed, but I didn't find the time yet, sorry.
Hopefully next week :/

Hello, did you manage to check it?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 07, 2018, 02:58:53 pm
My apprentice set it up last week, did some BIOS Updates, will start tomorrow :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 07, 2018, 03:58:25 pm
My apprentice set it up last week, did some BIOS Updates, will start tomorrow :)

Thanks, I'm really curious to see your results!
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 09, 2018, 01:35:15 pm

Results (iperf -P 1 == single flow):
1) firewall disabled, NAT disabled: can easily push 890-930 Mbit from WAN-->LAN and vice versa; CPU load is approx. one core at 65% INT and another at 10-30% INT, the rest idle. Throughput is stable, with very minimal variation.
2) firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN and vice versa; CPU load is one core at 100% INT plus one at 20% INT, the rest idle. Occasionally I get strange drops to around 560 Mbit or 630 Mbit.
3) firewall enabled, NAT enabled: LAN-->WAN approx. 650-720 Mbit, WAN-->LAN around 460 Mbit constantly (100% + 20% INT).

Results for 2) and 3) are not really consistent and vary greatly between iperf sessions, as do the CPU load characteristics (sometimes less INT load yields higher throughput, other times double the INT load yields much lower throughput).



I got exactly the same results. After this I tried enabling hardware offloading on the NIC, but the system doesn't boot anymore .. even after a reinstall. I have to dig through it later this week.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 09, 2018, 08:04:36 pm
OK, I tried all the available tuning stuff; a single-stream download in a NAT setup is only 440 Mbit. I'll try a vanilla FreeBSD on Thursday ...
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 11, 2018, 12:49:00 pm
I'm not able to install FreeBSD 11.1 since it always hangs on boot at some ACPI stuff. The same happened on OPNsense 18.7, and only after around 20 restarts, a fresh install and reverting the config did it work again.
11.2 is also not possible to install .. I don't have the time right now.

I have no idea if my device is bricked or something, but it's far from stable .. and serial-only access is a mess  ::)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 11, 2018, 01:50:37 pm
Quick question: can you tell me
1) which BIOS is running on the board (it should be the first thing visible on the serial output at power-on)?
2) What storage have you added to the board? Are you trying to boot from an SD card, from the internal mSATA, or from something else?

PS: I managed to run FreeBSD 11.2 from a USB drive in Live mode; I did not install it to the internal mSATA drive.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 12, 2018, 10:43:22 am
I'm on 4.0.19. Live CD is a good idea .. I can try this next week.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 12, 2018, 10:47:03 am
I'm on 4.0.19. Live CD is a good idea .. I can try this next week.

OK.
By the way, it is better to use firmware 4.0.18, because 4.0.19 has a new boot issue that was found recently, and it was a big mystery when PC Engines would fix it in 4.0.20.

Update: actually, they have already released the fix:
https://pcengines.github.io/#lr-12
There seems to be a related fix: "pfSense 2.4.x fails to boot when no USB stick is plugged"
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 17, 2018, 10:47:39 am

Results (iperf -P 1 == single flow):
1) firewall disabled, NAT disabled: can easily push 890-930 Mbit from WAN-->LAN and vice versa; CPU load is approx. one core at 65% INT and another at 10-30% INT, the rest idle. Throughput is stable, with very minimal variation.
2) firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN and vice versa; CPU load is one core at 100% INT plus one at 20% INT, the rest idle. Occasionally I get strange drops to around 560 Mbit or 630 Mbit.
3) firewall enabled, NAT enabled: LAN-->WAN approx. 650-720 Mbit, WAN-->LAN around 460 Mbit constantly (100% + 20% INT).

Results for 2) and 3) are not really consistent and vary greatly between iperf sessions, as do the CPU load characteristics (sometimes less INT load yields higher throughput, other times double the INT load yields much lower throughput).



I got exactly the same results. After this I tried enabling hardware offloading on the NIC, but the system doesn't boot anymore .. even after a reinstall. I have to dig through it later this week.

Similar results with vanilla 11.1, now upgrading to 11.2
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: miroco on October 17, 2018, 10:08:31 pm
ECC support is fixed on the APU platform as of the 2018-10-04 BIOS v4.8.0.5 mainline release.

https://pcengines.github.io

https://3mdeb.com/firmware/enabling-ecc-on-pc-engines-platforms/#.W8eUoKeHKuM
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: miroco on October 18, 2018, 10:40:21 am
An unintentional double post.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 18, 2018, 11:21:37 am

Results (iperf -P 1 == single flow):
1) firewall disabled, NAT disabled: can easily push 890-930 Mbit from WAN-->LAN and vice versa; CPU load is approx. one core at 65% INT and another at 10-30% INT, the rest idle. Throughput is stable, with very minimal variation.
2) firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN and vice versa; CPU load is one core at 100% INT plus one at 20% INT, the rest idle. Occasionally I get strange drops to around 560 Mbit or 630 Mbit.
3) firewall enabled, NAT enabled: LAN-->WAN approx. 650-720 Mbit, WAN-->LAN around 460 Mbit constantly (100% + 20% INT).

Results for 2) and 3) are not really consistent and vary greatly between iperf sessions, as do the CPU load characteristics (sometimes less INT load yields higher throughput, other times double the INT load yields much lower throughput).



I got exactly the same results. After this I tried enabling hardware offloading on the NIC, but the system doesn't boot anymore .. even after a reinstall. I have to dig through it later this week.

Similar results with vanilla 11.1, now upgrading to 11.2

Same with 11.2. I'll now install OPNsense on similar hardware to see if it's related to the hardware ..
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 18, 2018, 12:44:11 pm
Thanks for the constant status updates :) Eagerly waiting for your results.

By the way: please don't forget that there is a currently known issue in coreboot 4.8.x regarding CPU downclocking:
https://github.com/pcengines/coreboot/issues/196

so make sure the poor performance is not because the APU lowers its clock rate to ~600 MHz instead of 1 GHz after a couple of minutes of uptime :)
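
One quick way to verify from the shell, assuming the cpufreq driver is attached on the APU, is something like:

sysctl dev.cpu.0.freq dev.cpu.0.freq_levels

which should report the current clock and the available steps in MHz.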

Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 18, 2018, 01:46:15 pm
But I'm running 4.0.18?


I tested an old Sophos UTM with an Atom N540 processor and got only 500-600 Mbit in all directions with 1 or 10 streams. I'm searching for a device quite comparable to the APU :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 19, 2018, 07:37:07 pm
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232451
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: ruffy91 on October 20, 2018, 07:19:18 am
The i210 has software-configurable flow control. Maybe the configuration is not that good?
The following registers are defined for the implementation of flow control:
• CTRL.RFCE field is used to enable reception of legacy flow control packets and reaction to them
• CTRL.TFCE field is used to enable transmission of legacy flow control packets
• Flow Control Address Low, High (FCAL/H) - 6-byte flow control multicast address
• Flow Control Type (FCT) - 16-bit field to indicate flow control type
• Flow Control bits in Device Control (CTRL) register - enables flow control modes
• Discard PAUSE Frames (DPF) and Pass MAC Control Frames (PMCF) in RCTL - controls the forwarding of control packets to the host
• Flow Control Receive Threshold High (FCRTH0) - a 13-bit high watermark indicating receive buffer fullness. A single watermark is used in link FC mode.
• DMA Coalescing Receive Threshold High (FCRTC) - a 13-bit high watermark indicating receive buffer fullness when in DMA coalescing and the Tx buffer is empty. The value in this register can be higher than the value placed in the FCRTH0 register, since the watermark needs to be set to allow for receiving only a maximum-sized Rx packet before XOFF flow control takes effect and reception is stopped (refer to Table 3-28 for information on flow control threshold calculation).
• Flow Control Receive Threshold Low (FCRTL0) - a 13-bit low watermark indicating receive buffer emptiness. A single watermark is used in link FC mode.
• Flow Control Transmit Timer Value (FCTTV) - a set of 16-bit timer values to include in transmitted PAUSE frames. A single timer is used in link FC mode.
• Flow Control Refresh Threshold Value (FCRTV) - 16-bit PAUSE refresh threshold value
• RXPBSIZE.Rxpbsize field is used to control the size of the receive packet buffer

The datasheet has a very detailed description of how flow control works: https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/i210-ethernet-controller-datasheet.pdf
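
On FreeBSD the igb driver exposes a per-interface flow control knob, so it should be possible to experiment without touching the registers directly (a sketch, assuming the fc sysctl is present on this kernel):

sysctl dev.igb.0.fc      # current mode: 0 = off, 1 = rx pause, 2 = tx pause, 3 = full
sysctl dev.igb.0.fc=0    # retest with flow control disabled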
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 20, 2018, 11:35:01 am
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232451

Do you think it's a flow-control-related bug?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 20, 2018, 12:03:34 pm
No idea, it sounds familiar ...
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 22, 2018, 08:41:40 am
I played with flow control, and tried again mixing setups with TSO, LRO and checksum offload (XCSUM) .. always the same result.
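
For the record, the toggling itself was just the usual ifconfig flags, roughly along these lines (interface name as on the APU):

ifconfig igb0 -tso -lro -rxcsum -txcsum   # offloads off
ifconfig igb0 tso lro rxcsum txcsum       # offloads back on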
Found this one:

https://elatov.github.io/2017/04/pfsense-on-netgate-apu4-1gb-testing/

I don't have any other ideas now ..
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 22, 2018, 11:07:09 am
I tried a test kernel from franco which might come with 19.1 and gained a slightly better rate, from 480 Mbit to 510 Mbit .. OK, last test for today :)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 22, 2018, 01:30:33 pm
I think such a small difference can easily be random variation between test runs. I have seen similar variation myself running the same OS.

Anyway, thanks for your support; at least I know it's not just me. Practically all PC Engines APU2 owners should consider something different for a 1 Gbit WAN, if OPNsense is going to be installed on the board of course. :-)
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: mimugmail on October 22, 2018, 01:34:43 pm

Anyway, thanks for your support; at least I know it's not just me. Practically all PC Engines APU2 owners should consider something different for a 1 Gbit WAN, if OPNsense is going to be installed on the board of course. :-)

Why? It achieves 1 Gbit with multiple streams easily .. why would someone need 1 Gbit on a single stream?
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: Ricardo on October 22, 2018, 02:02:23 pm
Do you have any chance to access a PPPoE-based WAN or a PPPoE WAN simulator? I also have trouble reaching 1 Gbit even multi-stream when PPPoE is used for the WAN connection. I have already given up hope of 1 Gbit single-flow performance, but even multi-flow performance is quite low, whereas connecting a PC to the same PPPoE WAN directly (no OPNsense router/firewall in front of the PC) achieves much higher speeds.
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: patcsy88 on December 27, 2018, 04:25:21 am
Hi, I have been following this thread and other related forums regarding achieving 1 Gbit via PPPoE with PC Engines' APU2. net.isr.dispatch = "deferred" yielded only a small speed improvement - from 400 Mbps to 450 Mbps. Using the ISP-provided DIR-842, I can hit up to 800+ Mbps. I am on the latest OPNsense with the stock kernel. pfSense on the same APU2 with net.isr.dispatch = "deferred" yielded 520-550 Mbps.
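
For reference, that change is just the netisr dispatch policy, set via System > Settings > Tunables; as far as I know it can also be flipped at runtime for a quick test:

sysctl net.isr.dispatch=deferred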
Title: Re: PC Engines APU2 1Gbit traffic not achievable
Post by: andbaum on January 21, 2019, 03:04:48 pm
I have an APU2 board with OPNsense as well. My board only achieves about 120 Mbit/s per NIC in iperf  >:(
I posted the problem here: https://forum.opnsense.org/index.php?topic=11228.0