Dear Opnsense team,
I am facing a significant performance issue using OPNsense 18.1.x.
Hardware: PC Engines APU2C4, 3x i210AT NIC / AMD GX-412TC CPU / 4 GB DRAM
Issue:
this hardware cannot reach gigabit wire speed with single-flow traffic under OPNsense. The maximum I could get is approx. 450 Mbit/s (WAN --> LAN direction). There are no custom firewall rules / IDS / IPS / etc., just the factory default state after a clean install (I used the serial installer of 18.1.6rev2, then upgraded up to 18.1.13, if that counts).
However:
the exact same hardware can easily do 850-900+ Mbit/s of single-flow traffic under a Linux firewall distribution (I used the latest IPFire 2.19 - Core Update 120), with much less load observed during the transfer than under OPNsense.
Single-flow iperf3 was used to measure throughput, over plain IP with NAT. No IMIX stress test, before you ask; on the contrary, the largest possible MTU (1500) and MSS (1460) were used.
My real concern is the performance drop once I enable PPPoE (my ISP connects through PPPoE): Google turns up many "single-thread PPPoE speed penalty" threads, which is what started my whole descent into this topic. But since routing performance is already poor in an ideal, plain-IP setup, I expect PPPoE to be considerably worse (by definition, it can only be worse).
Asking on the FreeBSD net mailing list about possible solutions/workarounds quickly revealed that OPNsense does not run stock FreeBSD but a fork of it (HardenedBSD), so FreeBSD support for OPNsense is practically non-existent; or at least everybody keeps pointing fingers at the other side. I have seen several times in this forum that you refer people to the FreeBSD forums when a bug is considered to be in the underlying OS rather than in OPNsense. Having read that the relationship between the FreeBSD and HardenedBSD teams is far from friendly, I wonder what kind of help one can expect when an OS-level issue is found.
The thread started here:
Bug 203856 - [igb] PPPoE RX traffic is limited to one queue -->
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856
then continued here:
https://lists.freebsd.org/pipermail/freebsd-net/2018-July/051197.html
And that is the point, where I am stuck.
In short:
- I tried all the valid settings / tuning seen here:
https://bsdrp.net/documentation/technical_docs/performance#nic_drivers_tuning --> specifics for APU2+igb
- tried "net.isr.maxthreads" and "net.isr.numthreads" greater than 1 and switched net.isr.dispatch to "deferred" --> no measurable improvement in throughput, but the load nearly doubled on the APU2 (a quick way to verify these settings is shown below)
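(For reference, whether the netisr changes actually take effect can be checked at runtime; a quick sketch:)
netstat -Q      # shows the active netisr dispatch policy, thread count and per-protocol queues
sysctl net.isr  # current values of all net.isr tunables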
I have collected various performance data during the test traffic, in case it helps to pinpoint where the bottleneck is in this OPNsense system.
-------------------------------------------------------------------------------------------------------------------------------
Opnsense 18.1.13
OS: FreeBSD 11.1-RELEASE-p11 FreeBSD 11.1-RELEASE-p11 116e406d37f(stable/18.1) amd64
kldstat:
Id Refs Address Size Name
1 91 0xffffffff80200000 213bb20 kernel
2 1 0xffffffff8233d000 6e18 if_gre.ko
3 1 0xffffffff82344000 7570 if_tap.ko
4 3 0xffffffff8234c000 54e78 pf.ko
5 1 0xffffffff823a1000 e480 carp.ko
6 1 0xffffffff823b0000 e3e0 if_bridge.ko
7 2 0xffffffff823bf000 6fd0 bridgestp.ko
8 1 0xffffffff823c6000 126a8 if_lagg.ko
9 1 0xffffffff823d9000 1610 ng_UI.ko
10 31 0xffffffff823db000 173e0 netgraph.ko
11 1 0xffffffff823f3000 3620 ng_async.ko
12 1 0xffffffff823f7000 4fb8 ng_bpf.ko
13 1 0xffffffff823fc000 4e98 ng_bridge.ko
14 1 0xffffffff82401000 31e0 ng_cisco.ko
15 1 0xffffffff82405000 f20 ng_echo.ko
16 1 0xffffffff82406000 38b8 ng_eiface.ko
17 1 0xffffffff8240a000 4870 ng_ether.ko
18 1 0xffffffff8240f000 1db0 ng_frame_relay.ko
19 1 0xffffffff82411000 17e8 ng_hole.ko
20 1 0xffffffff82413000 4250 ng_iface.ko
21 1 0xffffffff82418000 6250 ng_ksocket.ko
22 1 0xffffffff8241f000 7d88 ng_l2tp.ko
23 1 0xffffffff82427000 3fe0 ng_lmi.ko
24 1 0xffffffff8242b000 65c8 ng_mppc.ko
25 2 0xffffffff82432000 b48 rc4.ko
26 1 0xffffffff82433000 2ad8 ng_one2many.ko
27 1 0xffffffff82436000 a3e0 ng_ppp.ko
28 1 0xffffffff82441000 8f08 ng_pppoe.ko
29 1 0xffffffff8244a000 5f68 ng_pptpgre.ko
30 1 0xffffffff82450000 2570 ng_rfc1490.ko
31 1 0xffffffff82453000 6288 ng_socket.ko
32 1 0xffffffff8245a000 21a0 ng_tee.ko
33 1 0xffffffff8245d000 2ec0 ng_tty.ko
34 1 0xffffffff82460000 45b8 ng_vjc.ko
35 1 0xffffffff82465000 2f20 ng_vlan.ko
36 1 0xffffffff82468000 31f0 if_enc.ko
37 1 0xffffffff8246c000 28b0 pflog.ko
38 1 0xffffffff8246f000 d578 pfsync.ko
39 1 0xffffffff8247d000 3370 ng_car.ko
40 1 0xffffffff82481000 36a8 ng_deflate.ko
41 1 0xffffffff82485000 4ef8 ng_pipe.ko
42 1 0xffffffff8248a000 3658 ng_pred1.ko
43 1 0xffffffff8248e000 2058 ng_tcpmss.ko
44 1 0xffffffff82621000 7130 aesni.ko
45 1 0xffffffff82629000 1055 amdtemp.ko
The two PCs I use to generate traffic are Windows 7 boxes:
PC-A connects directly to igb0 (WAN endpoint), IP addr. 192.168.1.2
PC-B connects directly to igb1 (LAN endpoint), IP addr. 10.0.0.100
I run:
(on PC-A) iperf3 -s
(on PC-B) iperf3 -c 192.168.1.2 -t 300 -P 1 -R (-R reverses the traffic direction to WAN --> LAN after PC-B opens the control connection to PC-A)
---------------------------------------------------------------------------------------------------------------------------------------------
loader.conf:
##############################################################
# This file was auto-generated using the rc.loader facility. #
# In order to deploy a custom change to this installation, #
# please use /boot/loader.conf.local as it is not rewritten. #
##############################################################
loader_brand="opnsense"
loader_logo="hourglass"
loader_menu_title=""
autoboot_delay="3"
hw.usb.no_pf="1"
# see https://forum.opnsense.org/index.php?topic=6366.0
hint.ahci.0.msi="0"
hint.ahci.1.msi="0"
# Vital modules that are not in FreeBSD's GENERIC
# configuration will be loaded on boot, which makes
# races with individual module's settings impossible.
carp_load="YES"
if_bridge_load="YES"
if_enc_load="YES"
if_gif_load="YES"
if_gre_load="YES"
if_lagg_load="YES"
if_tap_load="YES"
if_tun_load="YES"
if_vlan_load="YES"
pf_load="YES"
pflog_load="YES"
pfsync_load="YES"
# The netgraph(4) framework is loaded here
# for backwards compat for when the kernel
# had these compiled in, not as modules. This
# list needs further pruning and probing.
netgraph_load="YES"
ng_UI_load="YES"
ng_async_load="YES"
ng_bpf_load="YES"
ng_bridge_load="YES"
ng_car_load="YES"
ng_cisco_load="YES"
ng_deflate_load="YES"
ng_echo_load="YES"
ng_eiface_load="YES"
ng_ether_load="YES"
ng_frame_relay_load="YES"
ng_hole_load="YES"
ng_iface_load="YES"
ng_ksocket_load="YES"
ng_l2tp_load="YES"
ng_lmi_load="YES"
ng_mppc_load="YES"
ng_one2many_load="YES"
ng_pipe_load="YES"
ng_ppp_load="YES"
ng_pppoe_load="YES"
ng_pptpgre_load="YES"
ng_pred1_load="YES"
ng_rfc1490_load="YES"
ng_socket_load="YES"
ng_tcpmss_load="YES"
ng_tee_load="YES"
ng_tty_load="YES"
ng_vjc_load="YES"
ng_vlan_load="YES"
# dynamically generated tunables settings follow
net.enc.in.ipsec_bpf_mask="2"
net.enc.in.ipsec_filter_mask="2"
net.enc.out.ipsec_bpf_mask="1"
net.enc.out.ipsec_filter_mask="1"
debug.pfftpproxy="0"
vfs.read_max="32"
net.inet.ip.portrange.first="1024"
net.inet.tcp.blackhole="2"
net.inet.udp.blackhole="1"
net.inet.ip.random_id="1"
net.inet.ip.sourceroute="0"
net.inet.ip.accept_sourceroute="0"
net.inet.icmp.drop_redirect="0"
net.inet.icmp.log_redirect="0"
net.inet.tcp.drop_synfin="1"
net.inet.ip.redirect="1"
net.inet6.ip6.redirect="1"
net.inet6.ip6.use_tempaddr="0"
net.inet6.ip6.prefer_tempaddr="0"
net.inet.tcp.syncookies="1"
net.inet.tcp.recvspace="65228"
net.inet.tcp.sendspace="65228"
net.inet.tcp.delayed_ack="0"
net.inet.udp.maxdgram="57344"
net.link.bridge.pfil_onlyip="0"
net.link.bridge.pfil_local_phys="0"
net.link.bridge.pfil_member="1"
net.link.bridge.pfil_bridge="0"
net.link.tap.user_open="1"
kern.randompid="347"
net.inet.ip.intr_queue_maxlen="1000"
hw.syscons.kbd_reboot="0"
net.inet.tcp.log_debug="0"
net.inet.icmp.icmplim="0"
net.inet.tcp.tso="1"
net.inet.udp.checksum="1"
kern.ipc.maxsockbuf="4262144"
vm.pmap.pti="1"
hw.ibrs_disable="0"
# dynamically generated console settings follow
comconsole_speed="115200"
#boot_multicons
boot_serial="YES"
#kern.vty
console="comconsole"
---------------------------------------------
loader.conf.local
# I have commented everything out (and rebooted to apply) to start performance tuning from scratch
#kern.random.harvest.mask=351
#hw.igb.rx_process_limit=-1
#net.link.ifqmaxlen=2048
#net.isr.numthreads=4
#net.isr.maxthreads=4
#net.isr.dispatch=deferred
#net.isr.bindthreads=1
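(For reference, once any of these are re-enabled, the values actually in effect can be double-checked from the shell; a small sketch covering only the knobs listed above:)
sysctl net.isr.dispatch net.isr.maxthreads net.isr.numthreads net.isr.bindthreads
sysctl net.link.ifqmaxlen kern.random.harvest.mask hw.igb.rx_process_limit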
------------------------------------------------
sysctl.conf is practically empty
------------------------------------------------
ifconfig:
Note: igb0 is "WAN", igb1 is "LAN"
Note2: no PPPoE so far!
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
ether 00:0d:b9:4b:0b:5c
hwaddr 00:0d:b9:4b:0b:5c
inet6 fe80::20d:b9ff:fe4b:b5c%igb0 prefixlen 64 scopeid 0x1
inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
ether 00:0d:b9:4b:0b:5d
hwaddr 00:0d:b9:4b:0b:5d
inet6 fe80::20d:b9ff:fe4b:b5d%igb1 prefixlen 64 scopeid 0x2
inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:0d:b9:4b:0b:5e
hwaddr 00:0d:b9:4b:0b:5e
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: lo
enc0: flags=0<> metric 0 mtu 1536
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: enc
pflog0: flags=100<PROMISC> metric 0 mtu 33160
groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
groups: pfsync
syncpeer: 0.0.0.0 maxupd: 128 defer: off
--------------------------------------------------------------
top -SHPI
last pid: 90572; load averages: 2.13, 1.48, 1.01 up 0+15:54:28 08:58:36
136 processes: 8 running, 99 sleeping, 29 waiting
CPU 0: 0.0% user, 0.0% nice, 99.1% system, 0.0% interrupt, 0.9% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 67.1% interrupt, 32.9% idle
CPU 2: 0.3% user, 0.0% nice, 0.8% system, 0.2% interrupt, 98.7% idle
CPU 3: 0.2% user, 0.0% nice, 1.9% system, 6.8% interrupt, 91.2% idle
Mem: 36M Active, 179M Inact, 610M Wired, 387M Buf, 3102M Free
Swap:
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -92 - 0K 448K CPU0 0 1:32 99.37% kernel{igb0 qu
11 root 155 ki31 0K 64K CPU2 2 904:01 98.85% idle{idle: cpu
11 root 155 ki31 0K 64K RUN 3 909:09 93.95% idle{idle: cpu
12 root -92 - 0K 496K CPU1 1 1:54 50.64% intr{irq262: i
11 root 155 ki31 0K 64K CPU1 1 906:22 39.25% idle{idle: cpu
12 root -92 - 0K 496K WAIT 1 0:26 10.09% intr{irq257: i
12 root -92 - 0K 496K WAIT 3 0:03 3.19% intr{irq264: i
17 root -16 - 0K 16K - 3 0:08 1.12% rand_harvestq
39298 unbound 20 0 72916K 31596K kqread 3 0:01 1.09% unbound{unboun
12 root -92 - 0K 496K WAIT 3 0:02 0.61% intr{irq259: i
11 root 155 ki31 0K 64K RUN 0 912:29 0.52% idle{idle: cpu
12 root -72 - 0K 496K WAIT 2 0:02 0.35% intr{swi1: pfs
0 root -92 - 0K 448K - 2 0:00 0.24% kernel{igb1 qu
12 root -76 - 0K 496K WAIT 3 0:03 0.15% intr{swi0: uar
-----------------------------
systat -vm 3
1 users Load 2.58 1.69 1.11 Jul 27 08:59
Mem usage: 21%Phy 1%Kmem
Mem: KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 129892 36820 12632092 39224 3175880 count
All 133660 40504 13715660 67628 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt ioflt 35953 total
32 52k 2 5198 32k 926 cow 4 uart0 4
zfod 1 ehci0 18
25.9%Sys 18.7%Intr 2.1%User 0.0%Nice 53.3%Idle ozfod ahci0 19
| | | | | | | | | | %ozfod 1123 cpu0:timer
=============+++++++++> daefr 1126 cpu1:timer
29 dtbuf prcfr 1127 cpu3:timer
Namei Name-cache Dir-cache 145989 desvn totfr 84 cpu2:timer
Calls hits % hits % 36007 numvn react 1 igb0:que 0
19 19 100 14872 frevn pdwak 13759 igb0:que 1
15 pdpgs 1 igb0:que 2
Disks ada0 pass0 intrn 3 igb0:que 3
KB/t 0.00 0.00 624712 wire igb0:link
tps 0 0 36984 act 1 igb1:que 0
MB/s 0.00 0.00 183780 inact 13514 igb1:que 1
%busy 0 0 laund 3 igb1:que 2
3175880 free 5206 igb1:que 3
-----------------------------
systat -ifstat 3
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average ||||||||||||||
Interface Traffic Peak Total
lo0 in 0.089 KB/s 0.982 KB/s 3.729 MB
out 0.089 KB/s 0.982 KB/s 3.729 MB
igb1 in 1.184 MB/s 1.194 MB/s 603.486 MB
out 56.019 MB/s 56.498 MB/s 27.880 GB
igb0 in 55.994 MB/s 56.525 MB/s 27.880 GB
out 1.183 MB/s 1.194 MB/s 603.794 MB
--------------------------------------------
vmstat -i 5
irq4: uart0 60 12
irq18: ehci0 4 1
irq19: ahci0 0 0
cpu0:timer 4949 989
cpu1:timer 5623 1124
cpu3:timer 5623 1124
cpu2:timer 3845 769
irq256: igb0:que 0 5 1
irq257: igb0:que 1 70255 14045
irq258: igb0:que 2 8 2
irq259: igb0:que 3 19 4
irq260: igb0:link 0 0
irq261: igb1:que 0 10 2
irq262: igb1:que 1 68832 13761
irq263: igb1:que 2 5 1
irq264: igb1:que 3 25967 5191
irq265: igb1:link 0 0
Total 185205 37026
---------------------------------------------------------------------------------------
Thanks for your help in advance
Regards,
Richard
Just to further add to the topic:
It seems that when 2 parallel iperf streams are running, I sometimes get great results (approx. 800-850 Mbit/s) and sometimes mediocre ones (anything from 300 to 500 or 600 Mbit/s).
From "top -CHIPS" it is clearly visible that in the bad case only 1 core is fully utilized (100% interrupt) while the other 3 cores are 99% idle. In the medium case, 2 cores are at 100% interrupt and 2 cores are idle. In the best case (800-850 Mbit), 3 cores are near 100% while 1 core is completely idle. So something must be going wrong in the load balancing of NIC queues across CPU cores.
Simply re-running the same iperf command line produces all these different results. The values are quite stable within a single session, but after a session completes, re-running the exact same command between the exact same two endpoint PCs gives these large variations. The interrupt load in top clearly confirms this.
Can anyone reproduce the same test cases, or confirm similar results?
Of course that still does not help the weak single-core performance.
I used to have an APU2C4, and realised from looking around the web that others had the same problem. For example, see this article here (https://teklager.se/en/knowledge-base/apu2c0-ipfire-throughput-test-much-faster-pfsense/). They too seem to blame single-core routing but you have found that at times the cores are more evenly used. I have read that later versions of FreeBSD got better at SMP/multi-core routing but apparently not all the way there yet? Perhaps using several iperf3 sessions you are tying one session to a core, and thus getting better (parallel) throughput that way?
Edit: You may also wish to try these settings/tweaks (https://forum.opnsense.org/index.php?topic=6590.0). I didn't see them before I sold my APU2 and got a G4560 based box instead, but they could help. Report back your findings please.
I'm thinking of switching from ipfire to OPNsense because I think it has a better overall feature set, but this is my major hangup. If people are able to get similar performance out of OPNsense, I'd love to hear about it.
Quote from: KantFreeze on August 02, 2018, 05:32:05 AM
I'm thinking of switching from ipfire to OPNsense because I think it has a better overall feature set, but this is my major hangup. If people are able to get similar performance out of OPNsense, I'd love to hear about it.
What's your hardware? The APU2 is a particular case, as it has a low single core speed (1GHz) and is an embedded low power SoC. For normal x86 hardware you'll be fine - I run 380Mbps down on a small form factor Pentium G4560 and it doesn't break a sweat. Gigabit is fine too.
I don't think it's practical to compare Linux and FreeBSD throughput and expect them to match. The latter will be lower.
Cheers,
Franco
My hardware is an APU2C4 :).
As to Linux vs. FreeBSD performance, obviously they are different kernels and aren't going to do everything the same way. But in this particular case the benchmarks show FreeBSD achieving roughly half the throughput of Linux.
Quote from: KantFreeze on August 02, 2018, 04:11:00 PM
My hardware is an APU2C4 :).
As to Linux vs. FreeBSD performance, obviously they are different kernels and aren't going to do everything the same way. But in this particular case the benchmarks show FreeBSD achieving roughly half the throughput of Linux.
Yes of course, but think of it another way. The APU2 is 'only' 1GHz per core. If OPNsense is only using a single core for routing, you've got 1GHz processing power to try to max your connection. Linux on the other hand is multi-core aware. So now you're using 4x 1GHz for routing your connection. No wonder the throughput is higher. Actually, as I said earlier FreeBSD is now getting much better with spreading load across cores, though it doesn't apply for every part of the 'networking' process. FreeBSD has probably the best networking stack in the world, or certainly one of them. It can route 10Gbps, 40Gbps, even 100Gbps on suitable hardware. Unfortunately, the APU2 isn't the most suitable hardware (for high throughput on *BSD).
If you need >500Mbps stick to Linux and you won't have an issue. If you want <500Mbps then *sense will be fine on your APU.
Rainmaker,
I think I'm not communicating well. I'm not saying that FreeBSD has poor network performance. I'm saying that with the particular piece of hardware I happen to own, FreeBSD has roughly half the throughput of Linux and struggles to use the hardware efficiently. The FreeBSD development thread linked earlier suggests that it's not the SMP performance of pf that's the issue, but some oddness in the embedded Intel NIC.
But most of these benchmarks are almost two years old. I'm wondering if at this point the problem with this particular hardware might be fixed, and if people might be able to get similar performance under FreeBSD with tweaks.
Quote from: KantFreeze on August 02, 2018, 04:27:59 PM
Rainmaker,
I think I'm not communicating well. I'm not saying that FreeBSD has poor network performance. I'm saying that with the particular piece of hardware I happen to own, FreeBSD has roughly half the throughput of Linux and struggles to use the hardware efficiently. The FreeBSD development thread linked earlier suggests that it's not the SMP performance of pf that's the issue, but some oddness in the embedded Intel NIC.
But most of these benchmarks are almost two years old. I'm wondering if at this point the problem with this particular hardware might be fixed, and if people might be able to get similar performance under FreeBSD with tweaks.
Ah, you (respectfully) are a lot more knowledgeable than I catered for in my response. Apologies, it's difficult to pitch your responses on the Internet; especially when you and the other people don't know each other yet (as I'm sure you know).
Yes, FreeBSD's pf is indeed much more SMP capable. Last week I took both OpenBSD and FreeBSD installs and 'made' routers out of them, before comparing them side-by-side. Even on an 8700k at 5GHz per core OpenBSD was less performant than FreeBSD. However there are many other factors, as we both touched upon in previous posts.
NIC queues are one factor, as you state. I'm not sure if OPNsense utilises multiple queues (i.e. per core) or whether it just uses one. For your APU specifically, did you accept the Intel proprietary licence? I can't quite recall whether the APU2 uses igb drivers? I don't even know if that applies to OPNsense, but I know on pfSense I was advised to create the following for an APU2:
/boot/loader.conf.local
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1
Then reboot. This apparently 'unlocks' some extra functionality in the NIC, which may improve your throughput. If you're running off an SSD don't forget to enable TRIM.
Quote from: Rainmaker on August 02, 2018, 04:39:44 PM
NIC queues are one factor, as you state. I'm not sure if OPNsense utilises multiple queues (i.e. per core) or whether it just uses one. For your APU specifically, did you accept the Intel proprietary licence? I can't quite recall whether the APU2 uses igb drivers? I don't even know if that applies to OPNsense, but I know on pfSense I was advised to create the following for an APU2:
/boot/loader.conf.local
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1
Then reboot. This apparently 'unlocks' some extra functionality in the NIC, which may improve your throughput. If you're running off an SSD don't forget to enable TRIM.
Hi rainmaker,
The license ack has nothing to do with the igb driver (imho). It is related to Intel PRO/Wireless adapters.
(https://www.freebsd.org/cgi/man.cgi?iwi)
regards pylox
Hi pylox,
Having read the relevant man pages it seems I was indeed grossly misinformed. I was told to add those lines when I first started using pfSense (and *BSD), as I was using various Intel Pro and I-series ethernet NICs. I don't run wifi on my gateway (I use Unifi APs) so I was obviously given duff information.
My apologies for repeating it here, and thanks for the lesson.
Quote from: Rainmaker on August 03, 2018, 03:11:59 PM
Hi pylox,
Having read the relevant man pages it seems I was indeed grossly misinformed. I was told to add those lines when I first started using pfSense (and *BSD), as I was using various Intel Pro and I-series ethernet NICs. I don't run wifi on my gateway (I use Unifi APs) so I was obviously given duff information.
My apologies for repeating it here, and thanks for the lesson.
Hi rainmaker,
no problem... Some time ago I made the same entry in loader.conf.local... ;D After some research I realized it's bullshit...
I think the OP's problem is something special, a PPPoE-related topic. Normally there should be no performance problems on an APU2.
regards pylox
Hello pylox, all
just to be clear: I am testing over a plain IP+NAT connection (PPPoE was mentioned as a possible bottleneck, but is not tested YET), and even that simple setup only reaches approx. 40-50% of the maximum possible throughput. If I add PPPoE, it will be even slower. That's the point of this thread: to find at least one credible person who is currently using an APU2 with OPNsense and can confirm their speed reaches at least 85-90% of gigabit - even over PPPoE!
Then the next round will be to see, what needs to be fine-tuned to have the same perf at my ISP.
All I can see is that single-flow iperf performance consistently maxes out at around 450 Mbit/s (in the WAN --> LAN direction). LAN --> WAN seems slightly higher, approx. 600-650 Mbit/s.
Multi-flow iperf: now here is where it gets interesting. The result varies from run to run: e.g. I run a 2-flow iperf session lasting 60 seconds, it finishes, I immediately restart it with the same command, and I get a totally different result. After another 60 seconds I repeat, and get yet another completely different throughput.
With 2-flow iperf I sometimes reach 850-900 Mbit, other times as low as 250 Mbit. Yes, a gigantic difference, even though all relevant test parameters are unchanged.
When I get 850-900 Mbit throughput, the 2 flows are evenly distributed (450 Mbit + 450 Mbit = 900 Mbit total), and CPU interrupt usage is around 270-280% (explanation: total CPU processing power is 400% = 100% per core times 4 cores).
When I get 600 Mbit, I usually see one flow at 580 Mbit and the other at 1-2 (at most 10) Mbit; interrupt load is approx. 170-180%. When I get 200-300 Mbit, it is sometimes 2x 150 Mbit, other times 1x 190 + 1x 2-3 Mbit, with only 100% interrupt usage (a single core). And this varies from run to run.
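(For the record, the multi-flow runs reused the same endpoints as the single-flow test described earlier, just with a higher parallel count, e.g.:)
iperf3 -c 192.168.1.2 -t 60 -P 2 -R   # 2 parallel flows, WAN --> LAN
iperf3 -c 192.168.1.2 -t 60 -P 4 -R   # 4 parallel flows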
Quote from: ricsip on August 06, 2018, 02:18:30 PM
Hello pylox, all
just to be clear: I am testing through plain IP+NAT connection (PPPoE was mentioned as a possible bottleneck, but not tested YET), and that simple test setup has approx. only 40-50% of the max. possible throughput. If I add PPPoE, it will be even slower. That's the point of this thread, trying to find at least 1 credible person who is currently using APU2 with Opnsense, and he/she confirms their speed can reach 85-90% of gigabit (at least). Even if using over PPPoE!
Then the next round will be to see, what needs to be fine-tuned to have the same perf at my ISP.
......
Hi ricsip,
this is very hard to find. Unfortunately I do not have a test setup with an APU2 (and not much time).
But you can try different things:
1. Change these tunables and measure...
vm.pmap.pti="0" #(disable meltdown patch - this is an AMD processor)
hw.ibrs_disable="1" #(disable spectre patch temporarily)
2. Try to disable igb flow control for each interface and measure
hw.igb.<x>.fc=0 #(x = number of interface)
3. Change the network interface interrupt rate and measure
hw.igb.max_interrupt_rate="16000" #(start with 16000, can be increased up to 64000)
4. Disable Energy Efficiency for each interface and measure
dev.igb.<x>.eee_disabled="1" #(x = number of interface)
Should be enough for the first time...;-) (collected into one sketch below)
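Collected in one place, the suggestions above might look like this (only a sketch; igb0/igb1 are assumed from the setup described earlier, and note that the flow-control knob sits under dev.igb.<x>):
# /boot/loader.conf.local (boot-time tunables, reboot required)
vm.pmap.pti="0"                    # disable Meltdown page-table isolation (AMD CPU)
hw.ibrs_disable="1"                # disable Spectre V2 mitigation for the test
hw.igb.max_interrupt_rate="16000"  # per-queue interrupt rate limit, can go up to 64000
# runtime sysctls, per interface (repeat for igb1)
sysctl dev.igb.0.fc=0              # disable 802.3x flow control
sysctl dev.igb.0.eee_disabled=1    # disable Energy Efficient Ethernet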
regards pylox
I will try to see if any of these make a difference, but in general I am very skeptical that they will, and nobody from the forum owners has replied anything meaningful since this thread started :(
(apart from basically saying it's not practical to compare BSD and Linux)
Maybe the forum owners don't use an APU?
Have you followed the interrupt stuff from:
https://wiki.freebsd.org/NetworkPerformanceTuning
How many queues does your NIC have? Perhaps you can lower the number of queues on the NIC if a single stream is so important for you, but then I'd guess all other traffic will be starved ..
Quote from: ricsip on August 09, 2018, 02:32:27 PM
I will try to see if any of these make a difference, but in general I am very skeptical that they will, and nobody from the forum owners has replied anything meaningful since this thread started :(
(apart from basically saying it's not practical to compare BSD and Linux)
Hi ricsip,
be aware that there are a lot of circumstances (especially with the hardware, or your test setup) where things will not work in an optimal way... There is no "silver bullet", so complaining will not help. It is also possible that other OPNsense & APU2 users do not need a single near-gigabit flow. From my perspective you have three choices: try stronger hardware, use other software, or do some more testing and let the community participate...
regards pylox
https://calomel.org/freebsd_network_tuning.html
# Disable Hyper Threading (HT), also known as Intel's proprietary simultaneous
# multithreading (SMT) because implementations typically share TLBs and L1
# caches between threads which is a security concern. SMT is likely to slow
# down workloads not specifically optimized for SMT if you have a CPU with more
# than two(2) real CPU cores. Secondly, multi-queue network cards are as much
# as 20% slower when network queues are bound to real CPU cores and well as SMT
# virtual cores due to interrupt processing inefficiencies.
machdep.hyperthreading_allowed="0" # (default 1, allow Hyper Threading (HT))
# Intel igb(4): The Intel i350-T2 dual port NIC supports up to eight(8)
# input/output queues per network port, the card has two(2) network ports.
#
# Multiple transmit and receive queues in network hardware allow network
# traffic streams to be distributed into queues. Queues can be mapped by the
# FreeBSD network card driver to specific processor cores leading to reduced
# CPU cache misses. Queues also distribute the workload over multiple CPU
# cores, process network traffic in parallel and prevent network traffic or
# interrupt processing from overwhelming a single CPU core.
#
# http://www.intel.com/content/dam/doc/white-paper/improving-network-performance-in-multi-core-systems-paper.pdf
#
# For a firewall under heavy CPU load we recommend setting the number of
# network queues equal to the total number of real CPU cores in the machine
# divided by the number of active network ports. For example, a firewall with
# four(4) real CPU cores and an i350-T2 dual port NIC should use two(2) queues
# per network port (hw.igb.num_queues=2). This equals a total of four(4)
# network queues over two(2) network ports which map to to four(4) real CPU
# cores. A FreeBSD server with four(4) real CPU cores and a single network port
# should use four(4) network queues (hw.igb.num_queues=4). Or, set
# hw.igb.num_queues to zero(0) to allow the FreeBSD driver to automatically set
# the number of network queues to the number of CPU cores. It is not recommend
# to allow more network queues than real CPU cores per network port.
#
# Query total interrupts per queue with "vmstat -i" and use "top -CHIPS" to
# watch CPU usage per igb0:que. Multiple network queues will trigger more total
# interrupts compared to a single network queue, but the processing of each of
# those queues will be spread over multiple CPU cores allowing the system to
# handle increased network traffic loads.
hw.igb.num_queues="2" # (default 0 , queues equal the number of CPU real cores)
# Intel igb(4): FreeBSD puts an upper limit on the the number of received
# packets a network card can process to 100 packets per interrupt cycle. This
# limit is in place because of inefficiencies in IRQ sharing when the network
# card is using the same IRQ as another device. When the Intel network card is
# assigned a unique IRQ (dmesg) and MSI-X is enabled through the driver
# (hw.igb.enable_msix=1) then interrupt scheduling is significantly more
# efficient and the NIC can be allowed to process packets as fast as they are
# received. A value of "-1" means unlimited packet processing and sets the same
# value to dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit . A
# process limit of "-1" is around one(1%) percent faster than "100" on a
# saturated network connection.
hw.igb.rx_process_limit="-1" # (default 100 packets to process concurrently)
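(A quick way to confirm what the driver actually negotiated before and after such changes is to check on the firewall shell; a sketch, with igb0 standing in for whichever port is under test:)
dmesg | grep igb0                            # MSI-X vectors and queues reported at attach time
sysctl hw.igb.num_queues hw.igb.enable_msix  # loader tunables currently in effect
vmstat -i | grep "igb0:que"                  # one interrupt counter per active queue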
If these suggestions improve performance, I'd love to hear about it.
Quote from: mimugmail on August 09, 2018, 04:16:00 PM
https://calomel.org/freebsd_network_tuning.html
......
Testing is in progress, but at the moment I am overloaded with my other tasks. Just wanted to let you know I didn't abandon the thread. As my goal is to get this fixed, I will post the results here in the next couple of days anyway.
Quote from: pylox on August 07, 2018, 07:55:27 PM
......
Ok, I did all the steps above. No improvement, still wildly sporadic measurements/results after each test-execution.
The only difference is that the CPU load characteristics went from 99% SYS + 60-70% IRQ to 100% + 60-70% IRQ (SYS dropped to 1-2%).
Note 1: I only tried hw.igb.max_interrupt_rate from "8000" to "16000", not any higher.
Note 2: regarding "2. Try to disable igb flow control for each interface and measure":
hw.igb.<x>.fc=0 #(x = number of interface) --> TYPO, it's actually dev.igb.<x>.fc=0
Quote from: mimugmail on August 09, 2018, 04:16:00 PM
https://calomel.org/freebsd_network_tuning.html
......
I have also gone through this. No measurable improvement in throughput.
machdep.hyperthreading_allowed="0" # (default 1, allow Hyper Threading (HT)) --> NOT APPLICABLE to my case. This AMD CPU has 4 physical cores, and sysctl hw.ncpu --> 4, so HT (even if supported, I am not sure) is not active currently.
hw.igb.num_queues="2" # (default 0 , queues equal the number of CPU real cores)
--> I have 4 cores and 2 active NICs; each NIC supports up to 4 queues. By default I used
hw.igb.num_queues="0", but tried it with hw.igb.num_queues="2" as well.
No improvement in throughput (for single-flow).
But! It seems to have degraded multi-flow performance heavily.
hw.igb.enable_msix=1 has been set like that from the beginning
hw.igb.rx_process_limit="-1" --> was set, but no real improvement in throughput
dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit are both set to "-1" as per the previous entry
I am very sad that this does not seem to be solvable under OPNsense without switching to a competitor or changing the hardware itself.
Sorry .. none of us are magicians. ::)
You can go for commercial vendors like Cisco, where you are limited to 85 Mbit and have to purchase an extra license.
Well that's disappointing. OPNsense is a great piece of software. Maybe I'll check back in when FreeBSD 12 is released as I think this is overall a better solution for my needs than ipfire.
If you send me such a device I can do some testing. No other idea how to help.
I'm willing to chip in to buy the OPNsense project an APU2.
It can also be a used one .. I don't need it for long.
Looks like I'm the only one willing to chip in?
Quote from: KantFreeze on August 21, 2018, 04:37:21 PM
Looks like I'm the only one willing to chip in?
@KantFreeze:
Let's be reasonable. Nobody will send equipment free of charge to unknown people on the internet. At least that is my view.
@mimugmail: how about a donation towards you, so you can buy a brand new APU2 for yourself, and you could spend some valuable time to see its max. performance capabilities, and document your findings? No need to return the device at the end, you should keep it for future Opnsense release benchmarks / regression tests.
I bought my APU2 from a local reseller (motherboard + black case + external PSU + a 16Gb mSATA SSD), sum was approx. 200 EUR. If there are 10 real volunteers, I am willing to spend 20 EUR (non-refundable) "donation" on this project.
DM me for the details, if you are interested.
local = German? I can ask my boss if the company is willing to test such a device ..
Quote from: mimugmail on September 04, 2018, 01:51:02 PM
local = German? I can ask my boss if the company is willing to test such a device ..
I am not from Germany; I live in Eastern Europe and just converted my local currency to EUR for a rough estimate. But your local PC shop may sell these devices even cheaper:
http://pcengines.ch/order.htm
Quote from: mimugmail on September 04, 2018, 01:51:02 PM
local = German? I can ask my boss if the company is willing to test such a device ..
@mimugmail: I have a spare APU2 I no longer use, if you send me your bank account details, pass-codes etc.. that will do as security.
PM me and we'll work something out. :)
You want to send it to me AND want my bank details?? :P
I ordered this via company, no tax, so only 160EUR
https://www.amazon.de/PC-Engines-APU-2C4-Netzteil-schwarzes/dp/B01GEIEI7M
Cool... OK.
Quote from: mimugmail on September 04, 2018, 03:11:54 PM
I ordered this via company, no tax, so only 160EUR
https://www.amazon.de/PC-Engines-APU-2C4-Netzteil-schwarzes/dp/B01GEIEI7M
I really meant to support this evaluation effort. So if there is still something needed, let us know!
Quote from: ricsip on August 15, 2018, 11:55:19 AM
......
Some small addendum:
recently I noticed (maybe since upgrading to 18.7.1_3, but TBH I am not sure) that sometimes (depending on the actual throughput / interrupt load shared among the cores) the serial console hangs during iperf. As soon as the iperf session finishes or I interrupt it manually, the serial console comes back to life. I noticed it while running "top" on the console: the refresh stopped / froze during the iperf session and the keyboard wasn't working either while the traffic was flowing. As soon as the iperf session finished, "top" continued to produce output and the console responded to keystrokes again.
It seems to have something to do with the way throughput alternates randomly between those 2-3 discrete levels across iperf sessions.
Do you run iperf on the Firewall itself?
No, never!
The two iperf endpoints run on a PC connected to the LAN (igb1) and another PC connected to the WAN (igb0); the APU is always just a transit device (packet forwarding / packet filtering / NAT between igb1 and igb0 and vice versa) and never terminates any iperf traffic itself.
Next week I should get my device and will put it in my lab. Lets see ..
Yet another small addendum:
finally I managed to test throughput over PPPoE under real-life conditions.
Results are quite weak:
approx. 250-270 Mbit/s (WAN --> LAN traffic direction) was achieved with the APU2. Not iperf this time, but a torrent download (so nobody can say I was pushing unrealistic expectations onto a single flow).
Again, the router was only a transit device; the torrent client ran on a PC behind the APU. The SSD wasn't the bottleneck during the download.
As a comparison, with a different vendor's router I was easily able to achieve 580-600 Mbit/s downloading the same test torrent. I didn't investigate whether it could go higher with that router, but that is still more than a 2x performance difference.
You mean IPFire on the same hardware?
Quote from: mimugmail on September 07, 2018, 05:59:15 PM
You mean IPFire on the same hardware?
No, not ipfire. Sorry if I was unclear :)
I installed a completely different piece of equipment (an Asus AC66U B1 router), just to compare whether that router can reach gigabit wire speed.
On the APU I could not test IPFire today due to lack of time, but in the coming days I may do another round of tests with IPFire.
Need to find a timeslot when no users are using the internet :(
If I remember correctly you said this on the FreeBSD Net List regarding OPN and IPFire. I'll check next week.
Quote from: mimugmail on September 07, 2018, 06:20:39 PM
If I remember correctly you said this on the FreeBSD Net List regarding OPN and IPFire. I'll check next week.
Yes, you are right! Some weeks ago I did run the IPFire distribution on the APU, but that was only in an isolated LAN, without access to PPPoE or the internet, so I could run my iperf benchmarks without breaking the production internet connection.
Today, unfortunately, I wasted a lot of time getting OPNsense to work on my production PPPoE internet connection. Basically the default gateway was not activated properly after the PPPoE session came up, so all internet traffic failed with a TTL expired error.
My existing OPNsense config used a static IP for the WAN (remember, I used an isolated LAN earlier for iperf testing). Today I changed the WAN config from static IP to PPPoE, but some of the previous static default-gateway config was stuck and wasn't deleted properly (the dmesg log actually complained about 2 gateways that failed to be removed). I logged into the console and tried a couple of times to reset the interface assignment and redo the IP addressing, then logged into the GUI and switched the WAN from static IP to PPPoE. The CLI console does not allow advanced configuration such as PPPoE setup, so I had to do that from the GUI.
But it was still broken. I got the PPPoE session up and received a public IP from my ISP, but the default gateway was still the LAN IP from my old config.
That is when I decided to log in to the console again, select Option 4) factory reset, and redo the initial setup wizard from scratch in the GUI. I selected WAN type: PPPoE, and this way I succeeded. But it wasted half of my day.
https://github.com/opnsense/core/issues/2186
I found this bug report about the PPPoE default gateway not updating after the PPPoE session activates, but it looks like that bug was fixed in 18.1.9 or so. It seems I was hitting something similar, I don't really know.
So basically I did not have time to switch the operating system, boot IPFire, and repeat the same tests under Linux. I plan to do that in the coming days.
Well, I did test IPFire on the APU2 as well (I used the latest ipfire-2.21-core123).
I could only achieve the same 250-290 Mbit/s for the same torrent as yesterday with OPNsense. Because I was suspicious, I also connected my laptop directly to my ISP (I set up the PPPoE profile directly on the PC), with no router in between: the speed was the same 250-280 Mbit/s this time. So I think there is a problem with my bloody ISP today (yesterday I managed to get 600 Mbit, so something must be going on today). There is no point continuing this testing until I can figure out what the hell is happening.
If anyone can share the simplest PPPoE simulator config, based on FreeBSD or Linux, I will try it on a powerful PC connected to my APU and completely remove the uncertain ISP from the equation for these tests (I would run iperf on the PPPoE-simulator PC itself, acting as the WAN endpoint). A torrent would be difficult to simulate in such a topology, so I would have to fall back to iperf first.
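(One possible lab setup, untested here and only a sketch: a Linux PC with the rp-pppoe package can act as the PPPoE server. Assumptions: eth0 is the NIC facing the APU's WAN port, the 10.67.15.x addresses are unused, and authentication is switched off for simplicity.)
# /etc/ppp/pppoe-server-options (minimal, no authentication)
noauth
mtu 1492
mru 1492
lcp-echo-interval 10
lcp-echo-failure 2
# start the PPPoE server on the NIC facing the APU's WAN port
pppoe-server -I eth0 -L 10.67.15.1 -R 10.67.15.10 -N 4
# then run "iperf3 -s" on this box, so it is both PPPoE concentrator and WAN-side iperf endpoint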
Me again.
I did some further testing. No PPPoE involved (I don't have access to the internet line at the moment); only pure IP <--> IP in my lab, with OPNsense purely in transit, not running iperf itself.
I found the option in the menu to literally turn off the firewall (disable all packet filtering), which also disables NAT and turns OPNsense into a plain routing box.
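(The same toggle can also be done from the firewall shell for quick A/B runs; a sketch, keeping in mind that disabling pf drops NAT as well, so the test hosts need working routes to each other:)
pfctl -d    # disable the packet filter (and with it NAT)
pfctl -e    # re-enable it
pfctl -si   # state/counter summary while a test is running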
Results (iperf -P 1 == single flow):
1)->firewall disabled, NAT disabled: can easily transmit 890-930 Mbit from WAN-->LAN, and vice versa, CPU load is approx 1x core 65% INT , another 1x core 10-30% in INT, the rest is idle. Throughput is stable, very minimal variation.
2)->firewall enabled, NAT disabled: this time its peak at 740-760 Mbit from WAN-->LAN, and vice versa, CPU load 1x 100% INT + 1x 20% INT, rest is idle. Occasionally, I get these strange drops to around 560 Mbit or to around 630 Mbit.
3)->firewall enabled, NAT enabled: LAN -->WAN: approx 650-720 Mbit, WAN-->LAN: around 460 Mbit constantly (100%+20% INT)
Results for 2) and 3) are not really consistent and vary greatly between iperf sessions. So do the CPU load characteristics (sometimes less INT load gives higher throughput, other times double the INT load gives much lower throughput).
Running iperf -P 4 also gives very variable results:
- sometimes 1, 2 or even 3 sessions sit at 0 Kbit/s, while the 4th session achieves the maximum throughput measured with a single flow (-P 1)
- other times 1 flow has double the throughput of the other 3 (unbalanced)
Quote from: mimugmail on September 07, 2018, 02:40:33 PM
Next week I should get my device and will put it in my lab. Lets see ..
Hello mimugmail,
did you have a chance to look at the performance of the box?
It's here on my table and installed, but I didn't find the time yet, sorry.
Hopefully next week :/
Quote from: mimugmail on September 27, 2018, 12:52:23 PM
It's here on my table and installed, but I didn't find the time yet, sorry.
Hopefully next week :/
No problem, take your time and have fun! I hope you can find some clever solution; I have been mostly stuck for a while.
Note: be careful which BIOS version you flash! Check these links to get the full picture:
https://pcengines.github.io
https://github.com/pcengines/coreboot/issues/196
http://www.pcengines.info/forums/?page=post&id=4C472C95-E846-42BF-BC41-43D1C54DFBEA&fid=6D8DBBA4-9D40-4C87-B471-80CB5D9BD945
http://pcengines.ch/howto.htm#bios
Yes, it's kind of a mess how disorganized this company's docs are.
Quote from: mimugmail on September 27, 2018, 12:52:23 PM
It's here on my table and installed, but I didn't find the time yet, sorry.
Hopefully next week :/
Hello, did you manage to check it?
My apprentice set it up last week, did some BIOS Updates, will start tomorrow :)
Quote from: mimugmail on October 07, 2018, 02:58:53 PM
My apprentice set it up last week, did some BIOS Updates, will start tomorrow :)
Thanks, I'm really curious to see your results!
Quote from: ricsip on September 13, 2018, 01:15:38 PM
Results (iperf -P 1 == single flow):
1)->firewall disabled, NAT disabled: can easily transmit 890-930 Mbit from WAN-->LAN, and vice versa, CPU load is approx 1x core 65% INT , another 1x core 10-30% in INT, the rest is idle. Throughput is stable, very minimal variation.
2)->firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN, and vice versa, CPU load 1x 100% INT + 1x 20% INT, rest is idle. Occasionally I get strange drops to around 560 Mbit or around 630 Mbit.
3)->firewall enabled, NAT enabled: LAN -->WAN: approx 650-720 Mbit, WAN-->LAN: around 460 Mbit constantly (100%+20% INT)
Results for 2) and 3) are not really consistent and vary greatly between iperf sessions. So do the CPU load characteristics (sometimes less INT load results in higher throughput, other times double the INT load results in much lower throughput).
I got exactly the same results. After this I tried enabling hardware offloading on the NIC, but the system doesn't boot anymore .. even after a reinstall. Have to dig through it later this week.
Ok, tried all available tuning stuff; single-stream download in a NAT environment is only 440 Mbit. I'll try a vanilla FreeBSD on Thursday ...
I'm not able to install FreeBSD 11.1 since it always hangs on boot at some ACPI stuff. This also happened on OPNsense 18.7, and after around 20 restarts, a new install and reverting the config it worked again.
11.2 is also not possible to install .. I don't have the time now.
I have no idea if my device is bricked or something, but it's far from stable .. and serial-only is a mess ::)
Quick question: can you tell me
1) what BIOS is running on the board (should be the first thing visible on the serial output when powered on)
2) what storage you have added to the board? Are you trying to boot from SD card, from the internal mSATA, or something else?
PS: I managed to run FreeBSD 11.2 from a USB drive in live mode; I did not install it to the internal mSATA drive.
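For the record, this is roughly how I prepared the USB stick (a sketch; the image name follows the usual FreeBSD naming and the target device is just an example, double-check it before writing):

# Written from another FreeBSD box; /dev/da0 is a placeholder for the USB stick
# (use bs=1M instead of bs=1m with GNU dd on Linux)
dd if=FreeBSD-11.2-RELEASE-amd64-memstick.img of=/dev/da0 bs=1m conv=sync
# Then boot the APU from USB and pick the "Live CD" option at the installer's first dialog.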
I'm on 4.0.19. Live CD is a good idea .. I can try this next week.
Quote from: mimugmail on October 12, 2018, 10:43:22 AM
I'm on 4.0.19. Live CD is a good idea .. I can try this next week.
Ok.
By the way, better to use firmware 4.0.18, because 4.0.19 has a new boot issue that was found recently, and it's a big mystery when PC Engines will fix it in 4.0.20.
Update: actually they have already released it:
https://pcengines.github.io/#lr-12
There seems to be a related fix: "pfSense 2.4.x fails to boot when no USB stick is plugged in"
Quote from: mimugmail on October 09, 2018, 01:35:15 PM
Quote from: ricsip on September 13, 2018, 01:15:38 PM
Results (iperf -P 1 == single flow):
1)->firewall disabled, NAT disabled: can easily transmit 890-930 Mbit from WAN-->LAN, and vice versa, CPU load is approx 1x core 65% INT , another 1x core 10-30% in INT, the rest is idle. Throughput is stable, very minimal variation.
2)->firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN, and vice versa, CPU load 1x 100% INT + 1x 20% INT, rest is idle. Occasionally I get strange drops to around 560 Mbit or around 630 Mbit.
3)->firewall enabled, NAT enabled: LAN -->WAN: approx 650-720 Mbit, WAN-->LAN: around 460 Mbit constantly (100%+20% INT)
Results for 2) and 3) are not really consistent and vary greatly between iperf sessions. So do the CPU load characteristics (sometimes less INT load results in higher throughput, other times double the INT load results in much lower throughput).
I got exactly the same results. After this I tried enabling hardware offloading on the NIC, but the system doesn't boot anymore .. even after a reinstall. Have to dig through it later this week.
Similar results with vanilla 11.1, now upgrading to 11.2
ECC support is fixed on the APU platform as of the 2018-10-04 BIOS v4.8.0.5 mainline release.
https://pcengines.github.io
https://3mdeb.com/firmware/enabling-ecc-on-pc-engines-platforms/#.W8eUoKeHKuM
An unintentional double post.
Quote from: mimugmail on October 17, 2018, 10:47:39 AM
Quote from: mimugmail on October 09, 2018, 01:35:15 PM
Quote from: ricsip on September 13, 2018, 01:15:38 PM
Results (iperf -P 1 == single flow):
1)->firewall disabled, NAT disabled: can easily transmit 890-930 Mbit from WAN-->LAN, and vice versa, CPU load is approx 1x core 65% INT , another 1x core 10-30% in INT, the rest is idle. Throughput is stable, very minimal variation.
2)->firewall enabled, NAT disabled: this time it peaks at 740-760 Mbit from WAN-->LAN, and vice versa, CPU load 1x 100% INT + 1x 20% INT, rest is idle. Occasionally I get strange drops to around 560 Mbit or around 630 Mbit.
3)->firewall enabled, NAT enabled: LAN -->WAN: approx 650-720 Mbit, WAN-->LAN: around 460 Mbit constantly (100%+20% INT)
Results for 2) and 3) are not really consistent and vary greatly between iperf sessions. So do the CPU load characteristics (sometimes less INT load results in higher throughput, other times double the INT load results in much lower throughput).
I got exactly the same results. After this I tried enabling hardware offloading on the NIC, but the system doesn't boot anymore .. even after a reinstall. Have to dig through it later this week.
Similar results with vanilla 11.1, now upgrading to 11.2
Same with 11.2. I'll now install OPNsense on similar hardware to see if it's related to the hardware ..
Thanks for the constant status updates :) Eagerly waiting for your results.
By the way: please don't forget that there is a currently known issue in coreboot 4.8.x regarding CPU downclocking:
https://github.com/pcengines/coreboot/issues/196
so make sure the poor performance is not because the APU lowers the clock rate to 600 MHz instead of 1 GHz after a couple of minutes of uptime :)
But I'm running 4.0.18?
I tested an old Sophos UTM with an Atom N540 processor and got only 500-600 Mbit in all directions with 1 or 10 streams. I'm searching for a device quite comparable to the APU :)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232451
The i210 has software-configurable flow control. Maybe the configuration is not that good?
The following registers are defined for the implementation of flow control:
• CTRL.RFCE field is used to enable reception of legacy flow control packets and reaction to them
• CTRL.TFCE field is used to enable transmission of legacy flow control packets
• Flow Control Address Low, High (FCAL/H) - 6-byte flow control multicast address
• Flow Control Type (FCT) 16-bit field to indicate flow control type
• Flow Control bits in Device Control (CTRL) register - Enables flow control modes
• Discard PAUSE Frames (DPF) and Pass MAC Control Frames (PMCF) in RCTL - controls the forwarding of control packets to the host
• Flow Control Receive Threshold High (FCRTH0) - A 13-bit high watermark indicating receive buffer fullness. A single watermark is used in link FC mode.
• DMA Coalescing Receive Threshold High (FCRTC) - A 13-bit high watermark indicating receive buffer fullness when in DMA coalescing and Tx buffer is empty. The value in this register can be higher than value placed in the FCRTH0 register since the watermark needs to be set to allow for only receiving a maximum sized Rx packet before XOFF flow control takes effect and reception is stopped (refer to Table 3-28 for information on flow control threshold calculation).
• Flow Control Receive Threshold Low (FCRTL0) - A 13-bit low watermark indicating receive buffer emptiness. A single watermark is used in link FC mode.
• Flow Control Transmit Timer Value (FCTTV) - a set of 16-bit timer values to include in transmitted PAUSE frame. A single timer is used in Link FC mode
• Flow Control Refresh Threshold Value (FCRTV) - 16-bit PAUSE refresh threshold value
• RXPBSIZE.Rxpbsize field is used to control the size of the receive packet buffer
The datasheet has very detailed descriptions on how flow control works: https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/i210-ethernet-controller-datasheet.pdf
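If someone wants to experiment with this, the igb(4) driver exposes the flow-control mode as a per-NIC sysctl; this is only a sketch of what I would try (stat node names can differ between driver versions):

# Current mode on the first igb NIC: 0 = off, 1 = rx pause, 2 = tx pause, 3 = full (the usual default)
sysctl dev.igb.0.fc
# Turn flow control off temporarily and re-run the iperf test
sysctl dev.igb.0.fc=0
# Check whether pause frames or missed packets were actually involved
sysctl dev.igb.0.mac_stats.xoff_recvd dev.igb.0.mac_stats.missed_packets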
Quote from: mimugmail on October 19, 2018, 07:37:07 PM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232451
Do you think it's a flow-control related bug?
No idea, it sounds familiar ...
Played with FC, tried mixing the setup again with TSO, LRO, XCSUM .. always the same result.
Found this one:
https://elatov.github.io/2017/04/pfsense-on-netgate-apu4-1gb-testing/
Dont have any other ideas now ..
I tried a test kernel from franco which might come with 19.1 and gained a slightly better rate, from 480 Mbit to 510 Mbit .. ok, last test for today :)
I think such a small difference can easily be random variation between test runs. I could see similar variations myself running on the same OS.
Anyway, thanks for your support; at least I know it's not just me. Practically all PC Engines APU2 owners should consider something different for 1 Gbit WAN - if OPNsense is going to be installed on the board, of course. :-)
Quote from: ricsip on October 22, 2018, 01:30:33 PM
Anyway, thanks for your support; at least I know it's not just me. Practically all PC Engines APU2 owners should consider something different for 1 Gbit WAN - if OPNsense is going to be installed on the board, of course. :-)
Why? It achieves 1 Gbit with multiple streams easily .. why would someone need 1 Gbit on 1 stream?
Do you have any chance to access a PPPoE-based WAN or a PPPoE WAN simulator? I also have issues reaching 1 Gbit even with multiple streams if PPPoE is used for the WAN connection. I have already given up hope for 1 Gbit single-flow performance, but even multi-flow performance is quite low. When connecting a PC to the same PPPoE WAN directly (no OPNsense router/firewall in front of the PC), I can achieve much higher speeds.
Hi, I have been following this thread and other related forums re: achieving 1 Gbit via PPPoE with PC Engines' APU2. net.isr.dispatch = "deferred" yielded only a small speed improvement - from 400 Mbps to 450 Mbps. Using the ISP-provided DIR-842, I can hit up to 800+ Mbps. I am on the latest OPNsense with the stock kernel. pfSense on the same APU2 with net.isr.dispatch = "deferred" yielded 520-550 Mbps.
I have an APU2 board with OPNsense as well. My board only achieves about 120 MBit/s per NIC in iPerf >:(
I posted the problem here: https://forum.opnsense.org/index.php?topic=11228.0
Hi,
I've just found this blog entry: https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/
So the APU2 series should be able to achieve 1 gbit with pfsense. ::)
best regards
Dirk
Quote from: monstermania on February 25, 2019, 09:52:57 AM
Hi,
I've just found this blog entry: https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/
So the APU2 series should be able to achieve 1 gbit with pfsense. ::)
best regards
Dirk
IF(!!!) the WAN type is NOT PPPoE! That fact is not revealed in that blog. PPPoE can cause a giant speed decrease, thanks to the FreeBSD PPPoE handling defect.
I can only achieve 160-200 Mbit, and that fluctuates heavily between test runs. A cheap Asus RT-AC66U B1 can easily reach 800+ Mbit on the very same modem/subscription.
This topic hasn't received much love in the last few months, but I can attest the issue is still present: OPNsense 19.7 on an APU2 cannot reach 1 Gbps from WAN to LAN with the default setup on a single traffic flow.
So I dug around, found a few threads here and there about this, and finally found this topic to which I am replying. I saw many did some tests, saw the proposed solution at TekLager, etc., but they don't really address the single-flow issue.
I've read about the mono-thread vs multi-thread behavior of the *BSDs vs Linux, but single-flow traffic will only use 1 thread anyway, so I had to discard that too as a probable cause.
I then decided to run my own tests and see if this was related to a single APU2 or all of them. I tested 3 x APU2 with different firewalls, and this is the speed I get with https://speedtest.net (with NAT enabled, of course):
OPNsense down: ~500 Mbps up: ~500 Mbps
pfSense down: ~700 Mbps up: ~700 Mbps
OpenWRT down: ~910 Mbps up: ~910 Mbps
IPFire down: ~910 Mbps up: ~910 Mbps
pfSense on Netgate 3100 down: ~910 Mbps up:~910 Mbps
My gaming PC (8700k) connected directly into the ISP's modem down: ~915 Mbps up:~915 Mbps
I also did some tests virtualizing all these firewalls (except OpenWRT) on my workstation (AMD 3950X) with VirtualBox (a Type 2 hypervisor - not the best, I know; I didn't have the time to set up something on the ESXi cluster), and you can subtract ~200 Mbps from all the speeds above. So even virtualized, IPFire is faster than both OPNsense and pfSense running on the APU2. I also saw that all of them use only ONE thread and almost the same amount of CPU% while the transfer is going on.
My conclusions so far are these:
-The PC Engines APU2 is not the issue - probably a driver issue for OPNsense/pfSense
-Single-threaded processing of a single traffic flow is not the issue either, since some firewalls are able to max the speed on 1 thread
-pfSense is still based on FreeBSD, which has one of the best network stacks in the world, but it might not use the proper drivers for the NICs on the APU - that's my feeling, but I can't check this.
-OPNsense is now based on HardenedBSD (which is a fork of FreeBSD) and adds lots of exploit mitigations directly into the code. Those security enhancements might be the issue with the APU2's slow transfer speed. OPNsense installed on premise with a ten-year-old Xeon X5650 (2.66 GHz) can run at 1 Gbps without breaking a sweat. So maybe a few MHz more are required for OPNsense to max that 1 Gbps pipe.
-OpenWRT and IPFire are Linux-based and benefit from a much broader 'workforce' optimizing everything around them. NICs are probably detected properly and the proper drivers are being used, plus the nature of how Linux works could also help in speeding everything up a little bit more. And the Linux kernel is a dragster vs the FreeBSD kernel (sorry FreeBSD, but I still love you since I am wearing your t-shirt today!!).
My next step, if I have time, would be to do direct speed tests internally with iperf3 in order to have another speed chart I can refer to.
Edit: FreeBSD vs HardenedBSD Features Comparison https://hardenedbsd.org/content/easy-feature-comparison
Edit 2: Another thing that came to my mind is the ability of the running OS (in our case OPNsense) to 'turbo' the cores up to 1.4 GHz on the AMD GX-412TC CPU that the APU2 uses. The base frequency is 1 GHz, but with turbo it can reach 1.4 GHz. I am running the latest 4.10 firmware, but I can't (don't know how to) validate what frequency is being used when doing a transfer. That would really explain the difference in transfer speed and why OPNsense can't max a 1 Gbps link while others can. Link on how to upgrade the BIOS on the APU2: https://teklager.se/en/knowledge-base/apu-bios-upgrade/
I greatly appreciate your effort. I gave up on this topic a long time ago, but if you have the energy to go and find the resolution, you have all my support :) !
One thing I would like to ask you: could you check your results if you emulate PPPoE on the WAN (internet) interface, instead of plain IP on the WAN interface? I expect your results will be much, much worse under OPNsense than what you achieved in this test.
My APU2 is connected via a CAT6a Ethernet cable to the ISP's modem, which in turn is connected via another CAT6a Ethernet cable to the fiber optic transceiver. The PPPoE connection is then established by the ISP's modem (which I don't manage - it's done automatically and set up by the ISP).
So the APU2 isn't doing the PPPoE connectivity (as it would have been in this typical scenario 15 years ago via DSL, for example), and that is a good thing. Now if your setup requires the APU2 to perform the PPPoE connectivity, that doesn't really impact the transmission speed.
"Now if your setup requires the APU2 to perform the PPPoE connectivity, that doesn't really impact the transmission speed."
There is a very high chance that the PPPoE session handling and the single-threaded MPD daemon are the biggest bottleneck preventing the APU2 from reaching 1 gigabit.
I've set up another test lab (under VirtualBox) to test the iperf3 speed between 2 Ubuntu servers, each behind an OPNsense 19.7.8 (fresh update from tonight!). All VMs are using 4 vCPUs and 4 GB of RAM.
-First iperf3 test (60 seconds, 1 traffic flow):
The virtual switch performance between SVR1 and SVR2 connected together yields ~2.4Gbps of bandwidth
-Second iperf3 test (60 seconds, 1 traffic flow):
This time, SVR1 is behind FW1 and SVR2 is behind FW2. Both FW1 and FW2 are connected directly on the same virtual switch. Minimum rules are set to allow connectivity between SVR1 and SVR2 for iperf3. Both FW1 and FW2 are NATing outbound connectivity. The performance result yields ~380Mbps.
-Third iperf3 test with PPPoE (60 seconds, 1 traffic flow):
FW1 has the PPPoE Server plugin installed and configured. FW2 is the PPPoE client that will initiate the connection. The performance result yields ~380Mbps.
-Fourth iperf3 test with PPPoE (60 seconds, 2 traffic flow): ~380Mbps
-Fifth iperf3 test with PPPoE (60 seconds, 4 traffic flow): ~390Mbps
So unless I missed something, PPPoE connectivity doesn't affect network speed, as I mentioned earlier.
I will try to replicate the same setup but with 2 x APU2 and post back the performance I get.
Thanks for your effort, this was a really interesting test series.
The reason why I suspect the PPPoE encapsulation is a serious limiting factor is that the internet is full of articles that all say the same thing: PPPoE traffic cannot be distributed across receive queues. The result is that only 1 CPU core can effectively process the entire PPPoE flow, which means the other cores sit idle while 1 core is at 100% load. Because the APU2 has very weak single-core processing power, and multi-queue receive is effectively deactivated for PPPoE, that is a big warning against using this product for 1 Gbit networks.
But anyway, I am really curious to see the next test results.
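A quick way to see whether this is what is happening during a PPPoE transfer, just a sketch of the commands I would watch on the OPNsense console while iperf is running:

# Per-CPU utilisation: with PPPoE one core should be pegged while the rest stay idle
top -P
# netisr configuration and per-protocol queue/drop counters
netstat -Q
# Interrupt distribution over the igb queues
vmstat -i | grep igb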
As far as I can recall, I could only do 600 Mbit/s in the LAN --> WAN direction (i.e. upload from a LAN client to an internet server); the WAN --> LAN direction (i.e. download from an internet server to a LAN client) was much slower. And all those results were using pure IP between 2 directly connected test PCs. When I installed the firewall in my production system, I reconfigured the WAN interface to PPPoE, and real-world results were lower than the testbench results.
Be cautious about what you read on single threaded process being a limiting factor.
When a single traffic flow enters a device, an ASIC performs the heavy lifting most of the time. The power required afterward to analyze, route, NAT, etc. that traffic is mostly provided by a CPU core (or 1 thread) somewhere up the process stack.
But that processing cannot be well distributed (or parallelized) across many threads (cores) for a single traffic flow - it would be inefficient in the end, since the destination is the same for all the threads; they would have to 'wait' for each other and thus slow down other traffic flows that require processing.
When multiple traffic flows are entering the same device, of course the other cpu cores will be used to handle the load appropriately.
The only ways to optimize or accelerate single traffic flow on a cpu core are:
-good and optimized network code
-the appropriate network drivers that 'talk' to the NIC
-a speedier CPU core (i.e. a higher frequency in GHz/MHz)
A comparison of this behavior is the same kind of (wrong) thinking people have about link aggregation: if you bundle 4 x 1 Gbps links together, people think that their new speed for a single traffic flow is now 4 Gbps, and they are surprised to see that their max speed is still only 1 Gbps, because a single link is still only 1 Gbps. With multiple traffic flows, the compound traffic will reach 4 Gbps, because now each of the 1 Gbps links is being used.
I hope that clears up some confusion.
But in the end, there is definitely something not running properly on both OPNsense and pfSense on those APU boards.
The APU's hardware is OK - many others and I have shown that.
So what remains are:
a) bad drivers for the Intel 210/211 NICs
b) bad code optimization (the code itself or the inability to make the cpu core reach its 1.4Ghz turbo speed),
c) both a & b
The Netgate SG-3100 that I have has an ARM Cortex-A9, a dual-core CPU running at 1.6 GHz, and it's able to keep that 1 Gbps speed. And we saw above that pfSense is somewhat faster on the APU compared to OPNsense. IMO, I really think we are facing a NIC driver issue in FreeBSD for the Intel i210/i211 chipset.
Haven't had the time to set up the APU, but I re-did the same test under ESXi because I was curious about the performance I could reach.
The ESXi 6.7 host is a Ryzen 2700X processor with 32 GB, and its storage is hooked up to a networked FreeNAS. All four VMs were running on it with 2 vCPUs and 4 GB RAM each.
The virtual switch bandwidth from svr1 to svr2 (direct iperf3) was ~24 Gbps.
Then the same flow, but with svr1 having to pass through fw1 (NAT+rules) and then fw2 (NAT+rules) before reaching svr2, gave an iperf3 bandwidth of ~4 Gbps.
That's a far cry from what I've achieved on faster hardware under VirtualBox lol.
On another subject: I had an issue with this setup under ESXi, as the automatic NAT rules weren't generated for some reason on both firewalls (they were under VirtualBox though). I find that odd, but I recall a few weeks ago, while I was giving a class at the college and using OPNsense to set up an OpenVPN VPN with my students, I was seeing internal network addresses reaching my firewall's WAN port. The day before I wasn't seeing this, and I hadn't changed the setup, so I blamed VirtualBox for the problem... but now I see the same behavior under ESXi, and I am wondering if there is an issue with the automatic outbound NAT rule generation somehow. What is causing this behavior?
NAT always reduces throughput, because the traffic has to traverse the CPU. Auto NAT can cause problems when there are multiple networks to reach behind the device. Then you have to remove the upstream gateway in the interface config and add manual NAT rules. :)
I wouldn't say that NAT always reduces throughput as it depends on what devices are used.
APUs and lots of other cheap, low-powered devices do have issues with NAT, yes - it was the main reason why I ditched many consumer-grade routers when I got 1 Gbps fiber at home 4 years ago. Back then, only the Linksys 3200ACM was able to keep up the speed with NAT active... until mysteriously - like hundreds of other people who posted on the Linksys forums - connections started to drop randomly and Internet connectivity became a nightmare.
That's when I started looking for something better, and I ended up with pfSense on an SG-3100 two years ago. All my problems were solved and still are to this day.
Can we please quit the others-are-so-great talk now? I don't think mentioning it in every other of your posts really helps this community in any substantial way.
Cheers,
Franco
You're totally right and fixed it.
Not what I expected when I saw your response and compared it to the edit knowing what you wrote before, but, hey, fair enough that the theme is still the same. All hail the better sense. I guess it's ok to use this opportunity to show a community its shortcomings in particular areas while not being able to throw a bit of money towards capable hardware at least. ;)
I plan on getting an APU2D4 soon since it has superseded the C4. I was wondering if anyone could check what
sysctl dev.cpu.0.freq_levels
outputs, and then post what
sysctl dev.cpu.0.freq
outputs under load?
I know this is an AMD-based CPU, but from what I understand the CPU will not turbo unless you are running the powerd daemon. For a console-based output you can run powerd with
sudo powerd -v
assuming sudo is installed on OPNsense.
I hope this helps.
I ran the commands you wrote on the console and here is what I've got from an APU4B4:
root@OPNsense02:~ # sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 1000/1008 800/831 600/628
Idle:
root@OPNsense02:~ # sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1000
Under load:
root@OPNsense02:~ # sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1000
So the frequency didn't really change. Now with powerd running, here is the output, where you will see the max frequency is still 1000 MHz:
root@OPNsense02:~ # sudo powerd -v
powerd: unable to determine AC line status
load 4%, current freq 1000 MHz ( 0), wanted freq 968 MHz
load 7%, current freq 1000 MHz ( 0), wanted freq 937 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 907 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 878 MHz
load 7%, current freq 1000 MHz ( 0), wanted freq 850 MHz
load 6%, current freq 1000 MHz ( 0), wanted freq 823 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 797 MHz
changing clock speed from 1000 MHz to 800 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 772 MHz
load 4%, current freq 800 MHz ( 1), wanted freq 747 MHz
load 6%, current freq 800 MHz ( 1), wanted freq 723 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 700 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 678 MHz
load 3%, current freq 800 MHz ( 1), wanted freq 656 MHz
load 5%, current freq 800 MHz ( 1), wanted freq 635 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 615 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 600 MHz
changing clock speed from 800 MHz to 600 MHz
load 10%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 5%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 8%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 5%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 11%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 143%, current freq 600 MHz ( 2), wanted freq 2000 MHz
changing clock speed from 600 MHz to 1000 MHz
load 130%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 85%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 107%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 101%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 100%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 106%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 100%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1937 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1876 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 1817 MHz
load 6%, current freq 1000 MHz ( 0), wanted freq 1760 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1705 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1651 MHz
load 5%, current freq 1000 MHz ( 0), wanted freq 1599 MHz
load 8%, current freq 1000 MHz ( 0), wanted freq 1549 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1500 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1453 MHz
load 5%, current freq 1000 MHz ( 0), wanted freq 1407 MHz
load 8%, current freq 1000 MHz ( 0), wanted freq 1363 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 1320 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1278 MHz
load 3%, current freq 1000 MHz ( 0), wanted freq 1238 MHz
load 9%, current freq 1000 MHz ( 0), wanted freq 1199 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1161 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1124 MHz
load 3%, current freq 1000 MHz ( 0), wanted freq 1088 MHz
load 8%, current freq 1000 MHz ( 0), wanted freq 1054 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1021 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 989 MHz
load 15%, current freq 1000 MHz ( 0), wanted freq 958 MHz
load 7%, current freq 1000 MHz ( 0), wanted freq 928 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 899 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 870 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 842 MHz
load 6%, current freq 1000 MHz ( 0), wanted freq 815 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 789 MHz
changing clock speed from 1000 MHz to 800 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 764 MHz
load 6%, current freq 800 MHz ( 1), wanted freq 740 MHz
load 6%, current freq 800 MHz ( 1), wanted freq 716 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 693 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 671 MHz
load 6%, current freq 800 MHz ( 1), wanted freq 650 MHz
load 4%, current freq 800 MHz ( 1), wanted freq 629 MHz
load 5%, current freq 800 MHz ( 1), wanted freq 609 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 600 MHz
changing clock speed from 800 MHz to 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 8%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 9%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 5%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 9%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 10%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 10%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 7%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 75%, current freq 600 MHz ( 2), wanted freq 1200 MHz
changing clock speed from 600 MHz to 1000 MHz
load 293%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 364%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 382%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 373%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 254%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 248%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 250%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 269%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 370%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 345%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 282%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 250%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 276%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 254%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 251%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 258%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 267%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 273%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 238%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 270%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 267%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 273%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 264%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 276%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 249%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 241%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 254%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 266%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 254%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 250%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 247%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 257%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 288%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 263%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 241%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 273%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 257%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 264%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 256%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 263%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 256%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 254%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 257%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 248%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 263%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 261%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 264%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 261%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 254%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 261%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 261%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 272%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 241%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 254%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 247%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 260%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 258%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 244%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 251%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 85%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 1937 MHz
load 138%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 316%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 322%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 322%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 307%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 316%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 330%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 331%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 313%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 313%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 325%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 325%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 319%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 316%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 322%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 316%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 316%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 335%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 332%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 342%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 317%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 338%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 326%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 330%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 313%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 337%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 391%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 394%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 397%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 397%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 397%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 394%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 394%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 397%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 397%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 400%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 319%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 100%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 101%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 103%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 112%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 108%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 100%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 105%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 110%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 172%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 208%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 201%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 210%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 185%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 204%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 185%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 203%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 190%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 136%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 104%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 100%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 103%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 96%, current freq 1000 MHz ( 0), wanted freq 2000 MHz
load 13%, current freq 1000 MHz ( 0), wanted freq 1937 MHz
load 8%, current freq 1000 MHz ( 0), wanted freq 1876 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1817 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 1760 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1705 MHz
load 8%, current freq 1000 MHz ( 0), wanted freq 1651 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1599 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 1549 MHz
load 3%, current freq 1000 MHz ( 0), wanted freq 1500 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1453 MHz
load 3%, current freq 1000 MHz ( 0), wanted freq 1407 MHz
load 7%, current freq 1000 MHz ( 0), wanted freq 1363 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 1320 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1278 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1238 MHz
load 7%, current freq 1000 MHz ( 0), wanted freq 1199 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1161 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 1124 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1088 MHz
load 5%, current freq 1000 MHz ( 0), wanted freq 1054 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 1021 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 989 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 958 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 928 MHz
load 3%, current freq 1000 MHz ( 0), wanted freq 899 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 870 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 842 MHz
load 4%, current freq 1000 MHz ( 0), wanted freq 815 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 789 MHz
changing clock speed from 1000 MHz to 800 MHz
load 6%, current freq 800 MHz ( 1), wanted freq 764 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 740 MHz
load 5%, current freq 800 MHz ( 1), wanted freq 716 MHz
load 3%, current freq 800 MHz ( 1), wanted freq 693 MHz
load 4%, current freq 800 MHz ( 1), wanted freq 671 MHz
load 0%, current freq 800 MHz ( 1), wanted freq 650 MHz
load 4%, current freq 800 MHz ( 1), wanted freq 629 MHz
load 3%, current freq 800 MHz ( 1), wanted freq 609 MHz
load 9%, current freq 800 MHz ( 1), wanted freq 600 MHz
changing clock speed from 800 MHz to 600 MHz
load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 5%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz
load 25%, current freq 600 MHz ( 2), wanted freq 600 MHz
^Ctotal joules used: 73.271
I will post back with the APU4D4 and see the difference.
Same results for the APU4D4.
The BIOS on both PC Engines boards is the latest, but the CPU frequency seems capped at 1 GHz, which would explain why we can only get around ~650 Mbps at best on gigabit links. That AMD GX-412TC can do 1.2 GHz on boost.
Can you please post the output of dmidecode -t BIOS for each board, for future reference?
APU4B4 info (bought in June 2018):
root@OPNsense02:~ # dmidecode -t BIOS
# dmidecode 3.2
Scanning /dev/mem for entry point.
SMBIOS 2.8 present.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: coreboot
Version: v4.11.0.1
Release Date: 12/09/2019
ROM Size: 8192 kB
Characteristics:
PCI is supported
PC Card (PCMCIA) is supported
BIOS is upgradeable
Selectable boot is supported
ACPI is supported
Targeted content distribution is supported
BIOS Revision: 4.11
Firmware Revision: 0.0
APU4D4 info (bought in November 2019):
root@OPNsense:~ # dmidecode -t BIOS
# dmidecode 3.2
Scanning /dev/mem for entry point.
SMBIOS 2.8 present.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: coreboot
Version: v4.11.0.1
Release Date: 12/09/2019
ROM Size: 8192 kB
Characteristics:
PCI is supported
PC Card (PCMCIA) is supported
BIOS is upgradeable
Selectable boot is supported
ACPI is supported
Targeted content distribution is supported
BIOS Revision: 4.11
Firmware Revision: 0.0
I have an APU2D4 for my home firewall and I'm trying to get max performance since the APU2 has limited "horsepower".
I've tried to configure my system to get the maximum benefit from the i210 NICs (thank you calomel.org, among others).
I recently found that for OPNsense, the default value for kern.random.harvest.mask is 2047:
root@OPNsense:~ # sysctl kern.random.harvest
kern.random.harvest.mask_symbolic: UMA,FS_ATIME,SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
kern.random.harvest.mask_bin: 000000000011111111111
kern.random.harvest.mask: 2047
Based on some recommendations for FreeBSD, I set kern.random.harvest.mask = 351 for max throughput:
root@OPNsense:~ # sysctl kern.random.harvest
kern.random.harvest.mask_symbolic: [UMA],[FS_ATIME],SWI,[INTERRUPT],NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
kern.random.harvest.mask_bin: 000000000000101011111
kern.random.harvest.mask: 351
Regarding UMA (the universal memory allocator, also called the zone allocator): according to the FreeBSD documentation for RANDOM(4), "obtain entropy from the zone allocator. This is potentially very high rate, and if so will be of questionable use. If this is the case, use of this option is not recommended."
Default values
FreeBSD: kern.random.harvest.mask 511
pfSense: Is it the same as the default for FreeBSD ?
OpnSense: kern.random.harvest.mask 2047
For max throughput: kern.random.harvest.mask 351
For an APU2, kern.random.harvest.mask 511 --> 351 gives about 3% better throughput for FreeBSD.
Has anyone documented the throughput difference from kern.random.harvest.mask 2047 --> 511 ?
I know that my firewall will have a little less entropy, but for my purposes, that's OK.
To set kern.random.harvest.mask, I have to use the GUI: System -> Settings -> Tunables and add kern.random.harvest.mask
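Before making it permanent in the GUI, the change can be tried live from the shell; just a sketch of what I do (the value reverts at reboot unless the tunable is added):

# Show the current mask and the symbolic source list
sysctl kern.random.harvest.mask kern.random.harvest.mask_symbolic
# Temporarily apply the reduced mask, then re-run the iperf3 throughput test
sysctl kern.random.harvest.mask=351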
I have 200 Mbps Internet, I get the same tested speed (236 Mbps) as my consumer-grade router that I know is capable of Gigabit speed.
If you followed the thread, you know that the APU2 can easily do 500+ Mbps with no tweaking. The issue we have with it is handling 1 Gbps links when OPNsense is on it, vs IPFire or OpenWRT, which are able to achieve 900+ Mbps on the same hardware.
When I have time I will try your settings just to see what happens.
Quote from: pjdouillard on January 21, 2020, 03:05:17 AM
I ran the commands you wrote on the console and here is what I've got from an APU4B4:
root@OPNsense02:~ # sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 1000/1008 800/831 600/628
Idle:
root@OPNsense02:~ # sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1000
Under load:
root@OPNsense02:~ # sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1000
So the frequency didn't really change. Now with powerd running, here is the output where you will see the max frequency still being 1000Mhz:
root@OPNsense02:~ # sudo powerd -v
powerd: unable to determine AC line status
load 4%, current freq 1000 MHz ( 0), wanted freq 968 MHz
load 7%, current freq 1000 MHz ( 0), wanted freq 937 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 907 MHz
load 0%, current freq 1000 MHz ( 0), wanted freq 878 MHz
.
.
.
I will post back with the APU4D4 and see the difference.
Thank you for posting that info. So from what I understand, OPNsense is not getting the turbo clock speed info from the BIOS. I'm not exactly sure why that is the case though.
The limitation is likely on the hardware side; probably miczyg would be one of the best people to address the why of it if he sees this thread.
The same hardware with a Linux-based OS (IPFire and OpenWRT) is able to max the 1 Gbps NICs without problems (see the post on the previous page).
Based on this document from PC Engines, the frequency is not reported correctly in sysctl, so you have to set these hints in loader.conf to get proper readings (see the sketch below for one way to apply them):
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
hint.acpi_perf.0.disabled=1
I don't quite understand whether this affects the actual speed of the device though. If someone could confirm, that would be amazing.
https://github.com/pcengines/apu2-documentation/blob/master/docs/apu_CPU_boost.md
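One way to apply them (just a sketch, assuming the stock FreeBSD loader also reads /boot/loader.conf.local, which should survive OPNsense regenerating loader.conf; not verified on my box):

# /boot/loader.conf.local  (read by the loader in addition to loader.conf)
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
hint.acpi_perf.0.disabled=1

# after a reboot, check what the kernel now reports:
sysctl dev.cpu.0.freq_levels dev.cpu.0.freq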
Unfortunately this is only a display issue.
I just installed the latest pfSense 2.5.0 beta on my APU4D4, and there I get 660 Mbps instead of the 340 Mbps I get with the latest OPNsense 20.1. So the underlying BSD version seems to handle the drivers in a better way, but it's still too little, as I have a 1 Gbit/s ISP link.
I don't want to be unfriendly, but I'm definitely going to close this thread if people keep comparing apples and oranges.
Cheers,
Franco
I experimented with my APU2 and OPNsense firewall, and my impression is that the most important configuration is
net.isr.dispatch=deferred
I am able to saturate a 250 Mbit downlink from the German Telekom with one stream.
Of course IDS/ntopng is not possible if I want to saturate the connection.
Quote from: franco on February 13, 2020, 11:57:30 AM
I don't want to be unfriendly, but I'm definitely going to close this thread if people keep comparing apples and oranges.
Cheers,
Franco
Hello Franco,
I disagree, as this isn't an apples-to-oranges comparison; but as this thread has been going on (started in July 2018 and still no resolution), comparing other firewalls with OPNsense running on the SAME hardware and showing what we are trying to solve is the only thing we can do "on our side". And up to now, not a single dev has offered any help in this thread as to why we might be having the issue, or any path to a resolution/explanation.
The PC Engines hardware has been used by a lot of people around the world (privately and commercially) for many years (since before OPNsense was forked), and it fills a segment of the market that other commercial brands can't even reach for the same price (reliability and low power usage). So we want to maximize our investment AND also use OPNsense, because we like/prefer it over other firewalls. Trying to muzzle or threaten us by closing the thread isn't the right direction imo, and isn't what I expect from the OPNsense forum - and that is a reason many of us left "that other well known firewall" for OPNsense. We are not bitching, but we are kind of fed up (in a way) with the lack of help or feedback from the guys who are making OPNsense.
So to get back to the thread itself: since other (Linux-based) firewalls are able to max the gigabit speed on any of the NICs of PC Engines' APU2, we are all puzzled as to why OPNsense isn't capable of doing it. FreeBSD has the best TCP/IP stack of the *NIXes out there, so what is the problem?
We are not all operating system developers and thus are not equipped to check what's going on when a transfer is occurring on the APU2's NICs. Is there an issue with FreeBSD/HardenedBSD and the Intel NICs of the APU2? Is there some other issue with FreeBSD/HardenedBSD not being able to turbo the AMD CPU to 1.4 GHz? Anything else?
We post on these forums to get (we hope) some answers from the devs themselves on some of the issues we encounter - like this one. So please, don't turn into that other company, but instead maybe forward the questions to the dev team so they can take a look.
Thank you for your understanding.
For the community, the "X is faster than Y, I just checked" posts are a waste of time if you don't say how "Y" goes from slower to faster. Even if you post that OPNsense is faster than Z, I'm going to close this topic, because just like in real life:
You measure your progress from where you were to where you are; you must not compare yourself to others, because it is pointless and shallow.
Cheers,
Franco
Quote from: pjdouillard on February 13, 2020, 04:13:42 PM
Hello Franco,
I disagree, as this isn't an apples-to-oranges comparison; but as this thread has been going on (started in July 2018 and still no resolution), comparing other firewalls with OPNsense running on the SAME hardware and showing what we are trying to solve is the only thing we can do "on our side". And up to now, not a single dev has offered any help in this thread as to why we might be having the issue, or any path to a resolution/explanation.
The PC Engines hardware has been used by a lot of people around the world (privately and commercially) for many years (since before OPNsense was forked), and it fills a segment of the market that other commercial brands can't even reach for the same price (reliability and low power usage). So we want to maximize our investment AND also use OPNsense, because we like/prefer it over other firewalls. Trying to muzzle or threaten us by closing the thread isn't the right direction imo, and isn't what I expect from the OPNsense forum - and that is a reason many of us left "that other well known firewall" for OPNsense. We are not bitching, but we are kind of fed up (in a way) with the lack of help or feedback from the guys who are making OPNsense.
So to get back to the thread itself: since other (Linux-based) firewalls are able to max the gigabit speed on any of the NICs of PC Engines' APU2, we are all puzzled as to why OPNsense isn't capable of doing it. FreeBSD has the best TCP/IP stack of the *NIXes out there, so what is the problem?
We are not all operating system developers and thus are not equipped to check what's going on when a transfer is occurring on the APU2's NICs. Is there an issue with FreeBSD/HardenedBSD and the Intel NICs of the APU2? Is there some other issue with FreeBSD/HardenedBSD not being able to turbo the AMD CPU to 1.4 GHz? Anything else?
We post on these forums to get (we hope) some answers from the devs themselves on some of the issues we encounter - like this one. So please, don't turn into that other company, but instead maybe forward the questions to the dev team so they can take a look.
Thank you for your understanding.
The reason why probably no dev answered is that maybe none of the devs has either an APU or such high bandwidth. Keep in mind that this is a community project. I myself only have VDSL100 .. I have no idea how to help because I can't reproduce it.
Maybe you can start by installing a fresh pfSense, do a sysctl -a, output it to a file, do the same for OPNsense, and then diff them. Maybe pf has some other defaults.
Keep in mind that pfSense has an about 100x bigger community, so the chance that one guy with an APU and enough knowledge solves this and reports the fix (not the problem) upstream is 100x higher.
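Roughly what I mean (just a sketch; the file names are only examples):

# On the pfSense box:
sysctl -a | sort > /tmp/sysctl-pfsense.txt
# On the OPNsense box:
sysctl -a | sort > /tmp/sysctl-opnsense.txt
# Copy both files to one machine and compare the defaults:
diff -u /tmp/sysctl-pfsense.txt /tmp/sysctl-opnsense.txt | less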
Quote from: mimugmail on February 13, 2020, 04:26:17 PM
The reason why probably no dev answered is that maybe none of the devs has either an APU or such high bandwidth. Keep in mind that this is a community project. I myself only have VDSL100 .. I have no idea how to help because I can't reproduce it.
Maybe you can start by installing a fresh pfSense, do a sysctl -a, output it to a file, do the same for OPNsense, and then diff them. Maybe pf has some other defaults.
Keep in mind that pfSense has an about 100x bigger community, so the chance that one guy with an APU and enough knowledge solves this and reports the fix (not the problem) upstream is 100x higher.
Since you didn't read the whole thread, I will make it short for you:
-pfSense has the same problem on the same APU, and no one in that community has found a fix - everything that has been posted elsewhere has been tested and doesn't provide any REAL single-thread / single-stream solution.
-You don't need 1+ Gbps of ISP bandwidth to recreate the problem: a local network with CAT5e Ethernet cables between 2 physical PCs will do the job.
-If the devs don't have access to a PCEngine APU, I can send them one for free if they care to fix the problem.
https://www.max-it.de/kontakt/
Michael Muenz
Address above ...
Quote from: mimugmail on February 13, 2020, 05:54:31 PM
https://www.max-it.de/kontakt/
Michael Muenz
Address above ...
Will pm you.
First of all: I have absolutely no clue. Please ignore this if I'm completely wrong.
Is it perhaps HardenedBSD related?
It might be tuned away from performance by using different defaults than other OSes?
e.g.
https://bsdrp.net/documentation/technical_docs/performance#entropy_harvest_impact
Suggests reducing kern.random.harvest.mask from 511 to 351 for a performance gain.
OPNsense default seems to be 2047.
Now I take a look and see:
# sysctl kern.random
kern.random.harvest.mask: 67583
2^16+2047=67583
An additional bit is set.
Though I never tested 66047, 65887, or 351.
And this thread almost a year ago:
https://forum.opnsense.org/index.php?topic=12058.0
more recently
https://forum.opnsense.org/index.php?topic=15686.msg71923#msg71923
Perhaps someone who understands this stuff can give advice on how to tune it?
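If someone wants to experiment, the mask can be inspected and changed at runtime (just a sketch; 351 is only the value from the BSDRP page, not a recommendation):
# current mask and what it decodes to
sysctl kern.random.harvest.mask
sysctl kern.random.harvest.mask_symbolic
# try the reduced mask (runtime change, lost on reboot)
sysctl kern.random.harvest.mask=351
# to keep it, add the value under System > Settings > Tunables in the GUI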
Quote from: johnsmi on February 14, 2020, 05:21:24 AM
First of all: I have absolutely no clue. Please ignore this if I'm completely wrong.
Is it perhaps HardenedBSD related?
It might be tuned away from performance by using different defaults than other OSes?
e.g.
https://bsdrp.net/documentation/technical_docs/performance#entropy_harvest_impact
Suggests reducing kern.random.harvest.mask from 511 to 351 for a performance gain.
OPNsense default seems to be 2047.
Now I take a look and see:
# sysctl kern.random
kern.random.harvest.mask: 67583
2^16+2047=67583
An additional bit is set.
Though I never tested 66047, 65887, or 351.
And this thread almost a year ago:
https://forum.opnsense.org/index.php?topic=12058.0
more recently
https://forum.opnsense.org/index.php?topic=15686.msg71923#msg71923
Perhaps someone who understands this stuff can give advice on how to tune it?
I am confident that nobody has a 100% reliably working solution for this problem.
Hi, has anyone tested this with 20.7? Before the upgrade, various speed tests showed nearly 270 MBit/s. After the upgrade it's only 200 MBit/s.
I have a 300 MBit/s FTTH PPPoE connection.
Are you using IPS/IDS?
No I've disabled it. But I use VLANs and PPPoE.
Recommended reading:
APU2 performance is insufficient for Gigabit if WAN is PPPoE
http://www.pcengines.info/forums/?page=post&id=E801CA38-8CD5-4854-95A7-99B67B5DB281&fid=DF5ACB70-99C4-4C61-AFA6-4C0E0DB05B2A&pageindex=1
Thanks. Does anyone have experience with this?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856#c11
Quote from: iam on September 23, 2020, 11:39:55 PM
Thanks. Does anyone have experience with this?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856#c11
In the last post in that topic, the FreeBSD maintainer Eugene Grosbein closed the bug as "Works As Intended", so they don't acknowledge this as a bug, and it seems it will never be fixed. You have to forget the PC Engines APU2 for a PPPoE WAN at 1 Gbit, unless PC Engines releases a new PCB design with a much stronger (2 GHz+) embedded APU.
Yes, I've set "net.isr.maxthreads" and "net.isr.numthreads" to the number of cores (4) and net.isr.dispatch to "deferred". This led to a slight performance increase (~10%). I will now try to offload the PPPoE stuff from the firewall to the modem (my modem has this option) and see what happens.
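For reference, these are the knobs involved (just a sketch of what I entered; in OPNsense they go under System > Settings > Tunables, and the maxthreads/numthreads ones only take effect after a reboot):
net.isr.maxthreads="4"
net.isr.numthreads="4"
net.isr.dispatch="deferred"
# verify after the reboot what is actually in effect
sysctl net.isr.maxthreads net.isr.numthreads net.isr.dispatch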
Just for testing today I deactivated my PPPoE interface in OPNsense and - guess what - the performance on the other two (non-PPPoE) interfaces DOUBLED instantly after a reboot (sorry to say that after reactivating PPPoE the "boost" went away - even without rebooting).
So it looks like you can achieve 1GBit speed only if you don't use PPPoE - regardless of the FW OS used (forget about the others, they will have the same problem).
I bought a DrayTek 165 (VDSL2+ 35b) modem now which is capable of handling the whole PPPoE stuff on its own. This way the OPNsense will only get IP traffic and it should finally work.
Quote from: telefonmann on September 24, 2020, 11:08:35 AM
Yes, I've set "net.isr.maxthreads" and "net.isr.numthreads" to the number of cores (4) and net.isr.dispatch to "deferred". This led to a slight performance increase (~10%). I will now try to offload the PPPoE stuff from the firewall to the modem (my modem has this option) and see what happens.
I can recommend these settings. Before the upgrade to 20.7 I always measured values between 250 and 280 MBit/s. After the update I only measured values between 180 and 220 MBit/s. Now I measure values of up to 313 MBit/s. Our contract says 300 MBit/s, so that's really nice :)
I recommend these settings too.
Hi,
Quote from: telefonmann on September 25, 2020, 02:55:53 PM
Just for testing today I deactivated my PPPoE interface in OPNsense and - guess what - the performance on the other two (non-PPPoE)
[...]
I bought a DrayTek 165 (VDSL2+ 35b) modem now which is capable of handling the whole PPPoE stuff on its own. This way the OPNsense will only get IP traffic and it should finally work.
I have a Vigor 130, this should also work with that - could you please give a link on what to change in the config? Currently I only do VLAN tagging on the Vigor. Doing the PPPoE stuff on the modem together with VLAN tagging is perhaps the better method; I would like to dive in if it is a better setup for me.
Thanks,
Ronny
I wanted to try the same thing, but it looks like the EU and US versions of the Vigor 130 are actually different.
The US version can act as a router (enabling it to handle PPPoE auth and encapsulation itself), whereas the EU version doesn't have this functionality.
@rcmcronny, http://www.draytektr.com/documents/product/619F295B-B506-2CF1-D9115EA3B629181F.pdf
First set operation mode to "router mode", after this the device will reboot. Then just use the wizard (pages 7 ff. in the manual)
I am using an older APU1C model with only two cores.
I understand only one core is being used for routing.
Just a remark: testing network bandwidth with Ikoula Speedtest is not the right methodology, as it is very inaccurate.
Using Ikoula speedtest, my result was 440 Mbits/s downstream.
But the accurate speed measured with iperf3 is 571 Mbits/s
iperf3 -p 9222 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 9222
[ 5] local 10.90.20.1 port 60560 connected to 89.84.1.222 port 9222
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 69.1 MBytes 579 Mbits/sec 268 462 KBytes
[ 5] 1.00-2.00 sec 68.8 MBytes 577 Mbits/sec 0 567 KBytes
[ 5] 2.00-3.00 sec 67.5 MBytes 566 Mbits/sec 2 461 KBytes
[ 5] 3.00-4.00 sec 68.8 MBytes 577 Mbits/sec 0 563 KBytes
[ 5] 4.00-5.00 sec 67.5 MBytes 566 Mbits/sec 0 648 KBytes
[ 5] 5.00-6.00 sec 67.5 MBytes 566 Mbits/sec 2 544 KBytes
[ 5] 6.00-7.00 sec 70.0 MBytes 587 Mbits/sec 0 632 KBytes
[ 5] 7.00-8.00 sec 67.5 MBytes 566 Mbits/sec 2 533 KBytes
[ 5] 8.00-9.00 sec 67.5 MBytes 566 Mbits/sec 0 621 KBytes
[ 5] 9.00-10.00 sec 68.8 MBytes 577 Mbits/sec 2 510 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 683 MBytes 573 Mbits/sec 276 sender
[ 5] 0.00-10.00 sec 680 MBytes 571 Mbits/sec receiver
So I can confirm that the APU1 with older core and older NIC can achieve 571 Mbits downstream.
This is OPNsense latest version 20.7.
I am connecting from a GNU/Linux laptop using an RJ-45 wire and IPv4.
OPNsense is connected to the fiber router with an RJ-45 cable, using IPv4 with NAT.
iperf3 also has an option to use multiple connection streams, e.g. -P 2 for two parallel streams:
iperf3 -P 2 -p 9222 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 9222
[ 5] local 10.90.20.1 port 38612 connected to 89.84.1.222 port 9222
[ 7] local 10.90.20.1 port 38614 connected to 89.84.1.222 port 9222
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 25.4 MBytes 213 Mbits/sec 12 229 KBytes
[ 7] 0.00-1.00 sec 46.0 MBytes 386 Mbits/sec 64 318 KBytes
[SUM] 0.00-1.00 sec 71.3 MBytes 598 Mbits/sec 76
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 31.0 MBytes 260 Mbits/sec 0 314 KBytes
[ 7] 1.00-2.00 sec 37.2 MBytes 312 Mbits/sec 2 279 KBytes
[SUM] 1.00-2.00 sec 68.2 MBytes 572 Mbits/sec 2
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 37.0 MBytes 311 Mbits/sec 1 290 KBytes
[ 7] 2.00-3.00 sec 32.3 MBytes 271 Mbits/sec 1 263 KBytes
[SUM] 2.00-3.00 sec 69.3 MBytes 582 Mbits/sec 2
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 34.8 MBytes 292 Mbits/sec 1 263 KBytes
[ 7] 3.00-4.00 sec 31.9 MBytes 268 Mbits/sec 1 245 KBytes
[SUM] 3.00-4.00 sec 66.7 MBytes 560 Mbits/sec 2
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 34.2 MBytes 287 Mbits/sec 0 348 KBytes
[ 7] 4.00-5.00 sec 33.4 MBytes 280 Mbits/sec 1 239 KBytes
[SUM] 4.00-5.00 sec 67.6 MBytes 567 Mbits/sec 1
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 5.00-6.00 sec 39.6 MBytes 333 Mbits/sec 1 307 KBytes
[ 7] 5.00-6.00 sec 28.5 MBytes 239 Mbits/sec 2 226 KBytes
[SUM] 5.00-6.00 sec 68.1 MBytes 571 Mbits/sec 3
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 6.00-7.00 sec 39.3 MBytes 330 Mbits/sec 0 389 KBytes
[ 7] 6.00-7.00 sec 30.0 MBytes 251 Mbits/sec 0 311 KBytes
[SUM] 6.00-7.00 sec 69.3 MBytes 581 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 7.00-8.00 sec 36.5 MBytes 306 Mbits/sec 1 355 KBytes
[ 7] 7.00-8.00 sec 30.9 MBytes 259 Mbits/sec 1 305 KBytes
[SUM] 7.00-8.00 sec 67.4 MBytes 565 Mbits/sec 2
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 8.00-9.00 sec 36.7 MBytes 308 Mbits/sec 1 329 KBytes
[ 7] 8.00-9.00 sec 32.0 MBytes 268 Mbits/sec 1 293 KBytes
[SUM] 8.00-9.00 sec 68.7 MBytes 577 Mbits/sec 2
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 9.00-10.00 sec 35.2 MBytes 295 Mbits/sec 1 305 KBytes
[ 7] 9.00-10.00 sec 31.9 MBytes 268 Mbits/sec 1 279 KBytes
[SUM] 9.00-10.00 sec 67.1 MBytes 563 Mbits/sec 2
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 350 MBytes 293 Mbits/sec 18 sender
[ 5] 0.00-10.01 sec 348 MBytes 292 Mbits/sec receiver
[ 7] 0.00-10.00 sec 334 MBytes 280 Mbits/sec 74 sender
[ 7] 0.00-10.01 sec 331 MBytes 278 Mbits/sec receiver
[SUM] 0.00-10.00 sec 684 MBytes 574 Mbits/sec 92 sender
[SUM] 0.00-10.01 sec 679 MBytes 569 Mbits/sec receiver
But this gave me the same results (probably a limitation of my hardware NICs ?).
IMHO, you should make a test using iperf3 for accurate results.
iperf3 should be running on client, not directly on OPNsense of course.
Edit: iperf3 should be used with the -R option to ask the server to send the data, otherwise you are testing upload speed. My upload speed is around 600 Mbits/s, so I need to retest downloading with the -R option.
Quote from: dave on September 30, 2020, 06:44:21 PM
I wanted to try the same thing but it looks like the EU and US version of the Vigor 130 are actually different.
The US version can act as a router (enabling it to handle PPPoE auth and encapsulation itself), whereas the EU version doesn't have this functionality.
Nope, the "EU-version" of the Vigor 130 (if there is such a thing) can act as router or modem (bridged mode), but as it is most often used as a modem, it comes pre-configured in modem mode. Letting the sense do the PPPoE and VLAN is the preferred configuration.
Here is a more accurate download speed test with iperf3 and one thread.
I used the -R option to test download speed (not upload) and -P 1 for a single stream:
iperf3 -R -P 1 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 5201
Reverse mode, remote host bouygues.iperf.fr is sending
[ 5] local 10.90.20.1 port 39286 connected to 89.84.1.222 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 53.7 MBytes 450 Mbits/sec
[ 5] 1.00-2.00 sec 52.8 MBytes 443 Mbits/sec
[ 5] 2.00-3.00 sec 55.3 MBytes 464 Mbits/sec
[ 5] 3.00-4.00 sec 61.4 MBytes 515 Mbits/sec
[ 5] 4.00-5.00 sec 54.3 MBytes 456 Mbits/sec
[ 5] 5.00-6.00 sec 53.0 MBytes 445 Mbits/sec
[ 5] 6.00-7.00 sec 53.8 MBytes 451 Mbits/sec
[ 5] 7.00-8.00 sec 48.3 MBytes 405 Mbits/sec
[ 5] 8.00-9.00 sec 54.6 MBytes 458 Mbits/sec
[ 5] 9.00-10.00 sec 54.4 MBytes 457 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 550 MBytes 461 Mbits/sec 31826 sender
[ 5] 0.00-10.00 sec 542 MBytes 455 Mbits/sec receiver
However, with two threads, I have the same results :
iperf3 -R -P 2 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 5201
Reverse mode, remote host bouygues.iperf.fr is sending
[ 5] local 10.90.20.1 port 40064 connected to 89.84.1.222 port 5201
[ 7] local 10.90.20.1 port 40066 connected to 89.84.1.222 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 28.8 MBytes 241 Mbits/sec
[ 7] 0.00-1.00 sec 25.7 MBytes 216 Mbits/sec
[SUM] 0.00-1.00 sec 54.5 MBytes 457 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 27.5 MBytes 231 Mbits/sec
[ 7] 1.00-2.00 sec 27.6 MBytes 232 Mbits/sec
[SUM] 1.00-2.00 sec 55.1 MBytes 462 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 23.6 MBytes 198 Mbits/sec
[ 7] 2.00-3.00 sec 29.4 MBytes 246 Mbits/sec
[SUM] 2.00-3.00 sec 53.0 MBytes 444 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 23.8 MBytes 200 Mbits/sec
[ 7] 3.00-4.00 sec 26.9 MBytes 226 Mbits/sec
[SUM] 3.00-4.00 sec 50.7 MBytes 426 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 27.3 MBytes 229 Mbits/sec
[ 7] 4.00-5.00 sec 23.7 MBytes 199 Mbits/sec
[SUM] 4.00-5.00 sec 51.0 MBytes 428 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 5.00-6.00 sec 19.8 MBytes 166 Mbits/sec
[ 7] 5.00-6.00 sec 30.2 MBytes 253 Mbits/sec
[SUM] 5.00-6.00 sec 50.0 MBytes 419 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 6.00-7.00 sec 16.9 MBytes 142 Mbits/sec
[ 7] 6.00-7.00 sec 34.6 MBytes 290 Mbits/sec
[SUM] 6.00-7.00 sec 51.5 MBytes 432 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 7.00-8.00 sec 16.9 MBytes 142 Mbits/sec
[ 7] 7.00-8.00 sec 34.0 MBytes 285 Mbits/sec
[SUM] 7.00-8.00 sec 50.8 MBytes 426 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 8.00-9.00 sec 15.8 MBytes 133 Mbits/sec
[ 7] 8.00-9.00 sec 38.1 MBytes 320 Mbits/sec
[SUM] 8.00-9.00 sec 53.9 MBytes 452 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 9.00-10.00 sec 14.2 MBytes 119 Mbits/sec
[ 7] 9.00-10.00 sec 40.1 MBytes 336 Mbits/sec
[SUM] 9.00-10.00 sec 54.3 MBytes 455 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 219 MBytes 183 Mbits/sec 21640 sender
[ 5] 0.00-10.00 sec 215 MBytes 180 Mbits/sec receiver
[ 7] 0.00-10.01 sec 315 MBytes 264 Mbits/sec 29816 sender
[ 7] 0.00-10.00 sec 310 MBytes 260 Mbits/sec receiver
[SUM] 0.00-10.01 sec 534 MBytes 447 Mbits/sec 51456 sender
[SUM] 0.00-10.00 sec 525 MBytes 440 Mbits/sec receiver
I also tested against a local server (connected on a different VLAN with a different subnet, so OPNsense is doing the NAT):
iperf3 -R -c 10.90.70.250
Connecting to host 10.90.70.250, port 5201
Reverse mode, remote host 10.90.70.250 is sending
[ 5] local 10.90.20.1 port 54348 connected to 10.90.70.250 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 85.7 MBytes 719 Mbits/sec
[ 5] 1.00-2.00 sec 86.8 MBytes 728 Mbits/sec
[ 5] 2.00-3.00 sec 86.3 MBytes 724 Mbits/sec
[ 5] 3.00-4.00 sec 85.8 MBytes 720 Mbits/sec
[ 5] 4.00-5.00 sec 84.9 MBytes 712 Mbits/sec
[ 5] 5.00-6.00 sec 81.9 MBytes 687 Mbits/sec
[ 5] 6.00-7.00 sec 88.6 MBytes 744 Mbits/sec
[ 5] 7.00-8.00 sec 87.6 MBytes 735 Mbits/sec
[ 5] 8.00-9.00 sec 85.7 MBytes 719 Mbits/sec
[ 5] 9.00-10.00 sec 87.3 MBytes 732 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 864 MBytes 724 Mbits/sec 19 sender
[ 5] 0.00-10.00 sec 861 MBytes 722 Mbits/sec receiver
Here I can achieve 722 Mbits, which is pretty good for an older APU1c platform.
Same results with 2 threads.
Two remarks:
1) I cannot explain why iperf3 is so much faster against a local iperf3 server with NAT.
2) OPNsense does not seem to support multi-core routing, as the speed is not higher with two threads.
I even tested with two clients and got roughly the same speed.
Am I missing something in my OPNsense settings?
I would expect speed to be higher with two iperf3 threads.
Or is pf single threaded on OPNsense?
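If anyone wants to see where the work ends up while a test is running, something like this should show it (a rough diagnostic sketch, run on the OPNsense box during an iperf3 run):
# per-CPU load including kernel threads
top -P -S -H
# netisr / per-protocol work streams and queue drops
netstat -Q
# how the igb queue interrupts are spread over the cores
vmstat -i | grep igb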
Quote from: chemlud on November 18, 2020, 11:01:41 AM
Nope, the "EU-version" of the Vigor 130 (if there is such a thing) can act as router or modem (bridged mode), but as it is most often used as a modem, it comes pre-configured in modem mode. Letting the sense do the PPPoE and VLAN is the preferred configuration.
I emailed Draytek:
Quote: Regarding the authentication, the DrayTek UK Vigor 130 was designed to support bridge mode out of the box. You can consider the Vigor 2762 series that can handle PPP authentication.
They didn't outright say no I guess.
Quote from: dave on November 19, 2020, 07:18:18 PM
I emailed Draytek:
Quote: Regarding the authentication, the DrayTek UK Vigor 130 was designed to support bridge mode out of the box. You can consider the Vigor 2762 series that can handle PPP authentication.
They didn't outright say no I guess.
Have you tried not to use the BT firmware but the alternative one?
Nope. Have you?
Any idea why the speed is higher with a local iperf3 server on a different subnet?
I tested with IPv6 (no NAT) to make sure NAT was not the issue; the same difference applies:
Client : Linux laptop
Server : bouygues.iperf.fr
Firewall : APU1c OPNsense 20.7 latest.
WAN connected to Gig fiber.
Same results for IPv4 and IPv6, so NAT is not the issue.
iperf3 -6 -R -p 5206 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 5206
Reverse mode, remote host bouygues.iperf.fr is sending
[ 5] local 2a01:e0a:2ed:6231:b11b:ac7c:1c41:b3f7 port 53940 connected to 2001:860:deff:1000::2 port 5206
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 54.9 MBytes 461 Mbits/sec
[ 5] 1.00-2.00 sec 59.2 MBytes 497 Mbits/sec
[ 5] 2.00-3.00 sec 55.5 MBytes 466 Mbits/sec
[ 5] 3.00-4.00 sec 53.1 MBytes 446 Mbits/sec
[ 5] 4.00-5.00 sec 52.6 MBytes 442 Mbits/sec
[ 5] 5.00-6.00 sec 55.4 MBytes 465 Mbits/sec
[ 5] 6.00-7.00 sec 53.0 MBytes 445 Mbits/sec
[ 5] 7.00-8.00 sec 51.9 MBytes 435 Mbits/sec
[ 5] 8.00-9.00 sec 49.0 MBytes 411 Mbits/sec
[ 5] 9.00-10.00 sec 58.2 MBytes 488 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 549 MBytes 460 Mbits/sec 40855 sender
[ 5] 0.00-10.00 sec 543 MBytes 455 Mbits/sec receiver
Client : Linux laptop
Server : Another APU1c running Debian Linux on a separate isolated VLAN (firewall is routing).
Firewall : APU1c OPNsense 20.7 latest routing between VLANs.
iperf3 -R -c 10.90.70.250
Connecting to host 10.90.70.250, port 5201
Reverse mode, remote host 10.90.70.250 is sending
[ 5] local 10.90.20.1 port 56430 connected to 10.90.70.250 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 87.1 MBytes 731 Mbits/sec
[ 5] 1.00-2.00 sec 87.9 MBytes 737 Mbits/sec
[ 5] 2.00-3.00 sec 82.2 MBytes 689 Mbits/sec
[ 5] 3.00-4.00 sec 83.5 MBytes 701 Mbits/sec
[ 5] 4.00-5.00 sec 88.2 MBytes 740 Mbits/sec
[ 5] 5.00-6.00 sec 87.2 MBytes 731 Mbits/sec
[ 5] 6.00-7.00 sec 87.7 MBytes 736 Mbits/sec
[ 5] 7.00-8.00 sec 82.8 MBytes 695 Mbits/sec
[ 5] 8.00-9.00 sec 88.4 MBytes 741 Mbits/sec
[ 5] 9.00-10.00 sec 90.4 MBytes 758 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 869 MBytes 728 Mbits/sec 577 sender
[ 5] 0.00-10.00 sec 865 MBytes 726 Mbits/sec receiver
It is not clear to me why there is such a difference.
Why is routing between VLANs through the firewall so much faster than routing to the Internet over IPv6 and gigabit fiber?
To confirm: my VLANs are not communicating directly on the switch.
When testing on the same VLAN (so OPNsense does nothing):
iperf3 -R -c 10.90.70.250
Connecting to host 10.90.70.250, port 5201
Reverse mode, remote host 10.90.70.250 is sending
[ 5] local 10.90.70.110 port 42160 connected to 10.90.70.250 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 93.5 MBytes 784 Mbits/sec
[ 5] 1.00-2.00 sec 93.6 MBytes 785 Mbits/sec
[ 5] 2.00-3.00 sec 93.6 MBytes 786 Mbits/sec
[ 5] 3.00-4.00 sec 94.2 MBytes 790 Mbits/sec
[ 5] 4.00-5.00 sec 95.8 MBytes 803 Mbits/sec
[ 5] 5.00-6.00 sec 95.1 MBytes 798 Mbits/sec
[ 5] 6.00-7.00 sec 95.8 MBytes 803 Mbits/sec
[ 5] 7.00-8.00 sec 96.1 MBytes 806 Mbits/sec
[ 5] 8.00-9.00 sec 95.9 MBytes 805 Mbits/sec
[ 5] 9.00-10.00 sec 96.1 MBytes 806 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 950 MBytes 797 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 950 MBytes 797 Mbits/sec receiver
This is close to the speed of inter-VLAN routing with OPNsense.
So OPNsense is very efficient in inter-VLAN routing.
And just to confirm, the speed with a direct link is close to 1 Gb/s:
iperf3 -R -p 5206 -c bouygues.iperf.fr
Connecting to host bouygues.iperf.fr, port 5206
Reverse mode, remote host bouygues.iperf.fr is sending
[ 5] local 192.168.1.158 port 58658 connected to 89.84.1.222 port 5206
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 111 MBytes 930 Mbits/sec
[ 5] 1.00-2.00 sec 112 MBytes 941 Mbits/sec
[ 5] 2.00-3.00 sec 112 MBytes 942 Mbits/sec
[ 5] 3.00-4.00 sec 112 MBytes 941 Mbits/sec
[ 5] 4.00-5.00 sec 112 MBytes 942 Mbits/sec
[ 5] 5.00-6.00 sec 112 MBytes 941 Mbits/sec
[ 5] 6.00-7.00 sec 112 MBytes 942 Mbits/sec
[ 5] 7.00-8.00 sec 112 MBytes 941 Mbits/sec
[ 5] 8.00-9.00 sec 112 MBytes 942 Mbits/sec
[ 5] 9.00-10.00 sec 112 MBytes 941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 946 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.09 GBytes 940 Mbits/sec receiver
OK, I get it. When connecting to the Internet, the data goes out through the WAN on a different network interface. However, it is not clear why that is SO much slower than inter-VLAN routing through the OPNsense firewall.
Good evening!
I have an apu2 with 4 NICs and 1000 MBit/s from my ISP (cable modem - bridge mode - opnsense/apu2). I get ~650 MBit/s, which leaves a gap of ~250 MBit/s that I pay for but don't get...
My thoughts on getting apu2 + opnsense = 1000 MBit/s:
- use a switch with link aggregation for - let's say - 2 ports for WAN and another 2 ports for LAN
- have these two trunks each in a different VLAN on that switch, VLAN 1 = LAN, VLAN 2 = WAN
- create a lagg "lagg0" with 2 ports (igb2 + igb3) on OPNsense's side and assign the lagg to "WAN" (see the sketch below)
- do the same for "lagg1" with igb0 + igb1, assign it to "LAN"
- connect the bridge-mode cable modem to another port on the switch configured for VLAN 2 (WAN)
- connect all LAN clients to ports with VLAN 1 (LAN)
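For reference, under the hood the lagg would be created roughly like this (a sketch of the plain FreeBSD commands; in the OPNsense GUI this should live under Interfaces > Other Types > LAGG). One caveat, as far as I understand LACP: each flow is hashed onto a single member port, so a single TCP stream is still handled by one link and one core, and the per-flow bottleneck discussed in this thread would probably remain.
# create the LACP lagg for WAN out of igb2 + igb3 (console sketch)
ifconfig lagg0 create
ifconfig lagg0 laggproto lacp laggport igb2 laggport igb3 up
# check member/port status
ifconfig lagg0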
Would this solve the problem, or do I have an error in my reasoning?
Maybe someone has tested in this direction?
Alex
I struggle with the same issue. My setup is OPNsense 21.1 virtualized as a XEN HVM. I've got virtual switches for different VLANs. OPNsense just uses plain interfaces; the only VLAN is for the PPPoE Internet uplink.
iperf gives me something like 19 Gbps from domU (= virtual machine) to domU in the same VLAN. Through OPNsense this goes down to ~700 Mbps. A LAN device on a physical switch to a domU is somewhere around ~300 Mbps. Internet with one single session is somehow limited to around ~250 Mbps.
I checked some of the suggestions from https://bsdrp.net/documentation/technical_docs/performance, namely
net.inet6.ip6.redirect = 0
net.inet.ip.redirect was already 0
hw.igb.rx_process_limit = -1
hw.igb.tx_process_limit = -1
kern.random.harvest.mask = 351
It seems to improve the performance a bit - but I still need to investigate that further.
Internet goes up to ~400 Mbps.
LAN - domU ~510 Mbps
domU - domU stays around ~700 Mbps.
Found some other BSD-related optimization stuff. Need to look into it.
https://calomel.org/network_performance.html
https://calomel.org/freebsd_network_tuning.html
Quote from: spi39492 on February 07, 2021, 12:04:21 PM
Found some other BSD-related optimization stuff. Need to look into it.
https://calomel.org/network_performance.html
https://calomel.org/freebsd_network_tuning.html
Playing around with
net.inet.ip.ifq.maxlen
kern.ipc.maxsockbuf
net.inet.tcp.recvbuf_inc
net.inet.tcp.recvbuf_max
net.inet.tcp.recvspace
net.inet.tcp.sendbuf_inc
net.inet.tcp.sendbuf_max
net.inet.tcp.sendspace
net.inet.tcp.tso
doesn't seem to change anything. I'm not sure whether all settings actually get applied; sysctl -a doesn't show them all.
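If it helps, this is roughly how I'd check which of those knobs actually exist and took effect (sketch):
# an existing knob prints its value, a removed one returns "unknown oid"
sysctl net.inet.tcp.sendbuf_max
sysctl kern.ipc.maxsockbuf
# see what the loader picked up at boot
kenv | grep net.inet.tcp
# grep the whole tree for buffer-related knobs
sysctl -a | grep -E 'sendbuf|recvbuf|maxsockbuf'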
A quick check with OpenWrt 19.07 as a domU on the same server gives LAN - domU 890 Mbps. So almost Gbit wire speed.
With a slight modification - e1000 NICs as virtualized network adapters - I get
- LAN - domU wire speed of 940 Mbps
- domU - domU 7700 Mbps (7.7 Gbps)
I think you are mixing two things in this thread:
This thread is about the optimization of APU-based hardware devices, which can only do 1GBit/s when specifically optimized on FreeBSD.
The other issue could be performance problems of 21.1 on XEN based virtualization at best. There are already more participants here in the forum with this observation.
I would rather not discuss the XEN issue in this APU thread; in a separate thread you are more likely to reach users who are affected by the same thing.
Quote from: thowe on February 07, 2021, 04:11:43 PM
I think you are mixing two things in this thread:
This thread is about the optimization of APU-based hardware devices, which can only do 1GBit/s when specifically optimized on FreeBSD.
The other issue could be performance problems of 21.1 on XEN based virtualization at best. There are already more participants here in the forum with this observation.
I would rather not discuss the XEN issue in this APU thread; in a separate thread you are more likely to reach users who are affected by the same thing.
Thanks, I was about to ask the same thing (don't mix 2 different things in 1 thread).
Quote from: thowe on February 07, 2021, 04:11:43 PM
I think you are mixing two things in this thread:
This thread is about the optimization of APU-based hardware devices, which can only do 1GBit/s when specifically optimized on FreeBSD.
The other issue could be performance problems of 21.1 on XEN based virtualization at best. There are already more participants here in the forum with this observation.
I would rather not discuss the XEN issue in this APU thread; in a separate thread you are more likely to reach users who are affected by the same thing.
Understood that this is specifically about APU-based boards. I also observe performance issues and couldn't find anything related for Xen. That's why I am interested in your observations - I'd give the performance tuning tips a try.
Start with e.g. these (from this thread):
net.inet6.ip6.redirect = 0
net.inet.ip.redirect = 0
hw.igb.rx_process_limit = -1 (these are hardware dependent and will probably not match your NIC in the VM)
hw.igb.tx_process_limit = -1 (these are hardware dependent and will probably not match your NIC in the VM)
Quote from: thowe on February 08, 2021, 02:58:22 PM
Start with e.g. these (from this thread):
net.inet6.ip6.redirect = 0
net.inet.ip.redirect = 0
hw.igb.rx_process_limit = -1 (these are hardware dependent and will probably not match your NIC in the VM)
hw.igb.tx_process_limit = -1 (these are hardware dependent and will probably not match your NIC in the VM)
Thx - I have these. They helped me increase the speed (as mentioned in one of my posts). But it's still far away from Gbit.
Switch to an Odroid H2+, it achieves Gigabit with no issue.
I'm going to sell my APU board.
Quote from: mater on February 09, 2021, 06:41:06 AM
Switch to an Odroid H2+, it achieves Gigabit with no issue.
Realtek LAN? Sorry, I'll pass. Yes, I see the problem with the apu, but Odroid is not the solution, IMHO.
Protectli looks good ... a bit difficult to find in Europe, though.
Quote from: pmhausen on February 09, 2021, 07:39:46 AM
Quote from: mater on February 09, 2021, 06:41:06 AM
Switch to an Odroid H2+, it achieves Gigabit with no issue.
Realtek LAN? Sorry, I'll pass. Yes, I see the problem with the apu, but Odroid is not the solution, IMHO.
Protectli looks good ... a bit difficult to find in Europe, though.
I am no fan of Realtek either (the company still has a bad reputation in 2021 that really dates from the 90s / mid 00s), but just because a card has a shiny Intel logo stuck on it doesn't mean it will have superb performance. Just look at the i225-V fiasco: the B1 stepping was a broken design, the promised B2 was still faulty, and only the newer B3 really fixed it. 2.5 Gbit was literally broken and the workaround was to fall back to 1 Gbit max, while the black sheep Realtek RTL8125BG just works.
I have been struggling with performance on the APU4. While in initial testing I was able to get around 700 MBit/s with 2 iperf3 streams, with my fully configured firewall rule set (but minimal rules for the actual path I am testing) I am now down to around 250 MBit/s and can't get it any higher.
Settings from this thread, from https://www.reddit.com/r/homelab/comments/fciqid/hardware_for_my_home_pfsense_router/fjfl8ic/, and from https://teklager.se/en/knowledge-base/opnsense-performance-optimization/ have all been applied, and I am not sure when the performance drop occurred.
What is the best way to debug what's going on here? This is quite frustrating, as I know the hardware to be capable of full GBit/s routing.
Just to be sure:
- You test with iperf THROUGH the firewall. I.e. iperf is not running on the firewall but on separate hosts "on both sides" of the firewall?
- You have only set pf rules but no other services like IDS, IDP, Sensei etc.
Something you can check: configure powerd to use "Maximum" instead of "Hiadaptive".
As discussed here: https://forum.opnsense.org/index.php?topic=21194.msg99228#msg99228
If you use "Maximum" anyway, you can just as well disable powerd.
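On plain FreeBSD the equivalent would look roughly like this (just a sketch; on OPNsense the powerd mode is normally set in the GUI under System > Settings > Miscellaneous rather than in rc.conf):
# /etc/rc.conf
powerd_enable="YES"
# -a = mode on AC power, -b = on battery, -n = unknown power source
powerd_flags="-a maximum -b maximum -n maximum"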
Are there any penalties from not using powerd (that is, not enabling the powerd service)? I assume some hidden bugs may surface if some subsystem strongly assumes powerd is always present. Or at least I would be careful about saying too quickly that powerd is not necessary at all and doesn't cause any issues.
NAME
powerd – system power control utility
SYNOPSIS
powerd [-a mode] [-b mode] [-i percent] [-m freq] [-M freq] [-N]
[-n mode] [-p ival] [-P pidfile] [-r percent] [-s source] [-v]
DESCRIPTION
The powerd utility monitors the system state and sets various power
control options accordingly. It offers power-saving modes that can be
individually selected for operation on AC power or batteries.
[...]
Powerd's only function in FreeBSD is to set the CPU to power saving modes when idle. There is nothing else that depends on powerd. You do not need to run it.
Somebody complained that their APU2 CPU got stuck at 600 MHz and had to actually enable powerd to force the CPU clock speed above 600 MHz.
As I said, the behaviour of a system may not be easy to predict; even if the theory says one thing, reality may be different.
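Checking whether the CPU is really stuck should be straightforward (a small sketch):
# current clock and the frequency levels the driver offers
sysctl dev.cpu.0.freq
sysctl dev.cpu.0.freq_levels
# watch the clock once per second while an iperf3 test is running
while true; do sysctl -n dev.cpu.0.freq; sleep 1; done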
Quote from: Ricardo on April 13, 2021, 03:25:35 PM
Are there any penalties from not using powerd (that is, not enabling the powerd service)? I assume some hidden bugs may surface if some subsystem strongly assumes powerd is always present. Or at least I would be careful about saying too quickly that powerd is not necessary at all and doesn't cause any issues.
Disabling powerd and enabling the core performance boost via the BIOS will lock the cores at 1.4 GHz.
The APUs are only ~10 W devices, so you don't need to worry about power savings / heat.
Update: I found the culprit for the drop of more than 1/3 in throughput: just enabling IPSec (with configured tunnels up and running) drops locally routed performance from 750-800Mbps to 500Mbps for traffic that doesn't go through the tunnel. This is using IPSec policies and not with a virtual network device.
And just to confirm: yes, there are two hosts on different sides of the box, one iperf3 server, one client.
coreboot has been updated to the latest available version. PowerD is running and normally set to Hiadaptive as I actually want to save some power for most of the time when there is little traffic. A quick comparison doesn't seem to show a measurable difference between Hiadaptive and Maximum, though performance drops when I disable PowerD altogether (probably confirming the suspicion that the CPU is stuck at 600MHz without it running).
Further datapoints: Having flowd_aggregate running (with all local VLAN interfaces monitored) costs around 50 Mbps of throughput when samplicate is stopped, and about 250 Mbps when both are running. But this part is - if not good - then at least explainable, as it certainly adds CPU load. The IPsec-related throughput drop for streams not hitting IPsec tunnels (which stacks with the netflow drop, i.e. when both are enabled I only average around 250 Mbps total throughput) is what puzzles me.
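In case anyone wants to dig into the IPsec side effect: the installed policies can be dumped with setkey (sketch below). My assumption - not verified - is that with policy-based IPsec every forwarded packet is matched against the SPD, which would explain a penalty even for traffic that never hits a tunnel.
# dump the security policy database (SPD)
setkey -DP
# dump the active security associations
setkey -D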
check this forum for the "APU2 stuck at 600Mhz" issue:
https://github.com/pcengines/coreboot/issues/457
The fact that enabling policy-based IPsec immediately halves the throughput even for traffic that bypasses the VPN tunnel is very concerning. I also have some policy-based VPN tunnels, so this may further limit my WAN speed even for traffic that is not routed into the tunnel. Big mess, I have to say, and years can pass without resolution :(
Quote from: Ricardo on April 13, 2021, 04:06:49 PM
Somebody complained that their APU2 CPU got stuck at 600 MHz and had to actually enable powerd to force the CPU clock speed above 600 MHz.
As I said, the behaviour of a system may not be easy to predict; even if the theory says one thing, reality may be different.
I was involved in that discussion and IMHO we came to the conclusion that you needed to
disable powerd to get beyond 600 MHz. That's precisely why I recommend that.
I have opened the ticket with the maintainers of Coreboot for the PC Engines boards. So far, however, there has been nothing further.
I was able to solve the problem by keeping powerd enabled but setting the mode to maximum. Since then I haven't had the problem anymore.
And if the CPU was limited to 600MHz it cost a lot of performance. In my setup I just got away with it - but had zero headroom. When the CPU is running normally the utilization is rarely more than 50%.
Quote from: Ricardo on April 15, 2021, 03:29:45 PM
The fact that enabling policy-based IPsec immediately halves the throughput even for traffic that bypasses the VPN tunnel is very concerning. I also have some policy-based VPN tunnels, so this may further limit my WAN speed even for traffic that is not routed into the tunnel. Big mess, I have to say, and years can pass without resolution :(
Indeed. This happens not only for LAN->WAN traffic, but also for traffic between two different internal (e.g. LAN and DMZ) segments with no NAT involved and only directly connected routes in use. I have not yet tried with VTI instead of policy based IPsec, but this issue may make OpnSense a non-starter for the intended production use at our university institute (that is the reason why I am now spending far too much time putting OpnSense through such tests).
Quote from: rmayr on April 15, 2021, 05:48:44 PM
Quote from: Ricardo on April 15, 2021, 03:29:45 PM
The fact that enabling policy-based IPsec immediately halves the throughput even for traffic that bypasses the VPN tunnel is very concerning. I also have some policy-based VPN tunnels, so this may further limit my WAN speed even for traffic that is not routed into the tunnel. Big mess, I have to say, and years can pass without resolution :(
Indeed. This happens not only for LAN->WAN traffic, but also for traffic between two different internal (e.g. LAN and DMZ) segments with no NAT involved and only directly connected routes in use. I have not yet tried with VTI instead of policy based IPsec, but this issue may make OpnSense a non-starter for the intended production use at our university institute (that is the reason why I am now spending far too much time putting OpnSense through such tests).
You really want to run a university institute in production with an APU device?? :o
Quote from: mimugmail on April 15, 2021, 08:51:22 PM
Quote from: rmayr on April 15, 2021, 05:48:44 PM
Indeed. This happens not only for LAN->WAN traffic, but also for traffic between two different internal (e.g. LAN and DMZ) segments with no NAT involved and only directly connected routes in use. I have not yet tried with VTI instead of policy based IPsec, but this issue may make OpnSense a non-starter for the intended production use at our university institute (that is the reason why I am now spending far too much time putting OpnSense through such tests).
You really want to run a university institute in production with an APU device?? :o
No, not on an APU - this is my test device to find some of the issues in parallel to a VM installation (which seems to have the same performance issues, actually). We would only put it into production on faster hardware, but I don't expect such bottlenecks to necessarily change. We are aiming for at least 2-3, better 5 Gbps throughput between some of the segments, definitely need IPsec and flow analysis, and would like (but don't strictly require) IDS/IPS on. Given our current experience, I am not sure how likely that is.
Get a Deciso DEC38xx and you will definitely be able to match that requirement.
Further tests on an 8-core Proxmox VM server with 4 cores assigned to an OPNsense test instance show a 1.6 Gbps throughput limit with the CPU not fully loaded (only 2 out of 4 cores in the VM being used). Putting traffic flow analysis and Suricata into the mix, I am not sure how hardware like the one sold by Deciso would reach 5 Gbps with the current OPNsense version. What is the big difference we are missing?
The best solution is to get a written(!) assurance from Deciso of what throughput their hardware can do. That way you can demand the promised performance for your money if it turns out their hardware underperforms. Otherwise any vendor on the planet can claim literally anything, as you can't depend on generic marketing PDFs.
The concerning bit is the heavy side effect of having IPsec enabled for completely unrelated traffic. It points to a general performance bottleneck in the kernel.
Now with more detailed measurements: https://www.mayrhofer.eu.org/post/firewall-throughput-opnsense-openwrt/
In my experience, with all my APUs (4D4, 4C4, ...), I only get 850 Mb/s if I use several streams simultaneously (option -P 2 or -P 4 with iperf).
Otherwise, performance is poor (450/500 Mb/s) with one stream (iperf's default test mode).
I've upgraded everything... OPNsense 21.7.5, Coreboot 4.14.0.6
And I've added these parameters:
- hw.igb.rx_process_limit="-1"
- hw.igb.tx_process_limit="-1"
- legal.intel_igb.license_ack="1"
- net.inet.tcp.tso="1"
- net.inet.udp.checksum="1"
- hint.p4tcc.0.disabled=1
- hint.acpi_throttle.0.disabled=1
- hint.acpi_perf.0.disabled=1
@ProServ thanks for sharing this.
Have you had the opportunity to retest with 21.7.5? I ask because, unless I'm mistaken, some of those tunables are gone from the kernel. So if the testing shows the same results, it would be interesting to see whether they stay the same after commenting them out.
For instance, I can't find sysctl tunables for hw.igb.rx_process_limit or hw.igb.{number}.rx_process.
The same goes for hint.p4tcc.0.disabled.
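My assumption (please correct me if wrong) is that igb was moved onto the iflib framework in newer FreeBSD, which is why the old hw.igb.* loader tunables are gone. What the driver exposes now can be checked per device, e.g.:
# list everything the first igb instance exposes today
sysctl dev.igb.0
# iflib replacements for the old queue knobs look like this (boot-time tunables,
# set via System > Settings > Tunables; names may differ between versions)
dev.igb.0.iflib.override_nrxqs="4"
dev.igb.0.iflib.override_ntxqs="4"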
Has someone else already tested 22.1? I'm now getting much higher speed values on PPPoE WAN:
https://forum.opnsense.org/index.php?topic=26162.msg128661#msg128661
Most of the suggested tunables are not supported any more in 22.1.
I have, however, not yet tested network performance in 22.1.
Tunables dev.igb.0.fc, dev.igb.1.fc,... are still shown as valid tunables.
These are shown as unsupported:
dev.igb.0.eee_disabled, dev.igb.1.eee_disabled, ...
hint.acpi_perf.0.disabled
hint.acpi_throttle.0.disabled
hint.p4tcc.0.disabled
hw.igb.0.fc, hw.igb.1.fc, ...
hw.igb.num_queues
hw.igb.rx_process_limit
hw.igb.tx_process_limit
legal.intel_igb.license_ack
I have removed the flow control tunables, as the network speed was minimally faster.
I got 350/210 MBit/s with iperf in both directions (including the use of VLANs) and IDS/IPS off.
One way got me 390 MBit/s.
With IDS (no IPS, because that is broken for me in 22.1):
one way: 300 MBit/s
On LAN side 22.1 APU2D4, Gigabit network. All non-functional tunables removed as mentioned by @fireburner, no IDS/IPS.
I recall having measured higher values on 21.x (~800 - 900 MBit/s)
--------@DiskStation:/$ iperf3 -c 192.168.1.1 -p 19160 -P 30 -4 -R
Connecting to host 192.168.1.1, port 19160
Reverse mode, remote host 192.168.1.1 is sending
[ 5] local 192.168.1.10 port 43300 connected to 192.168.1.1 port 19160
[ 7] local 192.168.1.10 port 43302 connected to 192.168.1.1 port 19160
[ 9] local 192.168.1.10 port 43304 connected to 192.168.1.1 port 19160
[ 11] local 192.168.1.10 port 43310 connected to 192.168.1.1 port 19160
[ 13] local 192.168.1.10 port 43312 connected to 192.168.1.1 port 19160
[ 15] local 192.168.1.10 port 43314 connected to 192.168.1.1 port 19160
[ 17] local 192.168.1.10 port 43316 connected to 192.168.1.1 port 19160
[ 19] local 192.168.1.10 port 43318 connected to 192.168.1.1 port 19160
[ 21] local 192.168.1.10 port 43320 connected to 192.168.1.1 port 19160
[ 23] local 192.168.1.10 port 43322 connected to 192.168.1.1 port 19160
[ 25] local 192.168.1.10 port 43324 connected to 192.168.1.1 port 19160
[ 27] local 192.168.1.10 port 43326 connected to 192.168.1.1 port 19160
[ 29] local 192.168.1.10 port 43328 connected to 192.168.1.1 port 19160
[ 31] local 192.168.1.10 port 43330 connected to 192.168.1.1 port 19160
[ 33] local 192.168.1.10 port 43332 connected to 192.168.1.1 port 19160
[ 35] local 192.168.1.10 port 43334 connected to 192.168.1.1 port 19160
[ 37] local 192.168.1.10 port 43336 connected to 192.168.1.1 port 19160
[ 39] local 192.168.1.10 port 43338 connected to 192.168.1.1 port 19160
[ 41] local 192.168.1.10 port 43344 connected to 192.168.1.1 port 19160
[ 43] local 192.168.1.10 port 43346 connected to 192.168.1.1 port 19160
[ 45] local 192.168.1.10 port 43352 connected to 192.168.1.1 port 19160
[ 47] local 192.168.1.10 port 43354 connected to 192.168.1.1 port 19160
[ 49] local 192.168.1.10 port 43356 connected to 192.168.1.1 port 19160
[ 51] local 192.168.1.10 port 43358 connected to 192.168.1.1 port 19160
[ 53] local 192.168.1.10 port 43360 connected to 192.168.1.1 port 19160
[ 55] local 192.168.1.10 port 43362 connected to 192.168.1.1 port 19160
[ 57] local 192.168.1.10 port 43364 connected to 192.168.1.1 port 19160
[ 59] local 192.168.1.10 port 43366 connected to 192.168.1.1 port 19160
[ 61] local 192.168.1.10 port 43368 connected to 192.168.1.1 port 19160
[ 63] local 192.168.1.10 port 43370 connected to 192.168.1.1 port 19160
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.66 sec 19.6 MBytes 15.4 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 18.9 MBytes 15.8 Mbits/sec receiver
[ 7] 0.00-10.66 sec 18.0 MBytes 14.2 Mbits/sec 0 sender
[ 7] 0.00-10.00 sec 17.2 MBytes 14.5 Mbits/sec receiver
[ 9] 0.00-10.66 sec 21.6 MBytes 17.0 Mbits/sec 0 sender
[ 9] 0.00-10.00 sec 20.9 MBytes 17.5 Mbits/sec receiver
[ 11] 0.00-10.66 sec 20.1 MBytes 15.8 Mbits/sec 0 sender
[ 11] 0.00-10.00 sec 19.4 MBytes 16.3 Mbits/sec receiver
[ 13] 0.00-10.66 sec 20.1 MBytes 15.8 Mbits/sec 0 sender
[ 13] 0.00-10.00 sec 19.4 MBytes 16.3 Mbits/sec receiver
[ 15] 0.00-10.66 sec 22.9 MBytes 18.0 Mbits/sec 0 sender
[ 15] 0.00-10.00 sec 22.1 MBytes 18.6 Mbits/sec receiver
[ 17] 0.00-10.66 sec 19.4 MBytes 15.2 Mbits/sec 0 sender
[ 17] 0.00-10.00 sec 18.6 MBytes 15.6 Mbits/sec receiver
[ 19] 0.00-10.66 sec 20.0 MBytes 15.7 Mbits/sec 0 sender
[ 19] 0.00-10.00 sec 19.1 MBytes 16.1 Mbits/sec receiver
[ 21] 0.00-10.66 sec 22.8 MBytes 17.9 Mbits/sec 0 sender
[ 21] 0.00-10.00 sec 21.9 MBytes 18.3 Mbits/sec receiver
[ 23] 0.00-10.66 sec 20.8 MBytes 16.3 Mbits/sec 0 sender
[ 23] 0.00-10.00 sec 19.9 MBytes 16.7 Mbits/sec receiver
[ 25] 0.00-10.66 sec 20.0 MBytes 15.7 Mbits/sec 0 sender
[ 25] 0.00-10.00 sec 19.1 MBytes 16.0 Mbits/sec receiver
[ 27] 0.00-10.66 sec 18.5 MBytes 14.6 Mbits/sec 0 sender
[ 27] 0.00-10.00 sec 17.6 MBytes 14.8 Mbits/sec receiver
[ 29] 0.00-10.66 sec 18.8 MBytes 14.8 Mbits/sec 0 sender
[ 29] 0.00-10.00 sec 17.9 MBytes 15.0 Mbits/sec receiver
[ 31] 0.00-10.66 sec 16.6 MBytes 13.1 Mbits/sec 0 sender
[ 31] 0.00-10.00 sec 15.8 MBytes 13.2 Mbits/sec receiver
[ 33] 0.00-10.66 sec 17.0 MBytes 13.4 Mbits/sec 0 sender
[ 33] 0.00-10.00 sec 16.1 MBytes 13.5 Mbits/sec receiver
[ 35] 0.00-10.66 sec 17.6 MBytes 13.9 Mbits/sec 0 sender
[ 35] 0.00-10.00 sec 16.8 MBytes 14.1 Mbits/sec receiver
[ 37] 0.00-10.66 sec 18.9 MBytes 14.9 Mbits/sec 0 sender
[ 37] 0.00-10.00 sec 18.0 MBytes 15.1 Mbits/sec receiver
[ 39] 0.00-10.66 sec 17.8 MBytes 14.0 Mbits/sec 0 sender
[ 39] 0.00-10.00 sec 16.9 MBytes 14.2 Mbits/sec receiver
[ 41] 0.00-10.66 sec 20.0 MBytes 15.7 Mbits/sec 0 sender
[ 41] 0.00-10.00 sec 19.1 MBytes 16.0 Mbits/sec receiver
[ 43] 0.00-10.66 sec 21.9 MBytes 17.2 Mbits/sec 0 sender
[ 43] 0.00-10.00 sec 21.0 MBytes 17.6 Mbits/sec receiver
[ 45] 0.00-10.66 sec 20.8 MBytes 16.3 Mbits/sec 0 sender
[ 45] 0.00-10.00 sec 19.9 MBytes 16.7 Mbits/sec receiver
[ 47] 0.00-10.66 sec 16.2 MBytes 12.8 Mbits/sec 0 sender
[ 47] 0.00-10.00 sec 15.4 MBytes 12.9 Mbits/sec receiver
[ 49] 0.00-10.66 sec 19.0 MBytes 15.0 Mbits/sec 0 sender
[ 49] 0.00-10.00 sec 18.1 MBytes 15.2 Mbits/sec receiver
[ 51] 0.00-10.66 sec 21.5 MBytes 16.9 Mbits/sec 0 sender
[ 51] 0.00-10.00 sec 20.6 MBytes 17.3 Mbits/sec receiver
[ 53] 0.00-10.66 sec 16.8 MBytes 13.2 Mbits/sec 0 sender
[ 53] 0.00-10.00 sec 15.9 MBytes 13.3 Mbits/sec receiver
[ 55] 0.00-10.66 sec 15.6 MBytes 12.3 Mbits/sec 0 sender
[ 55] 0.00-10.00 sec 14.8 MBytes 12.4 Mbits/sec receiver
[ 57] 0.00-10.66 sec 17.6 MBytes 13.9 Mbits/sec 0 sender
[ 57] 0.00-10.00 sec 16.8 MBytes 14.1 Mbits/sec receiver
[ 59] 0.00-10.66 sec 16.1 MBytes 12.7 Mbits/sec 0 sender
[ 59] 0.00-10.00 sec 15.2 MBytes 12.8 Mbits/sec receiver
[ 61] 0.00-10.66 sec 15.0 MBytes 11.8 Mbits/sec 1 sender
[ 61] 0.00-10.00 sec 14.1 MBytes 11.8 Mbits/sec receiver
[ 63] 0.00-10.66 sec 13.5 MBytes 10.6 Mbits/sec 0 sender
[ 63] 0.00-10.00 sec 12.6 MBytes 10.6 Mbits/sec receiver
[SUM] 0.00-10.66 sec 564 MBytes 444 Mbits/sec 1 sender
[SUM] 0.00-10.00 sec 539 MBytes 452 Mbits/sec receiver
iperf Done.
--------@DiskStation:/$ iperf3 -c 192.168.1.1 -p 3958 -P 30 -4
Connecting to host 192.168.1.1, port 3958
[ 5] local 192.168.1.10 port 50816 connected to 192.168.1.1 port 3958
[ 7] local 192.168.1.10 port 50818 connected to 192.168.1.1 port 3958
[ 9] local 192.168.1.10 port 50820 connected to 192.168.1.1 port 3958
[ 11] local 192.168.1.10 port 50822 connected to 192.168.1.1 port 3958
[ 13] local 192.168.1.10 port 50824 connected to 192.168.1.1 port 3958
[ 15] local 192.168.1.10 port 50826 connected to 192.168.1.1 port 3958
[ 17] local 192.168.1.10 port 50828 connected to 192.168.1.1 port 3958
[ 19] local 192.168.1.10 port 50830 connected to 192.168.1.1 port 3958
[ 21] local 192.168.1.10 port 50836 connected to 192.168.1.1 port 3958
[ 23] local 192.168.1.10 port 50838 connected to 192.168.1.1 port 3958
[ 25] local 192.168.1.10 port 50840 connected to 192.168.1.1 port 3958
[ 27] local 192.168.1.10 port 50842 connected to 192.168.1.1 port 3958
[ 29] local 192.168.1.10 port 50844 connected to 192.168.1.1 port 3958
[ 31] local 192.168.1.10 port 50846 connected to 192.168.1.1 port 3958
[ 33] local 192.168.1.10 port 50848 connected to 192.168.1.1 port 3958
[ 35] local 192.168.1.10 port 50850 connected to 192.168.1.1 port 3958
[ 37] local 192.168.1.10 port 50852 connected to 192.168.1.1 port 3958
[ 39] local 192.168.1.10 port 50854 connected to 192.168.1.1 port 3958
[ 41] local 192.168.1.10 port 50856 connected to 192.168.1.1 port 3958
[ 43] local 192.168.1.10 port 50858 connected to 192.168.1.1 port 3958
[ 45] local 192.168.1.10 port 50860 connected to 192.168.1.1 port 3958
[ 47] local 192.168.1.10 port 50862 connected to 192.168.1.1 port 3958
[ 49] local 192.168.1.10 port 50864 connected to 192.168.1.1 port 3958
[ 51] local 192.168.1.10 port 50866 connected to 192.168.1.1 port 3958
[ 53] local 192.168.1.10 port 50868 connected to 192.168.1.1 port 3958
[ 55] local 192.168.1.10 port 50870 connected to 192.168.1.1 port 3958
[ 57] local 192.168.1.10 port 50872 connected to 192.168.1.1 port 3958
[ 59] local 192.168.1.10 port 50874 connected to 192.168.1.1 port 3958
[ 61] local 192.168.1.10 port 50876 connected to 192.168.1.1 port 3958
[ 63] local 192.168.1.10 port 50878 connected to 192.168.1.1 port 3958
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 15.3 MBytes 12.8 Mbits/sec 0 sender
[ 5] 0.00-10.11 sec 15.2 MBytes 12.6 Mbits/sec receiver
[ 7] 0.00-10.00 sec 15.4 MBytes 12.9 Mbits/sec 0 sender
[ 7] 0.00-10.11 sec 15.3 MBytes 12.7 Mbits/sec receiver
[ 9] 0.00-10.00 sec 22.7 MBytes 19.0 Mbits/sec 0 sender
[ 9] 0.00-10.11 sec 22.5 MBytes 18.6 Mbits/sec receiver
[ 11] 0.00-10.00 sec 15.2 MBytes 12.7 Mbits/sec 0 sender
[ 11] 0.00-10.11 sec 15.1 MBytes 12.5 Mbits/sec receiver
[ 13] 0.00-10.00 sec 15.4 MBytes 12.9 Mbits/sec 0 sender
[ 13] 0.00-10.11 sec 15.3 MBytes 12.7 Mbits/sec receiver
[ 15] 0.00-10.00 sec 15.3 MBytes 12.8 Mbits/sec 0 sender
[ 15] 0.00-10.11 sec 15.2 MBytes 12.6 Mbits/sec receiver
[ 17] 0.00-10.00 sec 15.5 MBytes 13.0 Mbits/sec 0 sender
[ 17] 0.00-10.11 sec 15.4 MBytes 12.8 Mbits/sec receiver
[ 19] 0.00-10.00 sec 15.7 MBytes 13.2 Mbits/sec 0 sender
[ 19] 0.00-10.11 sec 15.6 MBytes 13.0 Mbits/sec receiver
[ 21] 0.00-10.00 sec 15.5 MBytes 13.0 Mbits/sec 0 sender
[ 21] 0.00-10.11 sec 15.4 MBytes 12.8 Mbits/sec receiver
[ 23] 0.00-10.00 sec 15.0 MBytes 12.6 Mbits/sec 1 sender
[ 23] 0.00-10.11 sec 14.9 MBytes 12.4 Mbits/sec receiver
[ 25] 0.00-10.00 sec 15.4 MBytes 12.9 Mbits/sec 0 sender
[ 25] 0.00-10.11 sec 15.3 MBytes 12.7 Mbits/sec receiver
[ 27] 0.00-10.00 sec 15.2 MBytes 12.7 Mbits/sec 0 sender
[ 27] 0.00-10.11 sec 15.1 MBytes 12.5 Mbits/sec receiver
[ 29] 0.00-10.00 sec 15.4 MBytes 12.9 Mbits/sec 0 sender
[ 29] 0.00-10.11 sec 15.3 MBytes 12.7 Mbits/sec receiver
[ 31] 0.00-10.00 sec 22.7 MBytes 19.0 Mbits/sec 1 sender
[ 31] 0.00-10.11 sec 22.6 MBytes 18.7 Mbits/sec receiver
[ 33] 0.00-10.00 sec 34.2 MBytes 28.7 Mbits/sec 0 sender
[ 33] 0.00-10.11 sec 33.9 MBytes 28.1 Mbits/sec receiver
[ 35] 0.00-10.00 sec 15.2 MBytes 12.8 Mbits/sec 0 sender
[ 35] 0.00-10.11 sec 15.2 MBytes 12.6 Mbits/sec receiver
[ 37] 0.00-10.00 sec 23.2 MBytes 19.4 Mbits/sec 0 sender
[ 37] 0.00-10.11 sec 23.0 MBytes 19.1 Mbits/sec receiver
[ 39] 0.00-10.00 sec 15.3 MBytes 12.8 Mbits/sec 0 sender
[ 39] 0.00-10.11 sec 15.1 MBytes 12.6 Mbits/sec receiver
[ 41] 0.00-10.00 sec 15.5 MBytes 13.0 Mbits/sec 0 sender
[ 41] 0.00-10.11 sec 15.4 MBytes 12.8 Mbits/sec receiver
[ 43] 0.00-10.00 sec 17.3 MBytes 14.5 Mbits/sec 0 sender
[ 43] 0.00-10.11 sec 17.0 MBytes 14.1 Mbits/sec receiver
[ 45] 0.00-10.00 sec 15.1 MBytes 12.7 Mbits/sec 0 sender
[ 45] 0.00-10.11 sec 15.0 MBytes 12.5 Mbits/sec receiver
[ 47] 0.00-10.00 sec 15.2 MBytes 12.8 Mbits/sec 0 sender
[ 47] 0.00-10.11 sec 15.1 MBytes 12.6 Mbits/sec receiver
[ 49] 0.00-10.00 sec 15.3 MBytes 12.8 Mbits/sec 0 sender
[ 49] 0.00-10.11 sec 15.1 MBytes 12.6 Mbits/sec receiver
[ 51] 0.00-10.00 sec 15.1 MBytes 12.7 Mbits/sec 0 sender
[ 51] 0.00-10.11 sec 15.0 MBytes 12.5 Mbits/sec receiver
[ 53] 0.00-10.00 sec 15.4 MBytes 12.9 Mbits/sec 0 sender
[ 53] 0.00-10.11 sec 15.3 MBytes 12.7 Mbits/sec receiver
[ 55] 0.00-10.00 sec 15.4 MBytes 12.9 Mbits/sec 0 sender
[ 55] 0.00-10.11 sec 15.3 MBytes 12.7 Mbits/sec receiver
[ 57] 0.00-10.00 sec 15.1 MBytes 12.7 Mbits/sec 0 sender
[ 57] 0.00-10.11 sec 15.0 MBytes 12.5 Mbits/sec receiver
[ 59] 0.00-10.00 sec 23.0 MBytes 19.3 Mbits/sec 0 sender
[ 59] 0.00-10.11 sec 22.9 MBytes 19.0 Mbits/sec receiver
[ 61] 0.00-10.00 sec 22.2 MBytes 18.6 Mbits/sec 0 sender
[ 61] 0.00-10.11 sec 21.8 MBytes 18.1 Mbits/sec receiver
[ 63] 0.00-10.00 sec 22.5 MBytes 18.9 Mbits/sec 0 sender
[ 63] 0.00-10.11 sec 22.4 MBytes 18.6 Mbits/sec receiver
[SUM] 0.00-10.00 sec 525 MBytes 440 Mbits/sec 2 sender
[SUM] 0.00-10.11 sec 521 MBytes 432 Mbits/sec receiver
iperf Done.
Hey all,
I wanted to resurrect this thread. I'm a new convert to opnsense and I'm really impressed. Though I am having issues with single connection performance on my apu2e4 (i210 nic).
I know this horse has been beaten to death; the problem I'm having is that a lot of the tunables that get posted around are not up to date, and I haven't seen anything for newer BSD versions. I have no problem getting gigabit with multiple streams in iperf, but a single stream tops out around 400 Mbps. I know the hardware is capable of it since it worked fine on Linux (again, I know, dead horse).
I'm more interested in understanding the technical reason behind the limitation, whether there are tunables/settings in OPNsense 22 that can improve the performance, and whether there is an upstream bug/effort to improve this.
Does anyone know if the upcoming FreeBSD 13.1 (coming in OPNsense 22.7) brings some single core network improvements?
OPNsense 22.1 is already running on FreeBSD 13-STABLE. That's closer to 13.1 than to 13.0. There won't be huge changes AFAIK.
See the FreeBSD release model for reference. -STABLE is a moving target that gets continuously updated independent of tagged release versions. The picture is slightly outdated, since 12.3 and 13.1 are the current release versions at the moment.
(https://forums.freebsd.org/attachments/freebsd_versions-png.9652/)
It's easy to test 13.1 now, but no high hopes as Patrick explained... https://forum.opnsense.org/index.php?topic=28505.0
Cheers,
Franco