PC Engines APU2 1Gbit traffic not achievable

Started by Ricardo, July 27, 2018, 12:24:54 PM

local = German? I can ask my boss if the company is willing to test such a device..

Quote from: mimugmail on September 04, 2018, 01:51:02 PM
local = German? I can ask my boss if the company is willing to test such a device..

I am not from Germany; I live in eastern Europe and just converted my local currency to EUR for a rough estimate. But your local PC shop may sell these devices even cheaper:

http://pcengines.ch/order.htm

Quote from: mimugmail on September 04, 2018, 01:51:02 PM
local = German? I can ask my boss if the company is willing to test such a device..


@mimugmail: I have a spare APU2 I no longer use; if you send me your bank account details, pass-codes etc., that will do as security.


PM me, we'll work something out. :)



Cool... OK.

Quote from: mimugmail on September 04, 2018, 03:11:54 PM
I ordered this via the company, no tax, so only 160 EUR

https://www.amazon.de/PC-Engines-APU-2C4-Netzteil-schwarzes/dp/B01GEIEI7M

I really meant it about supporting this evaluation effort, so if there is still something needed, let us know!

Quote from: ricsip on August 15, 2018, 11:55:19 AM
Quote from: mimugmail on August 09, 2018, 04:16:00 PM
https://calomel.org/freebsd_network_tuning.html

# Disable Hyper Threading (HT), also known as Intel's proprietary simultaneous
# multithreading (SMT) because implementations typically share TLBs and L1
# caches between threads which is a security concern. SMT is likely to slow
# down workloads not specifically optimized for SMT if you have a CPU with more
# than two(2) real CPU cores. Secondly, multi-queue network cards are as much
# as 20% slower when network queues are bound to real CPU cores as well as SMT
# virtual cores due to interrupt processing inefficiencies.
machdep.hyperthreading_allowed="0"  # (default 1, allow Hyper Threading (HT))

# Intel igb(4): The Intel i350-T2 dual port NIC supports up to eight(8)
# input/output queues per network port, the card has two(2) network ports.
#
# Multiple transmit and receive queues in network hardware allow network
# traffic streams to be distributed into queues. Queues can be mapped by the
# FreeBSD network card driver to specific processor cores leading to reduced
# CPU cache misses. Queues also distribute the workload over multiple CPU
# cores, process network traffic in parallel and prevent network traffic or
# interrupt processing from overwhelming a single CPU core.
#
# http://www.intel.com/content/dam/doc/white-paper/improving-network-performance-in-multi-core-systems-paper.pdf
#
# For a firewall under heavy CPU load we recommend setting the number of
# network queues equal to the total number of real CPU cores in the machine
# divided by the number of active network ports. For example, a firewall with
# four(4) real CPU cores and an i350-T2 dual port NIC should use two(2) queues
# per network port (hw.igb.num_queues=2). This equals a total of four(4)
# network queues over two(2) network ports which map to four(4) real CPU
# cores. A FreeBSD server with four(4) real CPU cores and a single network port
# should use four(4) network queues (hw.igb.num_queues=4). Or, set
# hw.igb.num_queues to zero(0) to allow the FreeBSD driver to automatically set
# the number of network queues to the number of CPU cores. It is not recommended
# to allow more network queues than real CPU cores per network port.
#
# Query total interrupts per queue with "vmstat -i" and use "top -CHIPS" to
# watch CPU usage per igb0:que. Multiple network queues will trigger more total
# interrupts compared to a single network queue, but the processing of each of
# those queues will be spread over multiple CPU cores allowing the system to
# handle increased network traffic loads.
hw.igb.num_queues="2"  # (default 0, queues equal the number of CPU real cores)

# Intel igb(4): FreeBSD puts an upper limit on the number of received
# packets a network card can process to 100 packets per interrupt cycle. This
# limit is in place because of inefficiencies in IRQ sharing when the network
# card is using the same IRQ as another device. When the Intel network card is
# assigned a unique IRQ (dmesg) and MSI-X is enabled through the driver
# (hw.igb.enable_msix=1) then interrupt scheduling is significantly more
# efficient and the NIC can be allowed to process packets as fast as they are
# received. A value of "-1" means unlimited packet processing and sets the same
# value to dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit . A
# process limit of "-1" is around one percent (1%) faster than "100" on a
# saturated network connection.
hw.igb.rx_process_limit="-1"  # (default 100 packets to process concurrently)

I have also gone through this. No measurable improvement in throughput.

machdep.hyperthreading_allowed="0"  # (default 1, allow Hyper Threading (HT)) --> NOT APPLICABLE in my case. This AMD CPU has 4 physical cores, and sysctl hw.ncpu reports 4, so HT (even if it is supported at all, I am not sure) is not active currently.
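(In case anyone wants to double-check the topology on their own box, something like the following should show it; the kern.smp OIDs are what I would expect on the FreeBSD 11 base of OPNsense 18.7, so treat the exact names as an assumption:)

sysctl hw.ncpu                    # logical CPUs the kernel scheduler sees
sysctl kern.smp.cores             # physical cores
sysctl kern.smp.threads_per_core  # 1 here means SMT/HT is not active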

hw.igb.num_queues="2"  # (default 0, queues equal the number of CPU real cores)
--> I have 4 cores and 2 active NICs, each NIC supporting up to 4 queues. I had the default
hw.igb.num_queues="0", but tried it with hw.igb.num_queues="2" as well.
No improvement in throughput (for a single flow).
But! It seems to have degraded the multi-flow performance heavily.
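(For reference, a minimal sketch of how such a loader tunable is typically applied and checked; I am assuming /boot/loader.conf.local is honored here - OPNsense also has a Tunables page in the GUI - and a reboot is needed before a loader tunable takes effect:)

# /boot/loader.conf.local
hw.igb.num_queues="2"

# after the reboot, see how the queues / interrupts are spread over the cores:
vmstat -i | grep igb      # expect one "igbX:que N" interrupt line per queue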

hw.igb.enable_msix=1 has been set like that since the beginning.
hw.igb.rx_process_limit="-1" --> was set, but no real improvement in throughput.
dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit are both set to "-1" as a result of the previous entry.
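(A quick way to confirm what the driver actually picked up at runtime, using the igb(4) OIDs from the quoted guide and the igb0/igb1 device numbers of this APU2; that every one of these tunables is exported as a readable sysctl is my assumption:)

sysctl hw.igb.enable_msix hw.igb.num_queues hw.igb.rx_process_limit
sysctl dev.igb.0.rx_processing_limit dev.igb.1.rx_processing_limit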

I am very sad that this won't be solvable under OPNsense without switching to a competitor or replacing the hardware itself.

Some small addendum:
recently I noticed (maybe when I upgraded to 18.7.1_3, but TBH I am not sure) that sometimes (depending on the actual throughput / interrupt load shared among the cores) the serial console hangs during iperf. As soon as the iperf session finishes or I interrupt it manually, the serial console becomes live again. I noticed it while running "top" on the console: the refresh stopped / froze during the iperf session, and the keyboard did not respond either while the iperf traffic was flowing. As soon as the iperf session finished, "top" continued to produce output and the console responded to keystrokes again.

It seems to have something to do with the throughput alternating randomly between those 2-3 discrete levels across iperf sessions.
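(If someone wants to see whether a single core is being saturated by interrupt handling while the console is frozen, a sketch of what could be watched from a second login, e.g. over SSH from the LAN-side PC so the serial console is not the only window; the flags are standard FreeBSD top/vmstat options:)

top -PSH                  # per-CPU view incl. kernel threads; look for one core pegged in "interrupt"
vmstat -i | grep igb      # run before and after the iperf session and compare per-queue counts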


No, never!

The 2 iperf endpoints are running on a PC connected to the LAN (igb1) and another PC connected to the WAN (igb0); the APU is always just a transit device (packet forwarding / packet filtering / NAT translation between igb1 and igb0 and vice versa), never terminating any iperf traffic directly on it.
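(For completeness, the kind of commands this setup implies; iperf3 syntax is assumed and the server address is a placeholder:)

# on the PC behind the WAN port (igb0), acting as the server:
iperf3 -s
# on the PC behind the LAN port (igb1): a single flow, then 4 parallel flows:
iperf3 -c <WAN-side-PC-address> -t 30
iperf3 -c <WAN-side-PC-address> -t 30 -P 4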

Next week I should get my device and will put it in my lab. Let's see..

Yet another small addendum:

Finally I managed to test throughput over PPPoE under real-life conditions.

Results are quite weak:
Approx. 250-270 Mbit/sec (WAN --> LAN traffic direction) was achieved with the APU2. Not iperf this time, but tested with a torrent (so nobody can say I was pushing unrealistic expectations onto one single flow).
Again, the router was only a transit device; the torrent client was running on a PC behind the APU. The SSD was not the bottleneck during the download.

As a comparison, using a different vendor's router, I was able to achieve 580-600 Mbit/sec easily while downloading the same test torrent. I did not investigate whether it could go higher with that router, but that is still more than a 2x performance difference.


Quote from: mimugmail on September 07, 2018, 05:59:15 PM
You mean IPFire on the same hardware?

No, not IPFire. Sorry if I was unclear :)

I installed a completely different piece of equipment (an Asus AC66U B1 router), just for comparison, to see whether that router can reach wire-speed gigabit.

On the APU I could not test IPFire today due to lack of time, but maybe in the coming days I will do another round of tests using IPFire.

Need to find a timeslot when no users are using the internet :(

If I remember correctly you said this on the FreeBSD net mailing list regarding OPNsense and IPFire. I'll check next week.