PC Engines APU2 1Gbit traffic not achievable

Started by Ricardo, July 27, 2018, 12:24:54 PM

local = German? I can ask my boss if the company is willing to test such a device..

Quote from: mimugmail on September 04, 2018, 01:51:02 PM
local = German? I can ask my boss if the company is willing to test such a device..

I am not from Germany; I live in eastern Europe and just converted my local currency to EUR for a rough estimate. But your local PC shop may sell these devices even cheaper:

http://pcengines.ch/order.htm

Quote from: mimugmail on September 04, 2018, 01:51:02 PM
local = German? I can ask my boss if the company is willing to test such a device..


@mimugmail: I have a spare APU2 I no longer use; if you send me your bank account details, pass-codes etc., that will do as security.


PM me, we'll work something out. :)



Cool... OK.

Quote from: mimugmail on September 04, 2018, 03:11:54 PM
I ordered this via the company, no tax, so only 160 EUR

https://www.amazon.de/PC-Engines-APU-2C4-Netzteil-schwarzes/dp/B01GEIEI7M

I really meant it about supporting this evaluation effort, so if there is still something needed, let us know!

Quote from: ricsip on August 15, 2018, 11:55:19 AM
Quote from: mimugmail on August 09, 2018, 04:16:00 PM
https://calomel.org/freebsd_network_tuning.html

# Disable Hyper Threading (HT), also known as Intel's proprietary simultaneous
# multithreading (SMT) because implementations typically share TLBs and L1
# caches between threads which is a security concern. SMT is likely to slow
# down workloads not specifically optimized for SMT if you have a CPU with more
# than two(2) real CPU cores. Secondly, multi-queue network cards are as much
# as 20% slower when network queues are bound to real CPU cores as well as SMT
# virtual cores due to interrupt processing inefficiencies.
machdep.hyperthreading_allowed="0"  # (default 1, allow Hyper Threading (HT))

# Intel igb(4): The Intel i350-T2 dual port NIC supports up to eight(8)
# input/output queues per network port, the card has two(2) network ports.
#
# Multiple transmit and receive queues in network hardware allow network
# traffic streams to be distributed into queues. Queues can be mapped by the
# FreeBSD network card driver to specific processor cores leading to reduced
# CPU cache misses. Queues also distribute the workload over multiple CPU
# cores, process network traffic in parallel and prevent network traffic or
# interrupt processing from overwhelming a single CPU core.
#
# http://www.intel.com/content/dam/doc/white-paper/improving-network-performance-in-multi-core-systems-paper.pdf
#
# For a firewall under heavy CPU load we recommend setting the number of
# network queues equal to the total number of real CPU cores in the machine
# divided by the number of active network ports. For example, a firewall with
# four(4) real CPU cores and an i350-T2 dual port NIC should use two(2) queues
# per network port (hw.igb.num_queues=2). This equals a total of four(4)
# network queues over two(2) network ports which map to four(4) real CPU
# cores. A FreeBSD server with four(4) real CPU cores and a single network port
# should use four(4) network queues (hw.igb.num_queues=4). Or, set
# hw.igb.num_queues to zero(0) to allow the FreeBSD driver to automatically set
# the number of network queues to the number of CPU cores. It is not recommended
# to allow more network queues than real CPU cores per network port.
#
# Query total interrupts per queue with "vmstat -i" and use "top -CHIPS" to
# watch CPU usage per igb0:que. Multiple network queues will trigger more total
# interrupts compared to a single network queue, but the processing of each of
# those queues will be spread over multiple CPU cores allowing the system to
# handle increased network traffic loads.
hw.igb.num_queues="2"  # (default 0, queues equal the number of CPU real cores)

# Intel igb(4): FreeBSD puts an upper limit on the number of received
# packets a network card can process to 100 packets per interrupt cycle. This
# limit is in place because of inefficiencies in IRQ sharing when the network
# card is using the same IRQ as another device. When the Intel network card is
# assigned a unique IRQ (dmesg) and MSI-X is enabled through the driver
# (hw.igb.enable_msix=1) then interrupt scheduling is significantly more
# efficient and the NIC can be allowed to process packets as fast as they are
# received. A value of "-1" means unlimited packet processing and sets the same
# value to dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit . A
# process limit of "-1" is around one percent (1%) faster than "100" on a
# saturated network connection.
hw.igb.rx_process_limit="-1"  # (default 100 packets to process concurrently)

I have also gone through this. No measurable improvement in throughput.

machdep.hyperthreading_allowed="0"  # (default 1, allow Hyper Threading (HT)) --> NOT APPLICABLE in my case. This AMD CPU has 4 physical cores, and sysctl hw.ncpu reports 4, so HT (even if it is supported at all, I am not sure) is not active currently.
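(In case anyone wants to double-check the topology on their own box, something like the following should show it; the kern.smp OIDs are what I would expect on the FreeBSD 11 base of OPNsense 18.7, so treat the exact names as an assumption:)

sysctl hw.ncpu                    # logical CPUs the kernel scheduler sees
sysctl kern.smp.cores             # physical cores
sysctl kern.smp.threads_per_core  # 1 here means SMT/HT is not active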

hw.igb.num_queues="2"  # (default 0, queues equal the number of CPU real cores)
--> I have 4 cores and 2 active NICs, each NIC supporting up to 4 queues. I had the default
hw.igb.num_queues="0", but tried it with hw.igb.num_queues="2" as well.
No improvement in throughput (for a single flow).
But! It seems to have degraded the multi-flow performance heavily.
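(For reference, a minimal sketch of how such a loader tunable is typically applied and checked; I am assuming /boot/loader.conf.local is honored here - OPNsense also has a Tunables page in the GUI - and a reboot is needed before a loader tunable takes effect:)

# /boot/loader.conf.local
hw.igb.num_queues="2"

# after the reboot, see how the queues / interrupts are spread over the cores:
vmstat -i | grep igb      # expect one "igbX:que N" interrupt line per queue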

hw.igb.enable_msix=1 has been set like that since the beginning.
hw.igb.rx_process_limit="-1" --> was set, but no real improvement in throughput.
dev.igb.0.rx_processing_limit and dev.igb.1.rx_processing_limit are both set to "-1" as a result of the previous entry.
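(A quick way to confirm what the driver actually picked up at runtime, using the igb(4) OIDs from the quoted guide and the igb0/igb1 device numbers of this APU2; that every one of these tunables is exported as a readable sysctl is my assumption:)

sysctl hw.igb.enable_msix hw.igb.num_queues hw.igb.rx_process_limit
sysctl dev.igb.0.rx_processing_limit dev.igb.1.rx_processing_limit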

I am very sad that this won't be solvable under OPNsense without switching to a competitor or replacing the hardware itself.

Some small addendum:
recently I noticed (maybe when I upgraded to 18.7.1_3, but TBH I am not sure) that sometimes (depending on the actual throughput / interrupt load shared among the cores) the serial console hangs during iperf. As soon as the iperf session finishes or I interrupt it manually, the serial console becomes live again. I noticed it while running "top" on the console: the refresh stopped / froze during the iperf session, and the keyboard did not respond either while the iperf traffic was flowing. As soon as the iperf session finished, "top" continued to produce output and the console responded to keystrokes again.

It seems to have something to do with the throughput alternating randomly between those 2-3 discrete levels across iperf sessions.
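(If someone wants to see whether a single core is being saturated by interrupt handling while the console is frozen, a sketch of what could be watched from a second login, e.g. over SSH from the LAN-side PC so the serial console is not the only window; the flags are standard FreeBSD top/vmstat options:)

top -PSH                  # per-CPU view incl. kernel threads; look for one core pegged in "interrupt"
vmstat -i | grep igb      # run before and after the iperf session and compare per-queue counts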


No, never!

The 2 iperf endpoints are running on a PC connected to the LAN (igb1) and another PC connected to the WAN (igb0); the APU is always just a transit device (packet forwarding / packet filtering / NAT translation between igb1 and igb0 and vice versa), never terminating any iperf traffic directly on it.
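(For completeness, the kind of commands this setup implies; iperf3 syntax is assumed and the server address is a placeholder:)

# on the PC behind the WAN port (igb0), acting as the server:
iperf3 -s
# on the PC behind the LAN port (igb1): a single flow, then 4 parallel flows:
iperf3 -c <WAN-side-PC-address> -t 30
iperf3 -c <WAN-side-PC-address> -t 30 -P 4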

Next week I should get my device and will put it in my lab. Let's see..

Yet another small addendum:

Finally I managed to test throughput over PPPoE under real-life conditions.

Results are quite weak:
Approx. 250-270 Mbit/sec (WAN --> LAN traffic direction) was achieved with the APU2. Not iperf this time, but tested with a torrent (so nobody can say I was pushing unrealistic expectations onto one single flow).
Again, the router was only a transit device; the torrent client was running on a PC behind the APU. The SSD was not the bottleneck during the download.

As a comparison, using a different vendor's router, I was able to achieve 580-600 Mbit/sec easily while downloading the same test torrent. I did not investigate whether it could go higher with that router, but that is still more than a 2x performance difference.


Quote from: mimugmail on September 07, 2018, 05:59:15 PM
You mean IPFire on the same hardware?

No, not IPFire. Sorry if I was unclear :)

I installed a completely different piece of equipment (an Asus AC66U B1 router), just for comparison, to see whether that router can reach wire-speed gigabit.

On the APU I could not test IPFire today due to lack of time, but maybe in the coming days I will do another round of tests using IPFire.

Need to find a timeslot when no users are using the internet :(

If I remember correctly you said this on the FreeBSD net mailing list regarding OPNsense and IPFire. I'll check next week.