Puzzling asymmetric network speeds

Started by holunde, May 09, 2021, 04:07:56 PM

May 09, 2021, 04:07:56 PM Last Edit: May 15, 2021, 09:59:38 AM by holunde
I have OPNsense installed on a Netgate SG-4860 unit.
There is nothing special about the installation, except that I have the WireGuard kernel module installed.
There are no problems with stability or anything like that, and the unit can route at my full internet speed, which is 1 Gbit/s. In quiet times I see upload and download speeds of a bit over 900 Mbit/s in both directions, measured from my home computer using, for example, speedtest.net.
But I see asymmetric speeds when connecting to my workplace over a site-to-site WireGuard connection.
It is not a problem, because the speed is fine, but the cause puzzles me. Here are the WireGuard speeds measured between my home-pc and my workstation at work(!) using iperf3. Both PCs run Linux Mint.

First from the home-pc(192.168.254.6) towards the work-pc(10.0.5.1)

ho@hohome:~$ iperf3 -c 10.0.5.1
Connecting to host 10.0.5.1, port 5201
[  5] local 192.168.254.6 port 59292 connected to 10.0.5.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  65.5 MBytes   549 Mbits/sec   82   1.61 MBytes       
[  5]   1.00-2.00   sec  68.8 MBytes   577 Mbits/sec    0   1.76 MBytes       
[  5]   2.00-3.00   sec  67.5 MBytes   566 Mbits/sec    0   1.89 MBytes       
[  5]   3.00-4.00   sec  68.8 MBytes   577 Mbits/sec    0   1.99 MBytes       
[  5]   4.00-5.00   sec  67.5 MBytes   566 Mbits/sec    0   2.06 MBytes       

Now from the work-pc towards the home-pc

ho@hohome:~$ iperf3 -c 10.0.5.1 -R
Connecting to host 10.0.5.1, port 5201
Reverse mode, remote host 10.0.5.1 is sending
[  5] local 192.168.254.6 port 59302 connected to 10.0.5.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  79.2 MBytes   664 Mbits/sec                 
[  5]   1.00-2.00   sec  87.7 MBytes   735 Mbits/sec                 
[  5]   2.00-3.00   sec  88.6 MBytes   744 Mbits/sec                 
[  5]   3.00-4.00   sec  91.0 MBytes   764 Mbits/sec                 
[  5]   4.00-5.00   sec  79.2 MBytes   664 Mbits/sec                 

In the second measurement the speed varies quite a lot, but it is definitely faster.
So I got curious and measured the speed from the home-pc directly against the LAN port on the OPNsense router, the SG-4860, using iperf3 started from the console on the router (192.168.254.1).

First from the home-pc towards the OPNsense LAN port

ho@hohome:~$ iperf3 -c 192.168.254.1
Connecting to host 192.168.254.1, port 5201
[  5] local 192.168.254.6 port 42940 connected to 192.168.254.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  81.0 MBytes   679 Mbits/sec    0    484 KBytes       
[  5]   1.00-2.00   sec  77.6 MBytes   651 Mbits/sec    0    484 KBytes       
[  5]   2.00-3.00   sec  78.7 MBytes   660 Mbits/sec    0    484 KBytes       
[  5]   3.00-4.00   sec  79.6 MBytes   668 Mbits/sec    0    484 KBytes       
[  5]   4.00-5.00   sec  78.5 MBytes   658 Mbits/sec    0    484 KBytes       

And now the other way

ho@hohome:~$ iperf3 -c 192.168.254.1 -R
Connecting to host 192.168.254.1, port 5201
Reverse mode, remote host 192.168.254.1 is sending
[  5] local 192.168.254.6 port 42944 connected to 192.168.254.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   112 MBytes   940 Mbits/sec                 
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec                 
[  5]   2.00-3.00   sec   112 MBytes   942 Mbits/sec                 
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec                 
[  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec                 

I've repeated these tests and the behaviour seems absolutely consistent.
The pattern seems to be that traffic going IN to the LAN port of the router is slower than traffic in the other direction.
I'm new to OPNsense, having been a pfSense user for years until the latest WireGuard scandal, so there might be something in the configuration of the OPNsense system that I'm not aware of.
But then again, it is odd that there seems to be no asymmetric behaviour when testing with speedtest.net from the home-pc, at least nothing significant.
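
To compare the directions as single numbers, the interval lines of a saved iperf3 run can be averaged. This is only a minimal sketch, assuming the default text output shown in the runs above; the helper name avg_mbits is made up for illustration.

```shell
# Average the per-interval bitrate (in Mbits/sec) of iperf3 text output.
# Reads from the files given as arguments, or from stdin if there are none.
avg_mbits() {
  awk '
    # Skip the end-of-run summary lines so only interval reports are averaged.
    /sender|receiver/ { next }
    {
      # The bitrate value is the field just before the "Mbits/sec" unit.
      for (i = 2; i <= NF; i++)
        if ($i == "Mbits/sec") { sum += $(i - 1); n++ }
    }
    END { if (n) printf "%.1f\n", sum / n }
  ' "$@"
}

# Example use (the file name is a placeholder):
#   iperf3 -c 10.0.5.1 > home-to-work.txt
#   avg_mbits home-to-work.txt
```

Fed the five interval lines of the first run above (home-pc towards work-pc), it prints 567.0.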

Does anyone have an idea of what is going on here?

Generally hosting iperf on the router itself doesn't give the best results as the router is designed to transfer packets between interfaces. When the iperf tests are running, are you seeing any bottlenecks within top? Run 'top -aSCHIP' while also running an extended iperf3 test (5 minutes or so) and see if you can notice anything getting stuck on a single core.
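
As a rough sketch of that extended test: the wrapper below just adds a duration and a reporting interval to the iperf3 client call from the first post. The function name long_iperf is made up, and 300 seconds is simply the "5 minutes or so" mentioned above.

```shell
# Usage: long_iperf <server> [extra iperf3 options, e.g. -R]
long_iperf() {
  server=$1
  shift
  # -t 300 runs the test for 300 seconds; -i 10 prints a report every 10 seconds.
  iperf3 -c "$server" -t 300 -i 10 "$@"
}

# On the home PC:
#   long_iperf 10.0.5.1        # home -> work
#   long_iperf 10.0.5.1 -R     # work -> home
# Meanwhile, on the router console, watch the per-CPU and per-thread load:
#   top -aSCHIP
```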

Another thing I would check is flow control (FC) and Energy Efficient Ethernet (EEE). These are tunable settings for Intel NICs. You can set these tunables to disable FC and EEE for igb chipsets. If you have more than 4 interfaces, just continue the interface numbering to set this on the additional interfaces.
dev.igb.0.fc 0
dev.igb.1.fc 0
dev.igb.2.fc 0
dev.igb.3.fc 0
dev.igb.0.eee_disabled 1
dev.igb.1.eee_disabled 1
dev.igb.2.eee_disabled 1
dev.igb.3.eee_disabled 1
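
For units with more ports, the same list can be generated instead of typed out. A small sketch, assuming the interfaces are numbered igb0 upwards without gaps; igb_tunables is a made-up helper name, and it only prints the name/value pairs in the format above, it does not apply them.

```shell
# Print the FC and EEE tunables for igb0 .. igb(N-1).
# Runs in a subshell so the loop variables do not leak into the caller.
igb_tunables() (
  count=$1
  for setting in "fc 0" "eee_disabled 1"; do
    i=0
    while [ "$i" -lt "$count" ]; do
      printf 'dev.igb.%d.%s\n' "$i" "$setting"
      i=$((i + 1))
    done
  done
)

# Example: igb_tunables 6   # prints 12 lines, for igb0 through igb5
```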


I mention this last because I'm sure you have already checked it, but I'm assuming this bottleneck couldn't be related to the work ISP connection having a slower upload speed due to bandwidth sharing with many devices in the workplace?

Hi opnfwb

Thanks for your answer
Yes, I have already experimented with some of the low-level settings and right now I have

dev.igb.X.fc=0
dev.igb.X.eee_disabled=1

set for all 6 ports on the device.
But in my experiments I have never seen any of them have any effect.
And this also goes for some other, more powerful routers I run OPNsense on. This model, actually: https://teklager.se/en/products/routers/tlsense-i7-7500U
But what DOES absolutely have an immediate effect on network performance on these Intel-based routers is enabling PowerD and setting the power mode to maximum in System->Settings->Miscellaneous->Power savings.
Without this the SG-4860 will not route at gigabit speed.
Settings like these are discussed from time to time, but much of it seems like guesswork. It would be nice to have an authoritative guide on this, for Intel chipsets for example.
I've done a lot of tests and there is no doubt: for some reason this device slows down in the situations I have described.

You make a great point with PowerD. I've seen a few posts here suggest that the "Hi-Adaptive" setting is the preferred one. However, I think that was in the context of the smaller and less powerful PC Engines APU series.

The fact that the unit can route at 1 Gbit/s without the VPN seems to indicate some kind of bottleneck in the VPN path, but without loading it up and checking the usage it will be hard to guess what it could be. It may even be some kind of MTU mismatch that's causing one end to suffer when pushing packets through the VPN tunnel.
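
To make the MTU idea a bit more concrete, here is a sketch of the arithmetic. WireGuard adds 32 bytes of its own per data packet (header plus authentication tag), on top of 8 bytes of UDP and the outer IP header (20 bytes for IPv4, 40 for IPv6). The helper names wg_mtu and ping_payload are made up, and the 1500-byte link MTU is an assumption about both ends, not something measured in this thread.

```shell
# Largest tunnel MTU that fits in one outer packet.
# Usage: wg_mtu [link_mtu] [ipv4|ipv6]   (defaults: 1500, ipv6)
wg_mtu() (
  link_mtu=${1:-1500}
  case ${2:-ipv6} in
    ipv4) ip_hdr=20 ;;
    ipv6) ip_hdr=40 ;;
    *) echo "usage: wg_mtu [link_mtu] [ipv4|ipv6]" >&2; exit 2 ;;
  esac
  # Subtract outer IP header, 8 bytes UDP and 32 bytes WireGuard overhead.
  echo $((link_mtu - ip_hdr - 8 - 32))
)

# ICMP payload size that exactly fills a given MTU over IPv4
# (20 bytes IP header + 8 bytes ICMP header).
ping_payload() { echo $(($1 - 28)); }

# Example probe from the Linux home PC through the tunnel, with
# fragmentation prohibited (-M do is the iputils ping option):
#   ping -M do -s "$(ping_payload "$(wg_mtu 1500 ipv6)")" 10.0.5.1
```

If a probe of that size gets through in one direction but not the other, or only smaller sizes pass, the tunnel MTU on one side is probably set too high for the path.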