Wireguard Speed Issue ** SOLVED **

Started by dirtyfreebooter, February 10, 2025, 06:50:36 AM

February 15, 2025, 08:30:55 PM #15 Last Edit: February 15, 2025, 08:39:39 PM by dirtyfreebooter
it just appears this is too much information for ppl to read all the text and see that i have tried multiple combinations of hardware, software, nic, cables, etc.... that is a bit frustrating, but understandable, i guess.

i have been swapping multiple PCs for client, server, and router.

AMD 7950X3D
Intel Xeon E-2278G (X11SCL-iF)
Intel Xeon E-2414 (X13SCH-LN4F)
Intel Pentium G7400 (X13SCL-iF)
Odroid H4 Ultra Intel n-305
Lenovo P3 Tiny Intel i3-14100t
Intel i7-13700t (X13SAE-F)

with the exception of the 7950X3D, i have just swapped the SSDs around between client (Ubuntu 22.04), router (OPNsense 24.10.2 and 25.1.1, pfSense 2.7.2 CE), server (FreeBSD 14.2, Ubuntu 22.04)

i have also tried the following NICs in all combinations of PCs and NICs:
Intel E810-XXVDA2 (25g)
Mellanox ConnectX-5 (25g)
Mellanox ConnectX-3 (10g)
Intel X710-DA2 fw 9.53
Intel X710-DA2 fw 8.10
Intel X520-DA2
Intel i225V-B3
Intel i226V
Intel i350-T4
Intel i210 (onboard NICs for supermicro motherboards)

and i have used 50 different patch cables, SFP+ transceivers, UniFi DAC cables, etc.

the only common factor i have discovered so far is OPNsense. running through all those combinations took days and days. pfSense and OPNsense each behave the same way on every combination: pfSense is line rate or CPU bound in both directions. OPNsense is line rate or CPU bound in one direction and kernel bound to ~500 mbit/sec in the other when routing through Wireguard or OpenVPN. OPNsense can NAT port forward at 10g easily.

No matter the NIC speed or driver in OPNsense, routing through OpenVPN or Wireguard results in ~ 500 mbit/sec throughput in one direction

I saw evidence of swapping the router and HW on the router, less so of swapping client & server or where the test is initiated from.

I question your testing methodology. Going wide might lead to a combination that works, not necessarily for a root cause of the mismatch.
OPN is a common factor but it would be easier to blame if that was the case in both directions.

if i disable the firewall (pfctl -d) but still use the wireguard interface, i can see the wireguard kernel threads using CPU and i get symmetrical speeds with intel i350-t4, i226v, x710-da2. i didn't test anymore, since the before/after was 100% reproducible.

there is 100% some bug in the OPNsense firewall / pf side of things or some configuration that comes with a vanilla install that is causing this.

i am blaming OPN because the exact setups all work when i try pfSense 2.7.2 CE, OpenWRT x86_64 24.10.0. the same client/servers, hardware, just replace the router software.

You might be getting somewhere.
So FW + Wireguard + iperf3 reverse?

And the steps are:
Client -> OPN-WAN / OPN-LAN -> Server
Wireguard connection to OPN server initiated from Client
Then from client: iperf3 --client <server ip> --no-delay --parallel 8 --reverse
Rule on the Wireguard interface?

I'm actually a little curious about what the actual traffic looks like so I'll set something up after I get confirmation of the entire test environment.
I have no idea what FW state is going to be created as a result of such experiment so this is a learning opportunity.

February 16, 2025, 10:23:46 PM #19 Last Edit: February 16, 2025, 10:43:44 PM by dirtyfreebooter
my basic setup looks like this:



3 computers are
  • completely isolated, directly connected
  • in my 10g setup, SFP+ OM3 fiber or UniFi DAC cables, results are the same
  • fresh vanilla installs for ubuntu 24.04, pfSense 2.7.2 CE, OPNsense 24.10.2, 25.1.1
  • router software is changed by just swapping the SSD, all other hardware stays exactly the same

wireguard is set up using the official pfSense documentation and OPNsense road warrior documentation.

iperf3 commands on the client:
iperf3 --client 192.168.1.100 --no-delay --omit 1 --time 5 --parallel 4 --format m
iperf3 --client 192.168.1.100 --no-delay --omit 1 --time 5 --parallel 4 --format m --reverse

to verify the setup, i setup a NAT port forward on port 5201 to 192.168.1.100 and run iperf3 against the WAN IP from client
iperf3 --client 192.168.160.10 --no-delay --omit 1 --time 5 --parallel 4 --format m
iperf3 --client 192.168.160.10 --no-delay --omit 1 --time 5 --parallel 4 --format m --reverse

pfSense, OPN 24.10.2, OPN 25.1.1 all showed ~9.45 Gbit/sec in both directions.

wireguard results

  • pfSense
    • upload: 7.6 Gbit/sec
    • download: 6.6 Gbit/sec
  • OPN 24.10.2
    • upload: 8.0 Gbit/sec
    • download: 543 Mbit/sec
  • OPN 25.1.1
    • upload: 8.0 Gbit/sec
    • download: 538 Mbit/sec



if i disable pf via pfctl -d and re-run the iperf3 commands, still going through the wireguard interfaces:

  • OPN 25.1.1
    • upload: 8.1 Gbit/sec
    • download: 7.5 Gbit/sec

For this test the CPUs are nearly 100% in use with the kernel threads and wireguard threads. This is basically the CPU bound max of the 2278g setup.



pfSense and OPN are setup with a wireguard interface and have a single firewall rule: allow from wg0 net to any.

i have tried various MTUs, outbound NAT rules, and firewall normalization rules suggested in the documentation or on these forums. those extra rules made no difference whatsoever to the results. the reverse iperf3 direction is always stuck ~ 500 Mbits and the CPUs are nearly 100% idle during transmission.

no matter which CPU/nic i used in the router, e-2278g, e-2414, G7400, n305, i3-14100t, i5-13400t, --reverse direction is always ~ 500 Mbit/sec. some kernel level delay is blocking the CPU from working as hard as it can.



the installs and setups are as out-of-box as possible. pfSense i had to install the wireguard package. OPNsense i install the cpu-microcode-intel package.

i have gone through various tunables, but none make any real impact. 8 Gbps vs 500 Mbps isn't going to be tweaked, unless its some sort of bug that can be worked around

i have also tried many other machines as noted earlier for client, router, server. i also put in my AMD 7950X3D windows 11 desktop into the mix as client and server. there is no difference in behavior whatsoever.



i have set up an OpenVPN tunnel via the OPNsense OpenVPN road warrior documentation and i get the same behavior. so it's not wireguard. i did not try an IPsec tunnel.

i also diff'd the output of sysctl -a of pfSense 2.7.2 and OPNsense 25.1.1 and saw no real meaningful differences.
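
For anyone wanting to repeat that comparison, here is a minimal sketch. It assumes the output of sysctl -a was saved to a text file on each router and both files were copied to one machine. The function name sysctl_diff and the file names are made up for illustration. Sorting first keeps ordering differences between the two systems out of the diff.

```shell
# Compare two saved `sysctl -a` dumps after sorting them.
# $1 and $2 are paths to the dump files (hypothetical names).
sysctl_diff() {
    left=$(mktemp) || return 2
    right=$(mktemp) || return 2
    sort "$1" > "$left"
    sort "$2" > "$right"
    # diff exits 1 when the files differ; that is expected, so ignore it.
    diff "$left" "$right" || true
    rm -f "$left" "$right"
}
```

Usage would be something like sysctl_diff pfsense-2.7.2.txt opnsense-25.1.1.txt. Lines starting with < come from the first file and lines starting with > from the second. Counters and hardware-specific values will always differ, so expect some noise.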

Hmm, I didn't get a chance to try this out yesterday and I'm not going to be able to for another couple of days.

This said, something occurred to me overnight.
You mention a static IP on the WAN side and a directly attached machine. So no gateway? Or gateway set to the attached machine?
I ask because I've been bit by a fairly obscure setting (Firewall > Settings > Advanced > Disable reply-to) in the past that affects how WAN traffic is routed.
By default, all outbound traffic on the WAN is directed at the gateway for that network, which means it is black holed if the gateway is another firewall.

Maybe it's irrelevant but I wonder how that "feature" interacts with your test environment.
You might be better off adding a switch, having an actual gateway and disabling reply-to.
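
One way to see whether reply-to is in play is to look for the keyword in the loaded pf ruleset, which pfctl -sr prints. The sketch below is only a filter over that output. The helper name reply_to_rules is made up for illustration, and it assumes a root shell on the router.

```shell
# Print only the rules that carry a reply-to option.
# Reads a pf ruleset dump (e.g. the output of `pfctl -sr`) on stdin.
reply_to_rules() {
    # "|| true": finding no reply-to rules is a valid result, not an error.
    grep -- 'reply-to' || true
}

# On the router, as root (not executed here):
#   pfctl -sr | reply_to_rules
```

If this prints nothing, no loaded rule uses reply-to. If it prints WAN rules, each one names the interface and gateway that reply traffic gets sent to.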

yea i have run with an actual gateway as well, adding the WAN to my normal network, i do this initially so that i can install any needed packages and say update 25.1 to 25.1.1.

in either case, there is no difference in behavior. i tried eliminating the external network after a few days to try and isolate it more, but the results are exactly the same either way, unfortunately

Poor speed or disconnects, usually MTU is wrong, set too high at either side of the tunnel

Well, if you ignore the official instructions and do not set MSS clamping for Wireguard, then, yes, of course...
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

yea i followed the documentation, verified mtu, with or without the firewall normalization rule, tried messing with MTUs on both sides. there is no difference.

i also verified the out-of-box MTUs are the same as pfSense and OpenWRT according to ifconfig
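
For reference, the arithmetic behind the MTU and MSS numbers being discussed can be sketched as below. The header sizes are the standard ones (IPv4 20 bytes, IPv6 40, UDP 8, TCP 20, WireGuard data framing 32). The function names are made up for illustration, and this is not taken from the OPNsense documentation.

```shell
# Largest tunnel MTU that fits inside a given link MTU.
# $1 = link MTU, $2 = outer IP version (4 or 6)
wg_mtu() {
    case "$2" in
        4) echo $(( $1 - 20 - 8 - 32 )) ;;   # IPv4 + UDP + WireGuard
        6) echo $(( $1 - 40 - 8 - 32 )) ;;   # IPv6 + UDP + WireGuard
        *) return 1 ;;
    esac
}

# Largest TCP MSS that fits inside a given tunnel MTU.
# $1 = tunnel MTU, $2 = inner IP version (4 or 6)
tcp_mss() {
    case "$2" in
        4) echo $(( $1 - 20 - 20 )) ;;       # IPv4 + TCP
        6) echo $(( $1 - 40 - 20 )) ;;       # IPv6 + TCP
        *) return 1 ;;
    esac
}
```

With a 1500 byte link and an IPv6-capable tunnel, wg_mtu 1500 6 gives 1420, and tcp_mss 1420 4 gives 1380 for IPv4 traffic inside the tunnel.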

Quote from: dirtyfreebooter on February 17, 2025, 07:44:20 PM
yea i have run with an actual gateway as well, adding the WAN to my normal network, i do this initially so that i can install any needed packages and say update 25.1 to 25.1.1.

in either case, there is no difference in behavior. i tried eliminating the external network after a few days to try and isolate it more, but the results are exactly the same either way, unfortunately
And reply-to was disabled on the OPN being tested, right?
Otherwise, all reply traffic between OPN and the desktop on the WAN side bounces via the WAN gateway.

I managed to run enough of a test to look at states and traffic (for my education).
It's actually pretty darn simple. Simple UDP tunnel between the wireguard client and OPN WAN + one TCP connection per iperf thread from client's wireguard IP and target machine.
It's no surprise loads are negligible when traffic is choked up. I have no idea where the bottleneck could be.
I don't know that a packet capture would reveal anything.

FWIW, my test environment was way worse than yours (yet sufficient for my investigation):
Ubuntu VM on my prod N305 based proxmox (where my OPN also lives), in a separate VLAN so I don't need to deal with reply-to -> main OPN for inter-VLAN -> managed switch -> OPN with Wireguard (also virtualized on N100 hardware) -> unmanaged switch -> target Ubuntu.

I still managed to get 900Mbps in both directions, very symmetrical... It's mindboggling you can't exceed 550Mbps on your hardware.

O-M-G ** SOLVED ** THANK YOU eric!!

it was the reply-to... changed it to

[screenshot of the changed reply-to setting not preserved]

and immediately all OPN installs worked in both directions... from 23.1 to 25.1.1 on all my hardware setups...

The Intel Xeon E-2278G with X710-DA2

Up: 8.60 Gbits/sec
Down: 7.27 Gbits/sec

iperf3 --client 192.168.1.20 --omit 1 --time 5 --parallel 16 --format g
...
[SUM]   0.00-5.00   sec  5.00 GBytes  8.59 Gbits/sec  3327             sender
[SUM]   0.00-5.00   sec  5.00 GBytes  8.60 Gbits/sec                  receiver

iperf3 --client 192.168.1.20 --omit 1 --time 5 --parallel 16 --format g --reverse
Reverse mode, remote host 192.168.1.20 is sending
...
[SUM]   0.00-5.00   sec  4.23 GBytes  7.27 Gbits/sec  7211             sender
[SUM]   0.00-5.00   sec  4.23 GBytes  7.27 Gbits/sec                  receiver

i turned off the firewall normalization rule the Road Warrior Docs say to use and now i consistently get

Up:
[SUM]   0.00-5.00   sec  5.23 GBytes  8.98 Gbits/sec   34             sender
[SUM]   0.00-5.00   sec  5.23 GBytes  8.98 Gbits/sec                  receiver

Down:
[SUM]   0.00-5.00   sec  4.62 GBytes  7.93 Gbits/sec  5610             sender
[SUM]   0.00-5.00   sec  4.62 GBytes  7.93 Gbits/sec                  receiver

Nice!

I kinda liked that theory because it explained the discrepancy, but it was arguably just a theory until you verified it.

I don't know what the gateway was but it was likely not multi-gig.
The downgrade all the way to ~550 might have been caused by collisions.

What was going on with the client directly connected is still unknown.
But that's such an atypical use case that it's not worth investigating further.
Sometimes simplifying the test bench to an extreme has unpredictable side effects.

When I got bit by this setting, no traffic went through because my main OPN rejected it (state violation) since it was reply traffic to requests that it never saw. That's usually easier to troubleshoot than performance issues... Especially since I had FW logs.

This was an interesting thread, but I have a question about the "reply-to" setting.

I have only one physical WAN interface, but I use one WG0 for incoming Wireguard clients and I have 2 other interfaces that handle outgoing VPN (OpenVPN and Wireguard) connections via separate gateways.
I guess this means I should also disable "reply-to" on WAN rules, correct?