Wireguard Speed Issue ** SOLVED **

Started by dirtyfreebooter, February 10, 2025, 06:50:36 AM

February 10, 2025, 06:50:36 AM Last Edit: February 21, 2025, 12:42:25 AM by dirtyfreebooter
** SOLVED ** it was the firewall setting, Disable Reply-To. By default it's unchecked. After checking it, all my test setups immediately went to full speed in both directions.





i am seeing a weird issue with wireguard speeds going out of the firewall, but not in.

client (Debian 12 i7-13700 with Mellanox ConnectX-5 25g NIC) -> opnsense -> server (Windows 11 AMD 7950X3D with Intel E810 25g NIC)
No internet, this is a local test

the test run on the client is:

iperf3 --client <server ip> --no-delay --parallel 8
iperf3 --client <server ip> --no-delay --parallel 8 --reverse

every time i do the --reverse test, i can't seem to get any faster than 545 Mbits/sec.
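The two runs above can also be scripted so the forward and reverse numbers are captured the same way every time. This is a minimal sketch, assuming iperf3 is on the PATH and using its --json report; the function names are made up for illustration and are not part of the original test procedure.

```python
import json
import subprocess


def run_iperf3(server, reverse=False, parallel=8):
    """Run one iperf3 client test and return the parsed JSON report."""
    cmd = ["iperf3", "--client", server, "--no-delay",
           "--parallel", str(parallel), "--json"]
    if reverse:
        cmd.append("--reverse")
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(out.stdout)


def received_mbits(report):
    """Receiver-side aggregate throughput in Mbit/s from an iperf3 JSON report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def asymmetry(forward, reverse):
    """Ratio of forward to reverse throughput (1.0 means symmetric)."""
    return received_mbits(forward) / received_mbits(reverse)
```

Calling run_iperf3("<server ip>") and run_iperf3("<server ip>", reverse=True) and passing both reports to asymmetry() gives one number per setup; a ratio well above 1 is the one-directional slowdown described here.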

i have now tested this with 6 different nics and 3 versions of opnsense on 3 different setups.

Setup 1:
Supermicro X11SCL-iF with Intel Xeon E 2278g (8c/16t 5Ghz)

NICs tested:
  • Intel i210 (motherboard nics using igb driver)
  • Intel i350-t4 (igb driver)
  • Intel i225V-b3 (qnap 2.5g x4 card using the igc driver)
  • Intel x520-da2 (ix driver)
  • Intel x710-da2 (ixl driver)
  • Mellanox ConnectX-3 (mce driver)
  • Mellanox ConnectX-5 (mce driver)

Setup 2:
Lenovo P3 Tiny with Intel i3 14100t (4c/8t 4.4Ghz)

NICs tested:
  • Intel i350-t4 (igb driver)
  • Intel x710-da2 (ixl driver)

Setup 3:
Odroid H4 Ultra with Intel N305 (8c/8t 3.8Ghz)

NICs tested:
  • Intel i226V (igc driver)

All of these systems have enough CPU on OPNsense to do more than 500 Mbit/sec with Wireguard.

I have tried each system with 3 versions of OPNsense:
  • 24.7.1
  • 24.10.2 (business edition)
  • 25.1

this is a vanilla install. install cpu-microcode-intel. configure wireguard via the road warrior instructions. nothing else is added or configured on the systems.

Results

  • with the 1G and 2.5G nics, going in the upload direction i was able to get full line speed. going in download direction (--reverse), 545 Mbit/sec
  • with the 10G and 25G nics, in the upload direction i was able to achieve between 4-7 Gbit/sec. going in download direction (--reverse), 545 Mbit/sec

Looking at top when this is happening, the CPUs on all 3 systems, all 3 versions of OPNsense are basically idle. using maybe 5%-10% cpu max.

so all systems, no matter what i change on the iperf3 side, etc, are all stuck at 545 Mbit/sec in that one direction. I have tried various ethernet cables, OM3 fiber, DAC cables. it's all the same, 545 Mbit/sec.

MTUs on the systems are 1500 for NIC and 1420 for wireguard interfaces. upload direction is great performance. download 100% terrible all the time, 100% reproducible. from 24.7.1 to 25.1.
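For reference, the 1420 value follows from the tunnel overhead, so the MTU itself is unlikely to explain a one-directional limit. A small worked sketch: the header sizes are the standard ones (IPv4 20 bytes, IPv6 40 bytes, UDP 8 bytes, WireGuard data message 32 bytes); the helper names are only illustrative.

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
TCP_HEADER = 20   # without TCP options
# WireGuard data message: 4 (type + reserved) + 4 (receiver index)
# + 8 (counter) + 16 (Poly1305 authentication tag)
WG_OVERHEAD = 32


def wg_mtu(link_mtu, ipv6_outer=True):
    """Largest tunnel MTU that fits in link_mtu without fragmentation."""
    outer = IPV6_HEADER if ipv6_outer else IPV4_HEADER
    return link_mtu - outer - UDP_HEADER - WG_OVERHEAD


def tcp_mss(mtu, ipv6_inner=False):
    """TCP payload per packet for a given interface MTU."""
    inner = IPV6_HEADER if ipv6_inner else IPV4_HEADER
    return mtu - inner - TCP_HEADER


print(wg_mtu(1500))                     # 1420, safe for an IPv6 outer header
print(wg_mtu(1500, ipv6_outer=False))   # 1440, IPv4-only outer header
print(tcp_mss(1420))                    # 1380, IPv4 TCP inside the tunnel
```

So 1420 on the wireguard interface is the conservative default for a 1500-byte link, which matches the setup described.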

if i don't go through the wireguard interface, if i setup a NAT port forward, i get full speeds in both directions.

this must be some error or bad configuration on my part?

Sounds like the wireguard encryption is slower than the decryption. There may also be an issue with the wireguard process being restricted to one thread. The protocol supports it but OPNsense may not: https://www.wireguard.com/performance/

the CPUs are all nearly idle when this is happening, the 545 Mbit/sec. i can see the wireguard kernel threads in top, 1 per cpu instances and they are all doing 0.5% - 1%. in the other direction the cpu usage scales with the NIC i am using, 1g/2.5g/10g/25g.

i guess it's possible there is some sort of bug, but i tried 24.7.1 - 25.1 and same results, so no one noticed that for more than a year?

which is why it all seems like it has to be some sort of configuration error on my part...

because i hate myself, lol, i went and installed pfSense CE 2.7.2 on the supermicro x11scl-if with Xeon 2278g. using the i226V 4 port NIC, installed pfSense. installed the wireguard package. configured wireguard, added a single peer and connected and immediately was able to do 2.5g up and down.

iperf3 --client 192.168.1.103 --omit 1 --time 5 --parallel 4 -f g
Connecting to host 192.168.1.103, port 5201
[  5] local 192.168.40.2 port 35428 connected to 192.168.1.103 port 5201
[  7] local 192.168.40.2 port 35430 connected to 192.168.1.103 port 5201
[  9] local 192.168.40.2 port 35446 connected to 192.168.1.103 port 5201
[ 11] local 192.168.40.2 port 35454 connected to 192.168.1.103 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  73.7 MBytes  0.62 Gbits/sec    0    521 KBytes       (omitted)
[  7]   0.00-1.00   sec  74.0 MBytes  0.62 Gbits/sec    0    524 KBytes       (omitted)
[  9]   0.00-1.00   sec  65.4 MBytes  0.55 Gbits/sec    0    450 KBytes       (omitted)
[ 11]   0.00-1.00   sec  64.2 MBytes  0.54 Gbits/sec    0    468 KBytes       (omitted)
[SUM]   0.00-1.00   sec   277 MBytes  2.33 Gbits/sec    0             (omitted)
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   0.00-1.00   sec  68.4 MBytes  0.57 Gbits/sec    0    546 KBytes
[  7]   0.00-1.00   sec  68.2 MBytes  0.57 Gbits/sec    0    549 KBytes
[  9]   0.00-1.00   sec  65.7 MBytes  0.55 Gbits/sec    0    450 KBytes
[ 11]   0.00-1.00   sec  66.0 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   0.00-1.00   sec   268 MBytes  2.25 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  68.5 MBytes  0.57 Gbits/sec    0    546 KBytes
[  7]   1.00-2.00   sec  68.6 MBytes  0.58 Gbits/sec    0    549 KBytes
[  9]   1.00-2.00   sec  65.7 MBytes  0.55 Gbits/sec    0    450 KBytes
[ 11]   1.00-2.00   sec  65.9 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   1.00-2.00   sec   269 MBytes  2.25 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  67.3 MBytes  0.56 Gbits/sec    0    546 KBytes
[  7]   2.00-3.00   sec  69.1 MBytes  0.58 Gbits/sec    0    573 KBytes
[  9]   2.00-3.00   sec  64.8 MBytes  0.54 Gbits/sec    0    450 KBytes
[ 11]   2.00-3.00   sec  65.9 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   2.00-3.00   sec   267 MBytes  2.24 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  68.4 MBytes  0.57 Gbits/sec    0    546 KBytes
[  7]   3.00-4.00   sec  70.2 MBytes  0.59 Gbits/sec    0    573 KBytes
[  9]   3.00-4.00   sec  65.8 MBytes  0.55 Gbits/sec    0    470 KBytes
[ 11]   3.00-4.00   sec  65.9 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   3.00-4.00   sec   270 MBytes  2.27 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  68.6 MBytes  0.58 Gbits/sec    0    546 KBytes
[  7]   4.00-5.00   sec  69.1 MBytes  0.58 Gbits/sec    0    573 KBytes
[  9]   4.00-5.00   sec  66.3 MBytes  0.56 Gbits/sec    0    470 KBytes
[ 11]   4.00-5.00   sec  65.0 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   4.00-5.00   sec   269 MBytes  2.26 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.00   sec   341 MBytes  0.57 Gbits/sec    0             sender
[  5]   0.00-5.00   sec   342 MBytes  0.57 Gbits/sec                  receiver
[  7]   0.00-5.00   sec   345 MBytes  0.58 Gbits/sec    0             sender
[  7]   0.00-5.00   sec   345 MBytes  0.58 Gbits/sec                  receiver
[  9]   0.00-5.00   sec   328 MBytes  0.55 Gbits/sec    0             sender
[  9]   0.00-5.00   sec   328 MBytes  0.55 Gbits/sec                  receiver
[ 11]   0.00-5.00   sec   329 MBytes  0.55 Gbits/sec    0             sender
[ 11]   0.00-5.00   sec   329 MBytes  0.55 Gbits/sec                  receiver
[SUM]   0.00-5.00   sec  1.31 GBytes  2.25 Gbits/sec    0             sender
[SUM]   0.00-5.00   sec  1.31 GBytes  2.25 Gbits/sec                  receiver

and reverse

iperf3 --client 192.168.1.103 --omit 1 --time 5 --parallel 4 -f g -R
Connecting to host 192.168.1.103, port 5201
Reverse mode, remote host 192.168.1.103 is sending
[  5] local 192.168.40.2 port 35472 connected to 192.168.1.103 port 5201
[  7] local 192.168.40.2 port 35480 connected to 192.168.1.103 port 5201
[  9] local 192.168.40.2 port 42704 connected to 192.168.1.103 port 5201
[ 11] local 192.168.40.2 port 42706 connected to 192.168.1.103 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  24.6 MBytes  0.21 Gbits/sec                  (omitted)
[  7]   0.00-1.00   sec  66.9 MBytes  0.56 Gbits/sec                  (omitted)
[  9]   0.00-1.00   sec   130 MBytes  1.09 Gbits/sec                  (omitted)
[ 11]   0.00-1.00   sec  45.6 MBytes  0.38 Gbits/sec                  (omitted)
[SUM]   0.00-1.00   sec   267 MBytes  2.24 Gbits/sec                  (omitted)
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   0.00-1.00   sec  29.2 MBytes  0.24 Gbits/sec
[  7]   0.00-1.00   sec  63.2 MBytes  0.53 Gbits/sec
[  9]   0.00-1.00   sec   131 MBytes  1.10 Gbits/sec
[ 11]   0.00-1.00   sec  45.2 MBytes  0.38 Gbits/sec
[SUM]   0.00-1.00   sec   269 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  31.9 MBytes  0.27 Gbits/sec
[  7]   1.00-2.00   sec  61.6 MBytes  0.52 Gbits/sec
[  9]   1.00-2.00   sec   129 MBytes  1.08 Gbits/sec
[ 11]   1.00-2.00   sec  46.5 MBytes  0.39 Gbits/sec
[SUM]   1.00-2.00   sec   269 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  35.9 MBytes  0.30 Gbits/sec
[  7]   2.00-3.00   sec  57.8 MBytes  0.49 Gbits/sec
[  9]   2.00-3.00   sec   130 MBytes  1.09 Gbits/sec
[ 11]   2.00-3.00   sec  44.6 MBytes  0.37 Gbits/sec
[SUM]   2.00-3.00   sec   269 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  41.1 MBytes  0.34 Gbits/sec
[  7]   3.00-4.00   sec  51.4 MBytes  0.43 Gbits/sec
[  9]   3.00-4.00   sec   136 MBytes  1.14 Gbits/sec
[ 11]   3.00-4.00   sec  39.7 MBytes  0.33 Gbits/sec
[SUM]   3.00-4.00   sec   268 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  44.2 MBytes  0.37 Gbits/sec
[  7]   4.00-5.00   sec  54.4 MBytes  0.46 Gbits/sec
[  9]   4.00-5.00   sec   134 MBytes  1.12 Gbits/sec
[ 11]   4.00-5.00   sec  36.4 MBytes  0.31 Gbits/sec
[SUM]   4.00-5.00   sec   269 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.00   sec   183 MBytes  0.31 Gbits/sec    0             sender
[  5]   0.00-5.00   sec   182 MBytes  0.31 Gbits/sec                  receiver
[  7]   0.00-5.00   sec   289 MBytes  0.48 Gbits/sec    1             sender
[  7]   0.00-5.00   sec   288 MBytes  0.48 Gbits/sec                  receiver
[  9]   0.00-5.00   sec   660 MBytes  1.11 Gbits/sec    0             sender
[  9]   0.00-5.00   sec   660 MBytes  1.11 Gbits/sec                  receiver
[ 11]   0.00-5.00   sec   213 MBytes  0.36 Gbits/sec    3             sender
[ 11]   0.00-5.00   sec   212 MBytes  0.36 Gbits/sec                  receiver
[SUM]   0.00-5.00   sec  1.31 GBytes  2.26 Gbits/sec    4             sender
[SUM]   0.00-5.00   sec  1.31 GBytes  2.25 Gbits/sec                  receiver


CPU usage is ~ 15% per core, except the main WG core which is 40%
https://imgur.com/a/SDtQK5I

looking at the kernels for pfSense 2.7.2 CE and OPNsense 24.10.2, the wireguard implementation is nearly identical. so it seems like the OPNsense issue is somewhere else in the kernel.

since the behavior is the same for multiple NICs and drivers, igb, igc, ix, ixl, mce, it is probably not a driver issue.

which leaves some sort of pf or networking issue in the OPNsense kernel.

i guess next steps are maybe to try vanilla FreeBSD and see if it also occurs.

I can reach >= 700 Mbits/s between two Linux hosts connected over Wireguard on an N100 box, so there is no fundamental problem.

But I need more than one connection with -R and one thread only, so: did you enable RSS?
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

when i enabled RSS on the Xeon 2278g (8c/8t with hyperthreads disabled)

net.isr.bindthreads = 1
net.isr.maxthreads = -1
net.inet.rss.enabled = 1
net.inet.rss.bits = 3

my throughput went down to ~300 Mbit/sec (but again only in the 1 direction)
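As a cross-check on the values above: the usual rule of thumb is that net.inet.rss.bits is the number of bits needed to index the CPU cores, so 2^bits matches the core count (3 for 8 cores). The sketch below only encodes that rule of thumb as an assumption, it is not taken from the OPNsense source, and the helper names are hypothetical.

```python
def rss_bits(cores):
    """floor(log2(cores)): 4 cores -> 2, 8 cores -> 3, 16 cores -> 4."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return cores.bit_length() - 1


def rss_tunables(cores):
    """Tunables in the same 'name = value' form as listed above."""
    return [
        "net.isr.bindthreads = 1",
        "net.isr.maxthreads = -1",
        "net.inet.rss.enabled = 1",
        f"net.inet.rss.bits = {rss_bits(cores)}",
    ]


for line in rss_tunables(8):
    print(line)
```

For the 8-core Xeon with hyperthreading disabled this reproduces the four settings listed, so the RSS configuration itself looks consistent with the hardware.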

these are all fresh installs of 24.7.1, 24.10.2, and 25.1 with no configuration other than intel-cpu-microcode and wireguard.

i just tested
- FreeBSD 14.2 on the supermicro xeon 2278g and running iperf3 server on freebsd and had no problems running in either direction. (so no firewalling)
- OpenWRT x86 and no problem maxing out 2.5g on the N305 system


Disable Spectre Mitigation: "sysctl hw.ibrs_disable=1", should work immediately, without reboot.

i have tried with

vm.pmap.pti=0
hw.ibrs_disable=1

and there is no difference. on OPNsense, the CPUs are idle when traffic is flowing..

but even so, all of these setups have far more CPU than 500 Mbit/sec needs. the Xeon E-2278G is 8c/16t 5ghz with 4.3ghz all core turbo. the i3-14100T is 4c/8t with 4.3 Ghz all core turbo. additionally, the other OSes, vanilla FreeBSD 14.2 and pfSense 2.7.2, all have these mitigations on by default.

i have installed FreeBSD, pfSense, OPNsense 24.7.1, 24.10.2, 25.1 on separate SSDs, so at least i can switch back and forth easy to try and test things, the combination of 3 PCs, 3 CPUs, 8 different NICs and only OPNsense has an issue and only in one direction is odd.


Since it is not CPU-limited and also, several NICs show the same behavior, and also, pfSense does not show that, I can only suspect a one-directional problem with the network, like, e.g. flow control or hardware offloading? IDK if the settings differ between OSes.

+1 yea, i am going to try and do sysctl -a on all systems and diff those and see if i can find any differences to experiment with as my next steps.

ixl driver defaults to dev.ixl.0.fc=0 and i have all the hardware offloading turned off (OPNsense default), but i have also tried with it all turned on and with just checksumming turned on (pfSense default). no change whatsoever. for the igb and igc drivers i have tried flow control on/off but no change either
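The sysctl comparison mentioned above could be done along these lines. This is a rough sketch, assuming each system's `sysctl -a` output was saved to a text file; the file names in the usage note and the function names are hypothetical. Multi-line sysctl values are not handled, only plain "name: value" lines.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` output ('name: value' lines) into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        # Skip continuation lines and anything that is not 'name: value'.
        if sep and name and " " not in name:
            values[name] = value.strip()
    return values


def diff_sysctl(a, b):
    """Return {name: (value_a, value_b)} for entries that differ.

    A value of None means the sysctl is missing on that side.
    """
    changed = {}
    for name in sorted(set(a) | set(b)):
        va, vb = a.get(name), b.get(name)
        if va != vb:
            changed[name] = (va, vb)
    return changed
```

Usage would be something like diff_sysctl(parse_sysctl(open("opnsense.txt").read()), parse_sysctl(open("pfsense.txt").read())), then filtering the result to prefixes such as net., kern.ipc. and dev. to keep the list readable. Counters and uptime-like values will always differ and can be ignored.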

February 14, 2025, 03:49:29 PM #11 Last Edit: February 14, 2025, 03:58:52 PM by joezeppy
I'm no expert and I'm curious as to why your performance is poor in just one direction.  With OPNsense 25.1.1 on a DEC850 with 10G ports, I'm getting around 2Gbps both Up/Dn single threaded.  I've tried higher -P values, but the results are about the same even though I have applied some RSS tweaks.  Here's a screen shot using my Windows PC with WireGuard activated through the firewall to an iperf instance on my NAS server (all 10G path).



My configuration is dual stack (IPv4/IPv6) and I also use a lower WireGuard MTU of 1360 because my phone is on cellular CGNAT, even though my test result above is from my PC, which is not using cellular:





I also have a firewall normalization entry:




root@OPNsense:~ # netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         8            8
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs          enabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1   4096    cpu   hybrid   C--
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256    cpu   direct   C--
ip6        6   1000    cpu   hybrid   C--
ip_direct     9    256    cpu   hybrid   C--
ip6_direct    10    256    cpu   hybrid   C--
Deciso DEC850v2

wow thanks for that info. that will help a lot with me being able to verify my setup.

yea, i will be able to get back to testing this weekend. it does seem like it's something in my environment, but my environment is 3 directly connected PCs and i just swap out the router SSD with another OS like pfSense or OpenWRT and get full speeds.

so i have not made any progress.

going back and forth with SSDs, pfSense 2.7.2 CE is always hitting max speeds in both directions. i've tried diff'ing sysctl -a between the two systems and they are not really that different. any changes i made to make OPNsense match pfSense sysctls, made no differences whatsoever.

i shortened all the iperf3 output for display. but i have saved all the data. this is all 100% reproducible. followed the OPNsense official docs to setup Wireguard and OpenVPN.

iperf3 --client <ip> --no-delay --parallel 4 [--reverse]
Supermicro X11SCL-iF with Intel Xeon E-2278G (8c/16t 5Ghz), Intel X710-DA2 SFP+ NIC with v9.53 firmware

pfSense 2.7.2 CE

NAT Port Forward
Connecting to host 192.168.160.10, port 5201
[SUM]   0.00-5.00   sec  5.46 GBytes  9380 Mbits/sec  2010             sender
[SUM]   0.00-5.00   sec  5.46 GBytes  9376 Mbits/sec                  receiver

Reverse mode, remote host 192.168.160.10 is sending
[SUM]   0.00-5.00   sec  5.48 GBytes  9417 Mbits/sec    0             sender
[SUM]   0.00-5.00   sec  5.48 GBytes  9415 Mbits/sec                  receiver

Wireguard
Connecting to host 192.168.1.101, port 5201
[SUM]   0.00-4.00   sec  3.55 GBytes  7616 Mbits/sec  364             sender
[SUM]   0.00-4.00   sec  3.55 GBytes  7615 Mbits/sec                  receiver

Reverse mode, remote host 192.168.1.101 is sending
[SUM]   0.00-4.00   sec  3.12 GBytes  6694 Mbits/sec  200             sender
[SUM]   0.00-4.00   sec  3.12 GBytes  6692 Mbits/sec                  receiver

OPNsense 25.1.1

Wireguard
Connecting to host 192.168.1.101, port 5201
[SUM]   0.00-4.00   sec  3.76 GBytes  8077 Mbits/sec  554             sender
[SUM]   0.00-4.00   sec  3.76 GBytes  8071 Mbits/sec                  receiver

Reverse mode, remote host 192.168.1.101 is sending
[SUM]   0.00-4.00   sec   149 MBytes   312 Mbits/sec  272             sender
[SUM]   0.00-4.00   sec   142 MBytes   299 Mbits/sec                  receiver

Odroid H4 Ultra with Intel n305 (8c), Intel i226V 2.5g NIC

OPNsense 25.1.1

Wireguard
Connecting to host 192.168.1.101, port 5201
[SUM]   0.00-4.00   sec  1.04 GBytes  2229 Mbits/sec  303             sender
[SUM]   0.00-4.00   sec  1.04 GBytes  2228 Mbits/sec                  receiver

Reverse mode, remote host 192.168.1.101 is sending
[SUM]   0.00-4.00   sec   248 MBytes   519 Mbits/sec  1157             sender
[SUM]   0.00-4.00   sec   241 MBytes   506 Mbits/sec                  receiver

at a loss for what to try next, i setup OpenVPN with DCO on the Odroid H4 Ultra.

OpenVPN (DCO)
Connecting to host 192.168.1.101, port 5201
[SUM]   0.00-5.00   sec  1.32 GBytes  2266 Mbits/sec    0             sender
[SUM]   0.00-5.04   sec  1.33 GBytes  2268 Mbits/sec                  receiver

Reverse mode, remote host 192.168.1.101 is sending
[SUM]   0.00-5.00   sec   329 MBytes   551 Mbits/sec    0             sender
[SUM]   0.00-5.00   sec   327 MBytes   548 Mbits/sec                  receiver


So NAT port forwarding tests show line rate, 10g or 2.5g, on both pfSense and OPNsense.

pfSense CE with wireguard shows 7.6 gbit/sec and 6.6 gbit/sec.

OPNsense with wireguard shows 8.0 gbit/sec (2.2 gbit/sec on i226v) and 300-500 mbit/sec (messing with MTU/MSS/normalization rules only reduces throughput)

OPNsense with openvpn (i226v) shows 2.2 gbit/sec and 548 mbit/sec
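Since the saved iperf3 logs are plain text, the summary numbers above (and the Retr column) can be pulled out mechanically instead of by eye. A minimal sketch, assuming the standard iperf3 text layout for the final [SUM] sender/receiver lines as pasted in this thread; the function name is made up for illustration.

```python
import re

# Matches only the final summary lines, which end in "sender" or "receiver".
SUM_LINE = re.compile(
    r"\[SUM\]\s+[\d.]+-[\d.]+\s+sec\s+[\d.]+\s+\w+\s+"
    r"([\d.]+)\s+([KMG])bits/sec\s+(\d+)?\s*(sender|receiver)"
)
TO_MBITS = {"K": 0.001, "M": 1.0, "G": 1000.0}


def summarize(text):
    """Return sender/receiver Mbit/s and sender retransmits from one iperf3 run."""
    result = {}
    for rate, unit, retr, role in SUM_LINE.findall(text):
        result[role + "_mbits"] = float(rate) * TO_MBITS[unit]
        if role == "sender" and retr:
            result["retransmits"] = int(retr)
    return result


sample = """\
[SUM]   0.00-4.00   sec   149 MBytes   312 Mbits/sec  272             sender
[SUM]   0.00-4.00   sec   142 MBytes   299 Mbits/sec                  receiver
"""
print(summarize(sample))
```

Running this over each saved log gives one row per OS, VPN type and direction, which makes the comparison of throughput against retransmits easier to line up.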

using top, when systems are doing 500 mbit/sec the cpus are idle. the power draw from the outlet is even at the idle power watts. there must be some sort of kernel lock in OPNsense??



VPN provider doesn't matter. Wireguard or OpenVPN shows the same issues. reverse direction is locked to 500 mbit/sec. these are vanilla / new installs. same systems. i am just literally swapping the SSDs and rebooting between pfSense and OPNsense and getting 100% reproducible results.

i don't know what to do next. i have tried multiple systems (for clients and servers), but seemingly i am the only person seeing this?

1 reddit post seeing similar behavior: https://www.reddit.com/r/opnsense/comments/1gwzkye/opnsense_and_wireguard_why_is_wireguard_limited/ but the user gave up and bought pfSense Plus.

Using the router that gives you asymmetrical results, I would physically swap the client and server.
You have the choice of initiating the test from either end.

OPNsense actually seems to have better results than pfSense in one direction...

Your results indicate a bad interaction between one machine and the side of the router it is connected to.
I'd be looking for low level statistics on retries.