Wireguard Speed Issue

Started by dirtyfreebooter, February 10, 2025, 06:50:36 AM

February 10, 2025, 06:50:36 AM Last Edit: February 10, 2025, 07:04:22 AM by dirtyfreebooter
i am seeing a weird issue with wireguard speeds going out of the firewall, but not in.

client (Debian 12, i7-13700 with Mellanox ConnectX-5 25g NIC) -> opnsense -> server (Windows 11, AMD 7950X3D with Intel E810 25g NIC)
No internet, this is a local test

the test run on the client is:

iperf3 --client <server ip> --no-delay --parallel 8
iperf3 --client <server ip> --no-delay --parallel 8 --reverse

every time i do the --reverse test, i can't seem to get any faster than 545 Mbits/sec.
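For anyone wanting to repeat the comparison, here is a minimal Python sketch that pulls the aggregate rates out of iperf3's `--json` report so the two directions can be put side by side. It assumes iperf3 is on the PATH and that the report has the usual `end.sum_sent` / `end.sum_received` objects of a TCP test; the function names are only illustrative.

```python
import json
import subprocess


def summarize(report_text):
    """Return (sent_mbps, received_mbps) from an iperf3 --json TCP report."""
    end = json.loads(report_text)["end"]
    sent = end["sum_sent"]["bits_per_second"] / 1e6
    received = end["sum_received"]["bits_per_second"] / 1e6
    return round(sent, 1), round(received, 1)


def run_iperf3(server, reverse=False, parallel=8):
    """Run one iperf3 test and return its JSON report as text (hypothetical helper)."""
    cmd = ["iperf3", "--client", server, "--no-delay",
           "--parallel", str(parallel), "--json"]
    if reverse:
        cmd.append("--reverse")
    # check=True raises if iperf3 exits with an error (e.g. server unreachable)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Usage would be something like `summarize(run_iperf3("<server ip>"))` for upload and `summarize(run_iperf3("<server ip>", reverse=True))` for download.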

i have now tested this with 6 different nics and 3 versions of opnsense on 3 different setups.

Setup 1:
Supermicro X11SCL-iF with Intel Xeon E 2278g (8c/16t 5Ghz)

NICs tested:
  • Intel i210 (motherboard nics using igb driver)
  • Intel i350-t4 (igb driver)
  • Intel i225V-b3 (qnap 2.5g x4 card using the igc driver)
  • Intel x520-da2 (ix driver)
  • Intel x710-da2 (ixl driver)
  • Mellanox ConnectX-3 (mce driver)
  • Mellanox ConnectX-5 (mce driver)

Setup 2:
Lenovo P3 Tiny with Intel i3 14100t (4c/8t 4.4Ghz)

NICs tested:
  • Intel i350-t4 (igb driver)
  • Intel x710-da2 (ixl driver)

Setup 3:
Odroid H4 Ultra with Intel N-305 (8c/8t 3.8Ghz)

NICs tested:
  • Intel i226V (igc driver)

All of these systems have enough CPU on OPNsense to do more than 500 Mbit/sec with Wireguard.

I have tried each system with 3 versions of OPNsense:
  • 24.7.1
  • 24.10.2 (business edition)
  • 25.1

these are vanilla installs: install cpu-microcode-intel, then configure wireguard via the road warrior instructions. nothing else is added or configured on the systems.

Results

  • with the 1G and 2.5G nics, going in the upload direction i was able to get full line speed. going in the download direction (--reverse), 545 Mbit/sec
  • with the 10G and 25G nics, going in the upload direction i was able to achieve between 4-7 Gbit/sec. going in the download direction (--reverse), 545 Mbit/sec

Looking at top when this is happening, the CPUs on all 3 systems, across all 3 versions of OPNsense, are basically idle, using maybe 5%-10% cpu max.

so all systems, no matter what i change on the iperf3 side, etc, are stuck at 545 Mbit/sec in that one direction. I have tried various ethernet cables, OM3 fiber, DAC cables. it's all the same, 545 Mbit/sec.

MTUs on the systems are 1500 for NIC and 1420 for wireguard interfaces. upload direction is great performance. download 100% terrible all the time, 100% reproducible. from 24.7.1 to 25.1.
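The 1420 figure is the usual WireGuard default and can be checked by hand. A small Python sketch of the arithmetic, assuming the standard header sizes (20-byte IPv4 or 40-byte IPv6 outer header, 8-byte UDP header, and 32 bytes of WireGuard data-message overhead, i.e. a 16-byte header plus a 16-byte authentication tag); the function name is just for illustration:

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
WG_OVERHEAD = 32  # 16-byte data-message header + 16-byte Poly1305 tag


def wg_tunnel_mtu(link_mtu, outer_ipv6=True):
    """Largest inner packet that fits in one outer packet without fragmentation."""
    ip_header = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return link_mtu - ip_header - UDP_HEADER - WG_OVERHEAD


print(wg_tunnel_mtu(1500))         # outer IPv6: 1420
print(wg_tunnel_mtu(1500, False))  # outer IPv4: 1440
```

So 1420 on a 1500-byte link leaves room even for an IPv6 outer header, which at least by this arithmetic makes fragmentation an unlikely explanation for the slow direction.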

if i don't go through the wireguard interface and instead set up a NAT port forward, i get full speeds in both directions.

this must be some error or bad configuration on my part?

Sounds like the wireguard encryption is slower than the decryption. There may also be an issue with the wireguard process being restricted to one thread. The protocol supports it but OPNsense may not: https://www.wireguard.com/performance/

the CPUs are all nearly idle when this is happening (the 545 Mbit/sec case). i can see the wireguard kernel threads in top, 1 per cpu, and they are each at 0.5% - 1%. in the other direction the cpu usage scales with the NIC i am using, 1g/2.5g/10g/25g.

i guess it's possible there is some sort of bug, but i tried 24.7.1 - 25.1 and got the same results, so no one noticed that for more than a year?

which is why it all seems like it has to be some sort of configuration error on my part...

because i hate myself, lol, i went and installed pfSense CE 2.7.2 on the supermicro x11scl-if with Xeon 2278g, using the i226V 4 port NIC. installed the wireguard package, configured wireguard, added a single peer, connected, and immediately was able to do 2.5g up and down.

iperf3 --client 192.168.1.103 --omit 1 --time 5 --parallel 4 -f g
Connecting to host 192.168.1.103, port 5201
[  5] local 192.168.40.2 port 35428 connected to 192.168.1.103 port 5201
[  7] local 192.168.40.2 port 35430 connected to 192.168.1.103 port 5201
[  9] local 192.168.40.2 port 35446 connected to 192.168.1.103 port 5201
[ 11] local 192.168.40.2 port 35454 connected to 192.168.1.103 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  73.7 MBytes  0.62 Gbits/sec    0    521 KBytes       (omitted)
[  7]   0.00-1.00   sec  74.0 MBytes  0.62 Gbits/sec    0    524 KBytes       (omitted)
[  9]   0.00-1.00   sec  65.4 MBytes  0.55 Gbits/sec    0    450 KBytes       (omitted)
[ 11]   0.00-1.00   sec  64.2 MBytes  0.54 Gbits/sec    0    468 KBytes       (omitted)
[SUM]   0.00-1.00   sec   277 MBytes  2.33 Gbits/sec    0             (omitted)
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   0.00-1.00   sec  68.4 MBytes  0.57 Gbits/sec    0    546 KBytes
[  7]   0.00-1.00   sec  68.2 MBytes  0.57 Gbits/sec    0    549 KBytes
[  9]   0.00-1.00   sec  65.7 MBytes  0.55 Gbits/sec    0    450 KBytes
[ 11]   0.00-1.00   sec  66.0 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   0.00-1.00   sec   268 MBytes  2.25 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  68.5 MBytes  0.57 Gbits/sec    0    546 KBytes
[  7]   1.00-2.00   sec  68.6 MBytes  0.58 Gbits/sec    0    549 KBytes
[  9]   1.00-2.00   sec  65.7 MBytes  0.55 Gbits/sec    0    450 KBytes
[ 11]   1.00-2.00   sec  65.9 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   1.00-2.00   sec   269 MBytes  2.25 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  67.3 MBytes  0.56 Gbits/sec    0    546 KBytes
[  7]   2.00-3.00   sec  69.1 MBytes  0.58 Gbits/sec    0    573 KBytes
[  9]   2.00-3.00   sec  64.8 MBytes  0.54 Gbits/sec    0    450 KBytes
[ 11]   2.00-3.00   sec  65.9 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   2.00-3.00   sec   267 MBytes  2.24 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  68.4 MBytes  0.57 Gbits/sec    0    546 KBytes
[  7]   3.00-4.00   sec  70.2 MBytes  0.59 Gbits/sec    0    573 KBytes
[  9]   3.00-4.00   sec  65.8 MBytes  0.55 Gbits/sec    0    470 KBytes
[ 11]   3.00-4.00   sec  65.9 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   3.00-4.00   sec   270 MBytes  2.27 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  68.6 MBytes  0.58 Gbits/sec    0    546 KBytes
[  7]   4.00-5.00   sec  69.1 MBytes  0.58 Gbits/sec    0    573 KBytes
[  9]   4.00-5.00   sec  66.3 MBytes  0.56 Gbits/sec    0    470 KBytes
[ 11]   4.00-5.00   sec  65.0 MBytes  0.55 Gbits/sec    0    468 KBytes
[SUM]   4.00-5.00   sec   269 MBytes  2.26 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.00   sec   341 MBytes  0.57 Gbits/sec    0             sender
[  5]   0.00-5.00   sec   342 MBytes  0.57 Gbits/sec                  receiver
[  7]   0.00-5.00   sec   345 MBytes  0.58 Gbits/sec    0             sender
[  7]   0.00-5.00   sec   345 MBytes  0.58 Gbits/sec                  receiver
[  9]   0.00-5.00   sec   328 MBytes  0.55 Gbits/sec    0             sender
[  9]   0.00-5.00   sec   328 MBytes  0.55 Gbits/sec                  receiver
[ 11]   0.00-5.00   sec   329 MBytes  0.55 Gbits/sec    0             sender
[ 11]   0.00-5.00   sec   329 MBytes  0.55 Gbits/sec                  receiver
[SUM]   0.00-5.00   sec  1.31 GBytes  2.25 Gbits/sec    0             sender
[SUM]   0.00-5.00   sec  1.31 GBytes  2.25 Gbits/sec                  receiver

and reverse

iperf3 --client 192.168.1.103 --omit 1 --time 5 --parallel 4 -f g -R
Connecting to host 192.168.1.103, port 5201
Reverse mode, remote host 192.168.1.103 is sending
[  5] local 192.168.40.2 port 35472 connected to 192.168.1.103 port 5201
[  7] local 192.168.40.2 port 35480 connected to 192.168.1.103 port 5201
[  9] local 192.168.40.2 port 42704 connected to 192.168.1.103 port 5201
[ 11] local 192.168.40.2 port 42706 connected to 192.168.1.103 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  24.6 MBytes  0.21 Gbits/sec                  (omitted)
[  7]   0.00-1.00   sec  66.9 MBytes  0.56 Gbits/sec                  (omitted)
[  9]   0.00-1.00   sec   130 MBytes  1.09 Gbits/sec                  (omitted)
[ 11]   0.00-1.00   sec  45.6 MBytes  0.38 Gbits/sec                  (omitted)
[SUM]   0.00-1.00   sec   267 MBytes  2.24 Gbits/sec                  (omitted)
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   0.00-1.00   sec  29.2 MBytes  0.24 Gbits/sec
[  7]   0.00-1.00   sec  63.2 MBytes  0.53 Gbits/sec
[  9]   0.00-1.00   sec   131 MBytes  1.10 Gbits/sec
[ 11]   0.00-1.00   sec  45.2 MBytes  0.38 Gbits/sec
[SUM]   0.00-1.00   sec   269 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  31.9 MBytes  0.27 Gbits/sec
[  7]   1.00-2.00   sec  61.6 MBytes  0.52 Gbits/sec
[  9]   1.00-2.00   sec   129 MBytes  1.08 Gbits/sec
[ 11]   1.00-2.00   sec  46.5 MBytes  0.39 Gbits/sec
[SUM]   1.00-2.00   sec   269 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  35.9 MBytes  0.30 Gbits/sec
[  7]   2.00-3.00   sec  57.8 MBytes  0.49 Gbits/sec
[  9]   2.00-3.00   sec   130 MBytes  1.09 Gbits/sec
[ 11]   2.00-3.00   sec  44.6 MBytes  0.37 Gbits/sec
[SUM]   2.00-3.00   sec   269 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  41.1 MBytes  0.34 Gbits/sec
[  7]   3.00-4.00   sec  51.4 MBytes  0.43 Gbits/sec
[  9]   3.00-4.00   sec   136 MBytes  1.14 Gbits/sec
[ 11]   3.00-4.00   sec  39.7 MBytes  0.33 Gbits/sec
[SUM]   3.00-4.00   sec   268 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  44.2 MBytes  0.37 Gbits/sec
[  7]   4.00-5.00   sec  54.4 MBytes  0.46 Gbits/sec
[  9]   4.00-5.00   sec   134 MBytes  1.12 Gbits/sec
[ 11]   4.00-5.00   sec  36.4 MBytes  0.31 Gbits/sec
[SUM]   4.00-5.00   sec   269 MBytes  2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.00   sec   183 MBytes  0.31 Gbits/sec    0             sender
[  5]   0.00-5.00   sec   182 MBytes  0.31 Gbits/sec                  receiver
[  7]   0.00-5.00   sec   289 MBytes  0.48 Gbits/sec    1             sender
[  7]   0.00-5.00   sec   288 MBytes  0.48 Gbits/sec                  receiver
[  9]   0.00-5.00   sec   660 MBytes  1.11 Gbits/sec    0             sender
[  9]   0.00-5.00   sec   660 MBytes  1.11 Gbits/sec                  receiver
[ 11]   0.00-5.00   sec   213 MBytes  0.36 Gbits/sec    3             sender
[ 11]   0.00-5.00   sec   212 MBytes  0.36 Gbits/sec                  receiver
[SUM]   0.00-5.00   sec  1.31 GBytes  2.26 Gbits/sec    4             sender
[SUM]   0.00-5.00   sec  1.31 GBytes  2.25 Gbits/sec                  receiver


CPU usage is ~ 15% per core, except the main WG core which is 40%
https://imgur.com/a/SDtQK5I

looking at the kernels for pfSense 2.7.2 CE and OPNsense 24.10.2, the wireguard implementation is nearly identical. so it seems like the OPNsense issue is somewhere else in the kernel.

since the behavior is the same for multiple NICs and drivers (igb, igc, ix, ixl, mce), it is probably not a driver issue.

which leaves some sort of pf or networking issue in the OPNsense kernel.

i guess next steps are maybe to try vanilla FreeBSD and see if it also occurs.

I can reach >= 700 Mbits/s between two Linux hosts connected over Wireguard on an N100 box, so there is no fundamental problem.

But I need more than one connection with -R and one thread only, so: did you enable RSS?
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

when i enabled RSS on the Xeon 2278g (8c/8t with hyperthreads disabled)

net.isr.bindthreads = 1
net.isr.maxthreads = -1
net.inet.rss.enabled = 1
net.inet.rss.bits = 3

my throughput went down to ~300 Mbit/sec (but again only in the 1 direction)
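For reference, `net.inet.rss.bits` is the base-2 logarithm of the number of RSS buckets, so 3 gives 2^3 = 8 buckets, matching the 8 cores here. A quick Python sketch of that mapping, assuming the common guidance of picking the smallest value whose bucket count covers every core (the helper name is made up):

```python
def rss_bits_for(cores):
    """Smallest bit count whose bucket count (2**bits) covers all cores."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return (cores - 1).bit_length()


for cores in (4, 8, 16):
    bits = rss_bits_for(cores)
    print(f"{cores} cores -> net.inet.rss.bits = {bits} ({2 ** bits} buckets)")
```

By that rule the value of 3 used here looks right for 8 cores, so the drop to ~300 Mbit/sec is probably not down to a wrong bucket count.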

these are all fresh installs of 24.7.1, 24.10.2, and 25.1 with no configuration other than cpu-microcode-intel and wireguard.

i just tested:
- FreeBSD 14.2 on the supermicro xeon 2278g, with the iperf3 server running on the FreeBSD box itself (so no firewalling): no problems in either direction.
- OpenWrt x86 on the N305 system: no problem maxing out 2.5g.