** SOLVED ** it was the firewall setting Disable reply-to (Firewall > Settings > Advanced). By default it's unchecked. After checking it, all my test setups immediately went to full speed in both directions.
(https://i.imgur.com/LkEHFk0.png)
i am seeing a weird issue with wireguard: speeds are fine going out of the firewall, but not coming in.
client (Debian 12 i7-13700 with Mellanox Connect-X5 25g NIC) -> opnsense -> server (Windows 11 AMD 7950X3d with Intel E810 25g NIC)
No internet, this is a local test
the test run on the client is:
iperf3 --client <server ip> --no-delay --parallel 8
iperf3 --client <server ip> --no-delay --parallel 8 --reverse
every time i do the --reverse test, i can't seem to get any faster than 545 Mbits/sec.
i have now tested this with 6 different nics and 3 versions of opnsense on 3 different setups.
Setup 1:
Supermicro X11SCL-iF with Intel Xeon E-2278G (8c/16t 5GHz)
NICs tested:
- Intel i210 (motherboard nics using igb driver)
- Intel i350-t4 (igb driver)
- Intel i225V-b3 (qnap 2.5g x4 card using the igc driver)
- Intel x520-da2 (ix driver)
- Intel x710-da2 (ixl driver)
- Mellanox Connect-X3 (mce driver)
- Mellanox Connect-X5 (mce driver)
Setup 2:
Lenovo P3 Tiny with Intel i3 14100t (4c/8t 4.4Ghz)
NICs tested:
- Intel i350-t4 (igb driver)
- Intel x710-da2 (ixl driver)
Setup 3:
Odroid H4 Ultra with Intel N305 (8c/8t 3.8GHz)
NICs tested:
- Intel i226V (onboard 2.5g, igc driver)
All of these systems have enough CPU on OPNsense to do more than 500 Mbit/sec with Wireguard.
I have tried each system with 3 versions of OPNsense:
- 24.7.1
- 24.10.2 (business edition)
- 25.1
this is a vanilla install: install cpu-microcode-intel, configure wireguard via the road warrior instructions (https://docs.opnsense.org/manual/how-tos/wireguard-client.html). nothing else is added or configured on the systems.
Results:
- with the 1G and 2.5G nics, the upload direction reached full line speed. the download direction (--reverse): 545 Mbit/sec
- with the 10G and 25G nics, the upload direction reached between 4-7 Gbit/sec. the download direction (--reverse): still 545 Mbit/sec
Looking at top while this is happening, the CPUs on all 3 systems, on all 3 versions of OPNsense, are basically idle, using maybe 5%-10% cpu max.
so all systems, no matter what i change on the iperf3 side, are stuck at 545 Mbit/sec in that one direction. I have tried various ethernet cables, OM3 fiber, and DAC cables. it's all the same, 545 Mbit/sec.
MTUs are 1500 on the NICs and 1420 on the wireguard interfaces. upload performance is great. download is terrible every time, 100% reproducible, from 24.7.1 to 25.1.
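For reference, the 1420 figure lines up with WireGuard's per-packet overhead on a 1500-byte link. A quick sanity check, using the protocol's header sizes and the worst case of an IPv6 outer packet (which is why the common default is 1420 rather than the IPv4-only 1440):

```shell
# WireGuard per-packet overhead on a 1500-byte link (protocol constants)
LINK_MTU=1500
OUTER_IPV6=40      # outer IPv4 header would be 20
UDP_HEADER=8
WG_HEADER=16       # type/reserved (4) + receiver index (4) + counter (8)
POLY1305_TAG=16    # ChaCha20-Poly1305 authentication tag
echo $((LINK_MTU - OUTER_IPV6 - UDP_HEADER - WG_HEADER - POLY1305_TAG))   # prints 1420
```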
if i don't go through the wireguard interface and instead set up a NAT port forward, i get full speeds in both directions.
this must be some error or bad configuration on my part?
Sounds like the wireguard encryption is slower than the decryption. There may also be an issue with the wireguard process being restricted to one thread. The protocol supports multithreading but OPNsense may not: https://www.wireguard.com/performance/
the CPUs are all nearly idle when this is happening, at the 545 Mbit/sec. i can see the wireguard kernel threads in top, one instance per cpu, and they are all doing 0.5% - 1%. in the other direction the cpu usage scales with the NIC i am using, 1g/2.5g/10g/25g.
i guess it's possible there is some sort of bug, but i tried 24.7.1 through 25.1 with the same results, so did no one notice that for more than a year?
which is why it all seems like it has to be some sort of configuration error on my part...
because i hate myself, lol, i went and installed pfSense CE 2.7.2 on the supermicro x11scl-if with the Xeon 2278g, using the i226V 4 port NIC. installed the wireguard package, configured wireguard, added a single peer, connected, and immediately was able to do 2.5g up and down.
iperf3 --client 192.168.1.103 --omit 1 --time 5 --parallel 4 -f g
Connecting to host 192.168.1.103, port 5201
[ 5] local 192.168.40.2 port 35428 connected to 192.168.1.103 port 5201
[ 7] local 192.168.40.2 port 35430 connected to 192.168.1.103 port 5201
[ 9] local 192.168.40.2 port 35446 connected to 192.168.1.103 port 5201
[ 11] local 192.168.40.2 port 35454 connected to 192.168.1.103 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 73.7 MBytes 0.62 Gbits/sec 0 521 KBytes (omitted)
[ 7] 0.00-1.00 sec 74.0 MBytes 0.62 Gbits/sec 0 524 KBytes (omitted)
[ 9] 0.00-1.00 sec 65.4 MBytes 0.55 Gbits/sec 0 450 KBytes (omitted)
[ 11] 0.00-1.00 sec 64.2 MBytes 0.54 Gbits/sec 0 468 KBytes (omitted)
[SUM] 0.00-1.00 sec 277 MBytes 2.33 Gbits/sec 0 (omitted)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 0.00-1.00 sec 68.4 MBytes 0.57 Gbits/sec 0 546 KBytes
[ 7] 0.00-1.00 sec 68.2 MBytes 0.57 Gbits/sec 0 549 KBytes
[ 9] 0.00-1.00 sec 65.7 MBytes 0.55 Gbits/sec 0 450 KBytes
[ 11] 0.00-1.00 sec 66.0 MBytes 0.55 Gbits/sec 0 468 KBytes
[SUM] 0.00-1.00 sec 268 MBytes 2.25 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 68.5 MBytes 0.57 Gbits/sec 0 546 KBytes
[ 7] 1.00-2.00 sec 68.6 MBytes 0.58 Gbits/sec 0 549 KBytes
[ 9] 1.00-2.00 sec 65.7 MBytes 0.55 Gbits/sec 0 450 KBytes
[ 11] 1.00-2.00 sec 65.9 MBytes 0.55 Gbits/sec 0 468 KBytes
[SUM] 1.00-2.00 sec 269 MBytes 2.25 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 67.3 MBytes 0.56 Gbits/sec 0 546 KBytes
[ 7] 2.00-3.00 sec 69.1 MBytes 0.58 Gbits/sec 0 573 KBytes
[ 9] 2.00-3.00 sec 64.8 MBytes 0.54 Gbits/sec 0 450 KBytes
[ 11] 2.00-3.00 sec 65.9 MBytes 0.55 Gbits/sec 0 468 KBytes
[SUM] 2.00-3.00 sec 267 MBytes 2.24 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 68.4 MBytes 0.57 Gbits/sec 0 546 KBytes
[ 7] 3.00-4.00 sec 70.2 MBytes 0.59 Gbits/sec 0 573 KBytes
[ 9] 3.00-4.00 sec 65.8 MBytes 0.55 Gbits/sec 0 470 KBytes
[ 11] 3.00-4.00 sec 65.9 MBytes 0.55 Gbits/sec 0 468 KBytes
[SUM] 3.00-4.00 sec 270 MBytes 2.27 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 68.6 MBytes 0.58 Gbits/sec 0 546 KBytes
[ 7] 4.00-5.00 sec 69.1 MBytes 0.58 Gbits/sec 0 573 KBytes
[ 9] 4.00-5.00 sec 66.3 MBytes 0.56 Gbits/sec 0 470 KBytes
[ 11] 4.00-5.00 sec 65.0 MBytes 0.55 Gbits/sec 0 468 KBytes
[SUM] 4.00-5.00 sec 269 MBytes 2.26 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.00 sec 341 MBytes 0.57 Gbits/sec 0 sender
[ 5] 0.00-5.00 sec 342 MBytes 0.57 Gbits/sec receiver
[ 7] 0.00-5.00 sec 345 MBytes 0.58 Gbits/sec 0 sender
[ 7] 0.00-5.00 sec 345 MBytes 0.58 Gbits/sec receiver
[ 9] 0.00-5.00 sec 328 MBytes 0.55 Gbits/sec 0 sender
[ 9] 0.00-5.00 sec 328 MBytes 0.55 Gbits/sec receiver
[ 11] 0.00-5.00 sec 329 MBytes 0.55 Gbits/sec 0 sender
[ 11] 0.00-5.00 sec 329 MBytes 0.55 Gbits/sec receiver
[SUM] 0.00-5.00 sec 1.31 GBytes 2.25 Gbits/sec 0 sender
[SUM] 0.00-5.00 sec 1.31 GBytes 2.25 Gbits/sec receiver
and reverse
iperf3 --client 192.168.1.103 --omit 1 --time 5 --parallel 4 -f g -R
Connecting to host 192.168.1.103, port 5201
Reverse mode, remote host 192.168.1.103 is sending
[ 5] local 192.168.40.2 port 35472 connected to 192.168.1.103 port 5201
[ 7] local 192.168.40.2 port 35480 connected to 192.168.1.103 port 5201
[ 9] local 192.168.40.2 port 42704 connected to 192.168.1.103 port 5201
[ 11] local 192.168.40.2 port 42706 connected to 192.168.1.103 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 24.6 MBytes 0.21 Gbits/sec (omitted)
[ 7] 0.00-1.00 sec 66.9 MBytes 0.56 Gbits/sec (omitted)
[ 9] 0.00-1.00 sec 130 MBytes 1.09 Gbits/sec (omitted)
[ 11] 0.00-1.00 sec 45.6 MBytes 0.38 Gbits/sec (omitted)
[SUM] 0.00-1.00 sec 267 MBytes 2.24 Gbits/sec (omitted)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 0.00-1.00 sec 29.2 MBytes 0.24 Gbits/sec
[ 7] 0.00-1.00 sec 63.2 MBytes 0.53 Gbits/sec
[ 9] 0.00-1.00 sec 131 MBytes 1.10 Gbits/sec
[ 11] 0.00-1.00 sec 45.2 MBytes 0.38 Gbits/sec
[SUM] 0.00-1.00 sec 269 MBytes 2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 31.9 MBytes 0.27 Gbits/sec
[ 7] 1.00-2.00 sec 61.6 MBytes 0.52 Gbits/sec
[ 9] 1.00-2.00 sec 129 MBytes 1.08 Gbits/sec
[ 11] 1.00-2.00 sec 46.5 MBytes 0.39 Gbits/sec
[SUM] 1.00-2.00 sec 269 MBytes 2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 35.9 MBytes 0.30 Gbits/sec
[ 7] 2.00-3.00 sec 57.8 MBytes 0.49 Gbits/sec
[ 9] 2.00-3.00 sec 130 MBytes 1.09 Gbits/sec
[ 11] 2.00-3.00 sec 44.6 MBytes 0.37 Gbits/sec
[SUM] 2.00-3.00 sec 269 MBytes 2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 41.1 MBytes 0.34 Gbits/sec
[ 7] 3.00-4.00 sec 51.4 MBytes 0.43 Gbits/sec
[ 9] 3.00-4.00 sec 136 MBytes 1.14 Gbits/sec
[ 11] 3.00-4.00 sec 39.7 MBytes 0.33 Gbits/sec
[SUM] 3.00-4.00 sec 268 MBytes 2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 44.2 MBytes 0.37 Gbits/sec
[ 7] 4.00-5.00 sec 54.4 MBytes 0.46 Gbits/sec
[ 9] 4.00-5.00 sec 134 MBytes 1.12 Gbits/sec
[ 11] 4.00-5.00 sec 36.4 MBytes 0.31 Gbits/sec
[SUM] 4.00-5.00 sec 269 MBytes 2.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.00 sec 183 MBytes 0.31 Gbits/sec 0 sender
[ 5] 0.00-5.00 sec 182 MBytes 0.31 Gbits/sec receiver
[ 7] 0.00-5.00 sec 289 MBytes 0.48 Gbits/sec 1 sender
[ 7] 0.00-5.00 sec 288 MBytes 0.48 Gbits/sec receiver
[ 9] 0.00-5.00 sec 660 MBytes 1.11 Gbits/sec 0 sender
[ 9] 0.00-5.00 sec 660 MBytes 1.11 Gbits/sec receiver
[ 11] 0.00-5.00 sec 213 MBytes 0.36 Gbits/sec 3 sender
[ 11] 0.00-5.00 sec 212 MBytes 0.36 Gbits/sec receiver
[SUM] 0.00-5.00 sec 1.31 GBytes 2.26 Gbits/sec 4 sender
[SUM] 0.00-5.00 sec 1.31 GBytes 2.25 Gbits/sec receiver
CPU usage is ~ 15% per core, except the main WG core which is 40%
https://imgur.com/a/SDtQK5I
looking at the kernels for pfSense 2.7.2 CE and OPNsense 24.10.2, the wireguard implementation is nearly identical. so it seems like the OPNsense issue is somewhere else in the kernel.
since the behavior is the same for multiple NICs and drivers (igb, igc, ix, ixl, mce), it's probably not a driver issue.
which leaves some sort of pf or networking issue in the OPNsense kernel.
i guess next steps are maybe to try vanilla FreeBSD and see if it also occurs.
I can reach >= 700 Mbits/s between two Linux hosts connected over Wireguard on an N100 box, so there is no fundamental problem.
But I need more than one connection with -R and one thread only, so: did you enable RSS?
when i enabled RSS on the Xeon 2278g (8c/8t with hyperthreads disabled)
net.isr.bindthreads = 1
net.isr.maxthreads = -1
net.inet.rss.enabled = 1
net.inet.rss.bits = 3
my throughput went down to ~300 Mbit/sec (but again only in the 1 direction)
these are all fresh installs of 24.7.1, 24.10.2, and 25.1 with no configuration other than cpu-microcode-intel and wireguard.
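As an aside, net.inet.rss.bits above sets the number of RSS buckets as a power of two (assuming FreeBSD's stock RSS semantics), so bits = 3 matches the 8 cores:

```shell
# net.inet.rss.bits = 3  ->  2^3 = 8 RSS buckets, one per core on the
# 8c/8t Xeon with hyperthreading disabled (illustrative check)
RSS_BITS=3
echo $((1 << RSS_BITS))   # prints 8
```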
i just tested
- FreeBSD 14.2 on the supermicro xeon 2278g and running iperf3 server on freebsd and had no problems running in either direction. (so no firewalling)
- OpenWRT x86 and no problem maxing out 2.5g on the N305 system
Disable Spectre Mitigation: "sysctl hw.ibrs_disable=1", should work immediately, without reboot.
i have tried with
vm.pmap.pti=0
hw.ibrs_disable=1
and there is no difference. on OPNsense, the CPUs are idle while traffic is flowing.
but even so, all of these setups have far more CPU than 500 Mbit/sec requires. the Xeon E-2278G is 8c/16t at 5GHz with 4.3GHz all-core turbo. the i3-14100T is 4c/8t with 4.3GHz all-core turbo. additionally, the other OSes, vanilla FreeBSD 14.2 and pfSense 2.7.2, all have these mitigations on by default.
i have installed FreeBSD, pfSense, and OPNsense 24.7.1, 24.10.2, 25.1 on separate SSDs, so i can switch back and forth easily to test things. the combination of 3 PCs, 3 CPUs, and 8 different NICs, where only OPNsense has an issue, and only in one direction, is odd.
Since it is not CPU-limited, several NICs show the same behavior, and pfSense does not show it, I can only suspect a one-directional problem with the network, e.g. flow control or hardware offloading? IDK if the settings differ between the OSes.
+1 yea, i am going to run sysctl -a on all systems and diff the output to see if i can find any differences to experiment with as my next step.
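One way to do that diff (a sketch; the file paths and sample sysctl values below are made up for illustration): capture sysctl -a on each box, sort both captures so ordering differences don't create noise, then keep only the lines that changed.

```shell
# Simulate the workflow with two tiny sample captures. On the real boxes
# you would run `sysctl -a | sort > file` on each and copy one file over.
printf 'net.inet.rss.enabled: 0\nnet.inet.tcp.recvspace: 65536\n' > /tmp/pfsense-sysctl.txt
printf 'net.inet.rss.enabled: 1\nnet.inet.tcp.recvspace: 65536\n' > /tmp/opnsense-sysctl.txt
# Unified diff, filtered down to the entries that actually differ
diff -u /tmp/pfsense-sysctl.txt /tmp/opnsense-sysctl.txt | grep '^[+-]net'
```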
ixl driver defaults to dev.ixl.0.fc=0
and i have all the hardware offloading turned off (the OPNsense default), but i have also tried with it all turned on, and with just checksumming turned on (the pfSense default). no change whatsoever. for the igb and igc drivers i have tried flow control on/off, but no change either.
I'm no expert and I'm curious as to why your performance is poor in just one direction. With OPNsense 25.1.1 on a DEC850 with 10G ports, I'm getting around 2Gbps both Up/Dn single threaded. I've tried higher -P values, but the results are about the same even though I have applied some RSS tweaks. Here's a screen shot using my Windows PC with WireGuard activated through the firewall to an iperf instance on my NAS server (all 10G path).
(https://i.postimg.cc/xkB78qkx/WIre-Guard-Speed-Test-Result.png) (https://postimg.cc/xkB78qkx)
My configuration is dual stack (IPv4/IPv6) and I use a lower WireGuard MTU of 1360 because my phone is on cellular CGNAT, even though my test result above is from my PC, which is not using cellular:
(https://i.postimg.cc/VJL2HxQR/My-Wire-Guard.png) (https://postimg.cc/VJL2HxQR)
(https://i.postimg.cc/7C5dqSLJ/Wire-Guard-Instance.png) (https://postimg.cc/7C5dqSLJ)
(https://i.postimg.cc/MccHrgtQ/Wire-Guard-Outbound.png) (https://postimg.cc/MccHrgtQ)
I also have a firewall normalization entry:
(https://i.postimg.cc/QFcwCcGF/Firewall-Normalization.png) (https://postimg.cc/QFcwCcGF)
root@OPNsense:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 8 8
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 4096 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
wow thanks for that info. that will help a lot with me being able to verify my setup.
yea, i will be able to get back to testing this weekend. it does seem like it's something in my environment, but my environment is 3 directly connected PCs, and i just swap out the router SSD with another OS like pfSense or OpenWRT and get full speeds.
so i have not made any progress.
going back and forth with SSDs, pfSense 2.7.2 CE always hits max speeds in both directions. i've tried diff'ing sysctl -a between the two systems and they are not really that different. any changes i made to make OPNsense match pfSense sysctls made no difference whatsoever.
i shortened all the iperf3 output for display, but i have saved all the data. this is all 100% reproducible. i followed the official OPNsense docs to set up Wireguard and OpenVPN.
iperf3 --client <ip> --no-delay --parallel 4 [--reverse]
Supermicro X11SCL-iF with Intel Xeon E-2278G (8c/16t 5Ghz), Intel X710-DA2 SFP+ NIC with v9.53 firmware
pfSense 2.7.2 CE
NAT Port Forward
Connecting to host 192.168.160.10, port 5201
[SUM] 0.00-5.00 sec 5.46 GBytes 9380 Mbits/sec 2010 sender
[SUM] 0.00-5.00 sec 5.46 GBytes 9376 Mbits/sec receiver
Reverse mode, remote host 192.168.160.10 is sending
[SUM] 0.00-5.00 sec 5.48 GBytes 9417 Mbits/sec 0 sender
[SUM] 0.00-5.00 sec 5.48 GBytes 9415 Mbits/sec receiver
Wireguard
Connecting to host 192.168.1.101, port 5201
[SUM] 0.00-4.00 sec 3.55 GBytes 7616 Mbits/sec 364 sender
[SUM] 0.00-4.00 sec 3.55 GBytes 7615 Mbits/sec receiver
Reverse mode, remote host 192.168.1.101 is sending
[SUM] 0.00-4.00 sec 3.12 GBytes 6694 Mbits/sec 200 sender
[SUM] 0.00-4.00 sec 3.12 GBytes 6692 Mbits/sec receiver
OPNsense 25.1.1
Wireguard
Connecting to host 192.168.1.101, port 5201
[SUM] 0.00-4.00 sec 3.76 GBytes 8077 Mbits/sec 554 sender
[SUM] 0.00-4.00 sec 3.76 GBytes 8071 Mbits/sec receiver
Reverse mode, remote host 192.168.1.101 is sending
[SUM] 0.00-4.00 sec 149 MBytes 312 Mbits/sec 272 sender
[SUM] 0.00-4.00 sec 142 MBytes 299 Mbits/sec receiver
Odroid H4 Ultra with Intel n305 (8c), Intel i226V 2.5g NIC
OPNsense 25.1.1
Wireguard
Connecting to host 192.168.1.101, port 5201
[SUM] 0.00-4.00 sec 1.04 GBytes 2229 Mbits/sec 303 sender
[SUM] 0.00-4.00 sec 1.04 GBytes 2228 Mbits/sec receiver
Reverse mode, remote host 192.168.1.101 is sending
[SUM] 0.00-4.00 sec 248 MBytes 519 Mbits/sec 1157 sender
[SUM] 0.00-4.00 sec 241 MBytes 506 Mbits/sec receiver
at a loss for what to try next, i set up OpenVPN with DCO on the Odroid H4 Ultra.
OpenVPN (DCO)
Connecting to host 192.168.1.101, port 5201
[SUM] 0.00-5.00 sec 1.32 GBytes 2266 Mbits/sec 0 sender
[SUM] 0.00-5.04 sec 1.33 GBytes 2268 Mbits/sec receiver
Reverse mode, remote host 192.168.1.101 is sending
[SUM] 0.00-5.00 sec 329 MBytes 551 Mbits/sec 0 sender
[SUM] 0.00-5.00 sec 327 MBytes 548 Mbits/sec receiver
So NAT port forwarding tests show line rate, 10g or 2.5g, from both pfSense and OPNsense.
pfSense CE with wireguard shows 7.6 gbit/sec and 6.6 gbit/sec.
OPNsense with wireguard shows 8.0 gbit/sec (2.2 gbit/sec on the i226v) up and 300-500 mbit/sec down (messing with MTU/MSS/normalization rules only reduces throughput).
OPNsense with openvpn (i226v) shows 2.2 gbit/sec and 548 mbit/sec
using top, when the systems are doing 500 mbit/sec the cpus are idle. the power draw at the outlet even sits at idle wattage. there must be some sort of kernel lock in OPNsense??
The VPN software doesn't matter. Wireguard and OpenVPN show the same issue. the reverse direction is locked to 500 mbit/sec. these are vanilla / new installs on the same systems. i am literally just swapping the SSDs and rebooting between pfSense and OPNsense and getting 100% reproducible results.
i don't know what to do next. i have tried multiple systems (for clients and servers), but am i seemingly the only person seeing this?
one reddit post describes similar behavior: https://www.reddit.com/r/opnsense/comments/1gwzkye/opnsense_and_wireguard_why_is_wireguard_limited/ but the user gave up and bought pfSense Plus.
Using the router that gives you asymmetrical results, I would physically swap the client and server.
You have the choice of initiating the test from either end.
OPNsense actually seems to have better results than pfSense in one direction...
Your results indicate a bad interaction between one machine and the side of the router it is connected to.
I'd be looking for low level statistics on retries.
it just appears this is too much information for people to read all the text and see that i have tried multiple combinations of hardware, software, NICs, cables, etc.... that is a bit frustrating, but understandable, i guess.
i have been swapping multiple PCs among the client, server, and router roles.
AMD 7950X3D
Intel Xeon E-2278G (X11SCL-iF)
Intel Xeon E-2414 (X13SCH-LN4F)
Intel Pentium G7400 (X13SCL-iF)
Odroid H4 Ultra Intel n-305
Lenovo P3 Tiny Intel i3-14100t
Intel i7-13700t (X13SAE-F)
with the exception of the 7950X3D, i have just swapped the SSDs around between client (Ubuntu 22.04), router (OPNsense 24.10.2 and 25.1.1, pfSense 2.7.2 CE), server (FreeBSD 14.2, Ubuntu 22.04)
i have also tried the following NICs in all combinations of PCs and NICs:
Intel E810-XXVDA2 (25g)
Mellanox ConnectX-5 (25g)
Mellanox ConnectX-3 (10g)
Intel X710-DA2 fw 9.53
Intel X710-DA2 fw 8.10
Intel X520-DA2
Intel i225V-B3
Intel i226V
Intel i350-T4
Intel i210 (onboard NICs for supermicro motherboards)
and i have used 50 different patch cables, SFP+ transceivers, UniFi DAC cables, etc.
the only common factor i have discovered so far is OPNsense. running through all those combinations took days and days. both pfSense and OPNsense behave the same way on every combination: pfSense is line rate or CPU bound. OPNsense is line rate or CPU bound in one direction, and then kernel bound to ~500 mbit/sec in the other direction when routing through Wireguard or OpenVPN. OPNsense can NAT port forward at 10g easily.
No matter the NIC speed or driver in OPNsense, routing through OpenVPN or Wireguard results in ~ 500 mbit/sec throughput in one direction
I saw evidence of swapping the router and HW on the router, less so of swapping client & server or where the test is initiated from.
I question your testing methodology. Going wide might lead to a combination that works, not necessarily for a root cause of the mismatch.
OPN is a common factor but it would be easier to blame if that was the case in both directions.
if i disable the firewall
pfctl -d
but still use the wireguard interface, i can see the wireguard kernel threads using CPU and i get symmetrical speeds with the intel i350-t4, i226v, and x710-da2. i didn't test any further, since the before/after was 100% reproducible.
there is 100% some bug in the OPNsense firewall / pf side of things, or some configuration that comes with a vanilla install, that is causing this.
i am blaming OPN because the exact setups all work when i try pfSense 2.7.2 CE, OpenWRT x86_64 24.10.0. the same client/servers, hardware, just replace the router software.
You might be getting somewhere.
So FW + Wireguard + iperf3 reverse?
And the steps are:
Client -> OPN-WAN / OPN-LAN -> Server
Wireguard connection to OPN server initiated from Client
Then from client: iperf3 --client <server ip> --no-delay --parallel 8 --reverse
Rule on the Wireguard interface?
I'm actually a little curious about what the actual traffic looks like so I'll set something up after I get confirmation of the entire test environment.
I have no idea what FW state is going to be created as a result of such experiment so this is a learning opportunity.
my basic setup looks like this:
(https://i.imgur.com/kZBRBg7.png)
3 computers are
- completely isolated, directly connected
- in my 10g setup, SFP+ OM3 fiber or UniFi DAC cables, results are the same
- fresh vanilla installs for ubuntu 24.04, pfSense 2.7.2 CE, OPNsense 24.10.2, 25.1.1
- router software is changed by just swapping the SSD, all other hardware stays exactly the same
wireguard is set up using the official pfSense documentation (https://docs.netgate.com/pfsense/en/latest/vpn/wireguard/index.html) and the OPNsense road warrior documentation (https://docs.opnsense.org/manual/how-tos/wireguard-client.html).
iperf3 commands on the client:
iperf3 --client 192.168.1.100 --no-delay --omit 1 --time 5 --parallel 4 --format m
iperf3 --client 192.168.1.100 --no-delay --omit 1 --time 5 --parallel 4 --format m --reverse
to verify the setup, i set up a NAT port forward on port 5201 to 192.168.1.100 and run iperf3 against the WAN IP from the client
iperf3 --client 192.168.160.10 --no-delay --omit 1 --time 5 --parallel 4 --format m
iperf3 --client 192.168.160.10 --no-delay --omit 1 --time 5 --parallel 4 --format m --reverse
pfSense, OPN 24.10.2, OPN 25.1.1 all showed ~9.45 Gbit/sec in both directions.
wireguard results:
- pfSense: upload 7.6 Gbit/sec, download 6.6 Gbit/sec
- OPN 24.10.2: upload 8.0 Gbit/sec, download 543 Mbit/sec
- OPN 25.1.1: upload 8.0 Gbit/sec, download 538 Mbit/sec
if i disable pf via
pfctl -d
and re-run the iperf3 commands but still going through the wireguard interfaces
- OPN 25.1.1
- upload: 8.1 Gbit/sec
- download: 7.5 Gbit/sec
For this test the CPUs are nearly 100% in use with the kernel threads and wireguard threads. This is basically the CPU bound max of the 2278g setup.
pfSense and OPN are setup with a wireguard interface and have a single firewall rule: allow from wg0 net to any.
i have tried various MTUs, outbound NAT rules, and firewall normalization rules suggested in the documentation or on these forums. those extra rules made no difference whatsoever to the results. the reverse iperf3 direction is always stuck at ~500 Mbit/sec and the CPUs are nearly 100% idle during transmission.
no matter which CPU/nic i used in the router (e-2278g, e-2414, G7400, n305, i3-14100t, i5-13400t), the --reverse direction is always ~500 Mbit/sec. some kernel level delay is blocking the CPU from working as hard as it can.
the installs and setups are as out-of-box as possible. pfSense i had to install the wireguard package. OPNsense i install the cpu-microcode-intel package.
i have gone through various tunables, but none make any real impact. 8 Gbit/sec vs 500 Mbit/sec isn't a gap that tuning will close, unless it's some sort of bug that can be worked around
i have also tried many other machines as noted earlier for client, router, server. i also put in my AMD 7950X3D windows 11 desktop into the mix as client and server. there is no difference in behavior whatsoever.
i have set up an OpenVPN tunnel via the OPNsense OpenVPN road warrior documentation and i get the same behavior, so it's not wireguard. i did not try an IPsec tunnel.
i also diff'd the output of
sysctl -a
of pfSense 2.7.2 and OPNsense 25.1.1 and saw no real meaningful differences.
Hmm, I didn't get a chance to try this out yesterday and I'm not going to be able to for another couple of days.
This said, something occurred to me overnight.
You mention a static IP on the WAN side and a directly attached machine. So no gateway? Or gateway set to the attached machine?
I ask because I've been bit by a fairly obscure setting (Firewall > Settings > Advanced > Disable reply-to) in the past that affects how WAN traffic is routed.
By default, all reply traffic out the WAN is directed at the gateway for that network, which gets black-holed if the gateway is another firewall.
Maybe it's irrelevant but I wonder how that "feature" interacts with your test environment.
You might be better off adding a switch, having an actual gateway and disabling reply-to.
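For context, reply-to is a pf feature: with the setting at its default, the rules OPNsense generates on the WAN carry a reply-to clause, so reply packets for a state are forced out via the recorded gateway instead of following the routing table. The rule below is an illustrative shape only (interface and addresses are made up, not copied from a real box):

```
# reply traffic for this state is sent to 192.168.160.1 via igb0,
# even if the routing table would deliver it directly
pass in on igb0 reply-to (igb0 192.168.160.1) proto udp from any to (igb0) port 51820 keep state
```

With no real gateway on the segment (or the gateway being another firewall), those replies go nowhere useful.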
yea i have run with an actual gateway as well, adding the WAN to my normal network, i do this initially so that i can install any needed packages and say update 25.1 to 25.1.1.
in either case, there is no difference in behavior. i tried eliminating the external network after a few days to try and isolate it more, but the results are exactly the same either way, unfortunately
Poor speed or disconnects usually mean the MTU is wrong, set too high on either side of the tunnel
Well, if you ignore the official instructions (https://docs.opnsense.org/manual/how-tos/wireguard-s2s.html) and do not set MSS clamping for Wireguard, then, yes, of course...
yea i followed the documentation, verified mtu, with or without the firewall normalization rule, tried messing with MTUs on both sides. there is no difference.
i also verified the out-of-box MTUs are the same as pfSense and OpenWRT according to ifconfig
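For what it's worth, the MSS clamp value mentioned for the tunnel follows directly from the tunnel MTU (a sketch; the 1380 figure assumes a 1420 tunnel MTU and IPv4 inside the tunnel):

```shell
# MSS for TCP inside the tunnel = tunnel MTU minus inner IP + TCP headers
TUNNEL_MTU=1420
IPV4_HEADER=20
TCP_HEADER=20
echo $((TUNNEL_MTU - IPV4_HEADER - TCP_HEADER))   # prints 1380
```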
Quote from: dirtyfreebooter on February 17, 2025, 07:44:20 PM: yea i have run with an actual gateway as well, adding the WAN to my normal network, i do this initially so that i can install any needed packages and say update 25.1 to 25.1.1.
in either case, there is no difference in behavior. i tried eliminating the external network after a few days to try and isolate it more, but the results are exactly the same either way, unfortunately
And reply-to was disabled on the OPN being tested, right?
Otherwise, all reply traffic between OPN and the desktop on the WAN side bounces via the WAN gateway.
I managed to run enough of a test to look at states and traffic (for my education).
It's actually pretty darn simple: a simple UDP tunnel between the wireguard client and the OPN WAN, plus one TCP connection per iperf thread from the client's wireguard IP to the target machine.
It's no surprise loads are negligible when traffic is choked up. I have no idea where the bottleneck could be.
I don't know that a packet capture would reveal anything.
FWIW, my test environment was way worse than yours (yet sufficient for my investigation):
Ubuntu VM on my prod N305 based proxmox (where my OPN also lives), in a separate VLAN so I don't need to deal with reply-to -> main OPN for inter-VLAN -> managed switch -> OPN with Wireguard (also virtualized on N100 hardware) -> unmanaged switch -> target Ubuntu.
I still managed to get 900Mbps in both directions, very symmetrical... It's mindboggling you can't exceed 550Mbps on your hardware.
O-M-G ** SOLVED ** THANK YOU eric!!
it was the reply-to... changed it to
(https://i.imgur.com/LkEHFk0.png)
and immediately all OPN installs worked in both directions... from 23.1 to 25.1.1 on all my hardware setups...
The Intel Xeon E-2278G with X710-DA2
Up: 8.60 Gbits/sec
Down: 7.27 Gbits/sec
iperf3 --client 192.168.1.20 --omit 1 --time 5 --parallel 16 --format g
...
[SUM] 0.00-5.00 sec 5.00 GBytes 8.59 Gbits/sec 3327 sender
[SUM] 0.00-5.00 sec 5.00 GBytes 8.60 Gbits/sec receiver
iperf3 --client 192.168.1.20 --omit 1 --time 5 --parallel 16 --format g --reverse
Reverse mode, remote host 192.168.1.20 is sending
...
[SUM] 0.00-5.00 sec 4.23 GBytes 7.27 Gbits/sec 7211 sender
[SUM] 0.00-5.00 sec 4.23 GBytes 7.27 Gbits/sec receiver
i turned off the firewall normalization rule the Road Warrior Docs say to use and now i consistently get
Up:
[SUM] 0.00-5.00 sec 5.23 GBytes 8.98 Gbits/sec 34 sender
[SUM] 0.00-5.00 sec 5.23 GBytes 8.98 Gbits/sec receiver
Down:
[SUM] 0.00-5.00 sec 4.62 GBytes 7.93 Gbits/sec 5610 sender
[SUM] 0.00-5.00 sec 4.62 GBytes 7.93 Gbits/sec receiver
Nice!
I kinda liked that theory because it explained the discrepancy, but it was arguably just a theory until you verified it.
I don't know what the gateway was but it was likely not multi-gig.
The downgrade all the way to ~550 might have been caused by collisions.
What was going on with the directly connected client is still unknown.
But that's such an atypical use case that it's not worth investigating further.
Sometimes simplifying the test bench to an extreme has unpredictable side effects.
When I got bit by this setting, no traffic went through because my main OPN rejected it (state violation) since it was reply traffic to requests that it never saw. That's usually easier to troubleshoot than performance issues... Especially since I had FW logs.
This was an interesting thread, but I have a question about the "reply-to" setting.
I have only one physical WAN interface, but I use one WG0 for incoming Wireguard clients and I have 2 other interfaces that handle outgoing VPN (OpenVPN and Wireguard) connections via separate gateways.
I guess this means I should also disable "reply-to" on WAN rules, correct?
I'm not entirely sure how you use your VPNs but reply-to is primarily meant to handle multi-WAN use cases.
Per the documentation:
Quote: With Multi-WAN you generally want to ensure traffic leaves the same interface it arrives on, hence reply-to is added automatically by default.
Thanks for the reply. I've setup something like this https://docs.opnsense.org/manual/how-tos/wireguard-selective-routing.html for Wireguard and something similar for OpenVPN. So certain devices on my network are routed through/via either Wireguard or OpenVPN.
I've read the documentation you quoted, but what I wanted to know was whether this only applies to fail-over multi-wan with multiple physical ports, or also applies to my setup. Because even though I only have one physical WAN port, I have different WAN interfaces (WAN, WAN_WireGuard, WAN_OpenVPN).
I highly suspect that my setup counts as multi-wan, but I wanted to confirm with the experts.
I'm no expert in anything networking related and have 0 experience with multi-WAN or multi-VPN, very limited VPN experience altogether.
I'm more familiar with the side effects of reply-to in the context of OPN deployed on an internal network, which is what this thread was about.
I'd leave it on and suggest you create a new thread if you encounter issues.