OPNsense Forum

English Forums => Hardware and Performance => Topic started by: pfop on February 19, 2024, 05:04:59 PM

Title: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on February 19, 2024, 05:04:59 PM
Hello colleagues

Intro

I'm a long time (over 15 years) pfSense user, now moving to OPNsense once my new fiber connection is ready, as OPNsense offers better NAT performance in my tests.
So far I have used pfSense on ALIX and APU devices from PC Engines, as well as in VMs.

New hardware
For my fiber connection, which will be 10GBit symmetrical, I got a passive Qotom device, which is powered by an 8 Core Intel Atom C3758R CPU, 32GB DDR4 2400MHz ECC RAM (2x 16GB) and two NVME SSDs with ZFS Mirror.
The devices provide 4x SFP+ X553 ports, 5x RJ45 2.5G Intel I225-V.

Issue with Wireguard performance
What is currently bugging me is the Wireguard performance on OPNsense compared to pfSense.
On the C3758R I get with pfSense 2.7.2 and the 'WireGuard' version 0.2.1 package 1300Mbit of Wireguard performance.
On the C3758R I get with OPNsense 24.1.1 630Mbit of Wireguard performance.

Setup
The setup for both tests is exactly the same, and the same physical box was used for all tests.

ServerA is wired directly to SFP+ port1 (ix1) on OPNsense with a 10G LR SM optic.
ServerB is wired directly to SFP+ port2 (ix2) on OPNsense with a 10G LR SM optic.

(https://i.ibb.co/q1cNcQf/opnsense-Testing-drawio-1.png)

ix1 = OPNsense LAN, MTU 1500
ix2 = OPNsense WAN, outbound NAT active, MTU 1500

Testing
Doing iperf3 tests between ServerA and ServerB, I can reach with 1 stream up to 3.5GBit, with more streams, I can saturate the 10Gbit interfaces.

When establishing a Wireguard VPN between FW01 and ServerB and running iperf3 tests from ServerA to ServerB's WG IP, I can reach about 630MBit with 1 stream, and the CPU utilization is at 100%.
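
The test runs described here could be scripted roughly as follows. This is only a sketch: the tunnel address 10.0.0.2, the 30-second duration and the stream counts 1, 4 and 8 are placeholders rather than values from this setup, and the loop prints the client commands instead of running them. It assumes an iperf3 server (iperf3 -s) is already listening on ServerB.

```shell
# Placeholder values: adjust to the real tunnel address and test length.
WG_IP="10.0.0.2"   # hypothetical WireGuard address of ServerB
DURATION=30        # seconds per run (assumed)

# Print one iperf3 client command per parallel-stream count.
iperf_plan() {
    for streams in 1 4 8; do
        echo "iperf3 -c $WG_IP -P $streams -t $DURATION"
    done
}

iperf_plan
```

Printing the commands first makes it easy to check that exactly the same runs are used against both firewalls before any numbers are compared.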

pfSense Wireguard performance
Doing exactly the same with pfSense on the same physical firewall, I can reach 1300MBit through Wireguard.

Question
Does anyone have an idea why OPNsense is 50% slower in regards to Wireguard throughput? Are there any hidden options that can be modified to get closer to the 1300MBit possible on pfSense?
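
The "100% faster" in the title and the "50% slower" in the question describe the same two measurements from opposite baselines. A minimal sketch of the arithmetic, using the 630 and 1300 MBit figures above (the helper name rel_change is made up for this example):

```shell
# rel_change BASE NEW -> signed percentage change going from BASE to NEW
rel_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%+.1f\n", (new - base) / base * 100 }'
}

rel_change 630 1300    # pfSense measured against OPNsense: +106.3
rel_change 1300 630    # OPNsense measured against pfSense: -51.5
```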

I look forward to a constructive discussion!

Best regards
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on February 19, 2024, 05:42:03 PM
You have read the documentation section on performance (https://docs.opnsense.org/troubleshooting/hardening.html)?

A quick google search also returns this (https://www.reddit.com/r/OPNsenseFirewall/comments/m80s91/opnsense_slower_than_pfsense/), so have you disabled the Spectre and Meltdown mitigations?

Also, some mitigations have been obsoleted by microcode updates (https://forum.opnsense.org/index.php?topic=36139.0), did you apply them?

If that does not help: The wireguard performance on FreeBSD is not particularly good, so maybe the pfSense folks have come up with something special. It does not use AES, so that AES-NI instructions do not help, either.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on February 19, 2024, 06:21:43 PM
Hello meyergru

Thank you for taking time to reply!
I did now disable spectre/meltdown settings and also the IPv4 random IDs, applied them, and rebooted the firewall.
net.inet.ip.random_id   0
hw.ibrs_disable 1
vm.pmap.pti 0
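
To confirm after the reboot that these three values are actually in effect, they can be read back with sysctl. A minimal sketch; the helper name check_tunable is made up for this example, and on a system where one of these OIDs does not exist it simply reports the value as unavailable.

```shell
# check_tunable NAME WANT -> compare the live sysctl value with the expected one
check_tunable() {
    name=$1
    want=$2
    # sysctl -n prints only the value; fall back if the OID is missing
    have=$(sysctl -n "$name" 2>/dev/null) || have="unavailable"
    if [ "$have" = "$want" ]; then
        echo "ok $name=$have"
    else
        echo "MISMATCH $name=$have (wanted $want)"
    fi
}

check_tunable net.inet.ip.random_id 0
check_tunable hw.ibrs_disable 1
check_tunable vm.pmap.pti 0
```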

Redoing the test shows maybe a slight increase to 650MBit, but nowhere close to the 1300MBit from pfSense.
Microcode Update is installed - yes.

It's correct that Wireguard does not use AES. There are some Intel Quick Assist implementations which can help, but as far as I know the Quick Assist generation in this system is too old. In any case, it is the same for pfSense.

BR
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: vpx on February 20, 2024, 04:26:02 PM
Is it possible that OPNsense uses the WireGuard Go implementation while pfSense is using the WireGuard Kernel implementation?

https://github.com/opnsense/ports/tree/master/net/wireguard-go

See also: https://www.netgate.com/blog/wireguard-in-pfsense-2-5-performance
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: franco on February 20, 2024, 05:27:48 PM
Can we cut to the chase and admit both are using the FreeBSD base kernel module now?


Cheers,
Franco
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on February 20, 2024, 09:52:45 PM
Hello everyone

Since opnSense 24.1, it uses kernel based WG implementation, you can also see this in their release notes:
wireguard: installed by default using the bundled FreeBSD 13.2 kernel module
Source: https://forum.opnsense.org/index.php?topic=38427.0

Mine is a fresh 24.1.1 install, no upgrade, where there could be some 'leftovers' from wireguard-go.
Current opnSense 24.1.1 is running FreeBSD 13.2-RELEASE-p9:

root@OPNsense:~ # uname -r
13.2-RELEASE-p9

pfSense using WG kernel module:
[2.7.2-RELEASE][root@pfSense]/root: kldstat | grep wg
9    1 0xffffffff83e4f000    2e560 if_wg.ko

OPNsense using WG kernel module:
root@OPNsense:~ # kldstat | grep wg
22    1 0xffffffff82df2000    2f560 if_wg.ko

BR
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: franco on February 20, 2024, 09:57:22 PM
There can be no leftovers. Wireguard will even ignore the Go implementation if it has a kernel module readily available. But all that is moot because we only have code for kernel setup in 24.1 anyway. ;)

It's probably a rule for scrubbing or something else being configured suboptimally vs. pfSense.

We don't really hear anyone saying "it's much faster on OPNsense" which likely means it's the same speed on both in the average case. And there is no reason it shouldn't.


Cheers,
Franco
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: 36thchamber on February 20, 2024, 10:03:40 PM
I was trying all possible settings that were published, incl. rss, ibrs, for months and all had zero impact. IDS/IPS (and RSS) would slow it down but i'm not using it.
Then i installed 24.1.x and WG throughput ^doubled^. On all devices, all counter OS, iperf or web speedtests, different servers, different VPNs, different interfaces, ISP base speed monitored nonstop.. all went up to 2gbit while cpu usage halved. Upload is generally slower, so it was at full speed, but there's no more a little gap, it's now pure 100.0% of ISP speed.
So i reread the newsletter for 24. I thought wg package removal is about gui only, and the 13.2 kernel wg was there before..:o
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: lewald on February 21, 2024, 10:45:33 AM
You can play with Normalization.

http://x.x.x.x/firewall_scrub.php

Add rule -
On Interface Wireguard Group
max MSS. 1300

This helps me to get max Performance with Wireguard. I do it on both sides.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: maclinuxfree on February 21, 2024, 12:27:08 PM
mss 1300 helped me, too.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: mimugmail on February 21, 2024, 12:37:33 PM
Or in instance tick advanced and set MTU to the same value on all devices.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on February 21, 2024, 01:07:15 PM
This is getting interesting...

I am unable to compare against pfSense, nor do I want to start a war over which is faster. I take pfop's finding only as an indication of poor wireguard performance compared to what could be expected.

From several discussions about wireguard speed in the past, I got the impression that the implementation suffers.

This may date from the time when it was implemented in Go. However, I have an Intel Pentium Silver N6005 and now with the kernel implementation I get ~500 MBit/s Wireguard speed. It tops out there with 100% CPU load, regardless of whether I:

- use more threads (-P4 or -P8 are equal, only -P1 is a little slower).

- disable scrubbing (either on the wireguard interface or on all of them; it makes no difference at all).

- set MSS 1300 or MTU on all WG interfaces.

I have disabled Spectre/Meltdown mitigations and traffic shaping.

My CPU should be more or less comparable to the C3758R, even somewhat faster in single-threaded applications.
Because more threads in iperf do not influence the result (much), one can infer that the kernel implementation either always uses all available threads for cryptography or is inherently single-threaded. But when I look at "top" with threads and system processes enabled, I can see the kernel at ~300% (the rest is interrupts and user processes), so all 4 threads seem to get utilized.

Paraphrasing what was said in one of the older threads: Wireguard with chacha20-poly1305 was supposed to be much faster than IPSEC and/or OpenVPN with AES, especially on slow CPUs. Considering that, I am disappointed by the results. For starters: I can download a file from a HTTPS site with curl on my OpnSense box at 1 GBit/s. Depending on what the website offers, this is also AES or chacha20-poly1305, but much faster than Wireguard on the same system...

Thus, it would be really interesting to find the bottleneck. I am at a loss on where to look and alas, I lack the time to check if pfSense is really faster or investigate what the difference is.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on February 21, 2024, 06:56:20 PM
Quote from: franco on February 20, 2024, 09:57:22 PM
It's probably a rule for scrubbing or something else being configured suboptimally vs. pfSense.
So I disabled scrubbing on all interfaces, applied the config, rebooted opnSense and retested. Still the same result.

Quote from: lewald on February 21, 2024, 10:45:33 AM
You can play with Normalization.

http://x.x.x.x/firewall_scrub.php

Add rule -
On Interface Wireguard Group
max MSS. 1300
Added the rule, applied it, rebooted opnSense and retested. Still same result.

Quote from: mimugmail on February 21, 2024, 12:37:33 PM
Or in instance tick advanced and set MTU to the same value on all devices.
As I don't really know where to look for the issue, I also tried this change (which in general doesn't make sense, as smaller packets will load the CPU more than bigger ones, and I'm testing on a network that uses MTU 1500 only).
Before the change, the MTU of the WG interfaces on both sides was the default, 1420:
opnSense: wg1: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
ServerB: 5: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000

After setting MTU 1300 on both sides:
opnSense: wg1: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1300
ServerB: 6: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1300 qdisc noqueue state UNKNOWN group default qlen 1000

And as expected, the speed went down by about 10MBit because of more overhead with smaller packets.
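
A rough sketch of why a smaller tunnel MTU costs a little throughput. It assumes an IPv4 outer header, so each WireGuard data packet adds 20 (IPv4) + 8 (UDP) + 32 (WireGuard header and authentication tag) = 60 bytes, and a TCP stream inside the tunnel without TCP options. Ethernet framing is ignored, and the helper name wg_efficiency is made up for this example. The default of 1420 corresponds to 1500 minus the 80 bytes needed with an IPv6 outer header.

```shell
WG_OVERHEAD_V4=60   # outer IPv4 + UDP + WireGuard framing, in bytes
INNER_HEADERS=40    # inner IPv4 (20) + TCP without options (20), assumed

# wg_efficiency TUNNEL_MTU -> percent of outer IP bytes that are TCP payload
wg_efficiency() {
    awk -v mtu="$1" -v oh="$WG_OVERHEAD_V4" -v inner="$INNER_HEADERS" \
        'BEGIN { printf "%.1f\n", (mtu - inner) / (mtu + oh) * 100 }'
}

wg_efficiency 1420   # 93.2
wg_efficiency 1300   # 92.6
```

Under these assumptions the payload share drops by well under one percentage point, and the same amount of data needs roughly 9% more packets, so a small loss rather than a gain is what one would expect from lowering the MTU.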


Quote from: meyergru on February 21, 2024, 01:07:15 PM
- use more threads (-P4 or -P8 are equal, only -P1 is a little slower).
I agree on this: on both opnSense and pfSense, one thread already gets close to the maximum throughput; with more streams you gain only very little additional speed.

Quote from: meyergru on February 21, 2024, 01:07:15 PM
My CPU should be more or less comparable to the C3758R, even somewhat faster in single-threaded applications.
....
But when I look at "top" with threads and system processes enabled, I can see the kernel at ~300% (the rest is interrupts and user processes), so all 4 threads seem to get utilized.
I see 750-780% on the kernel process:
    0 root        142 -16    -     0B  2272K swapin   3  16:30 784.82% kernel
Most likely my system has somewhat better hardware processing, as my interrupts stay very close to 0%. That is most likely why you see a bit less performance than my setup, even if your CPU is some 5% faster than mine.
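
The WCPU column in top is summed over all threads of a process, so it can exceed 100%. A small sketch for turning such a line into a share of the whole machine; the helper name wcpu_share is made up for this example, and the core counts (8 for the C3758R, 4 threads for the N6005) are the ones mentioned in this thread.

```shell
# wcpu_share NCPU: read one top(1) process line on stdin, take the first
# field ending in "%" and print it as a percentage of NCPU cores.
wcpu_share() {
    awk -v ncpu="$1" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) {
                sub(/%$/, "", $i)
                printf "%.1f\n", $i / ncpu
                exit
            }
    }'
}

echo "    0 root        142 -16    -     0B  2272K swapin   3  16:30 784.82% kernel" | wcpu_share 8
```

For the line above this gives 98.1, i.e. the kernel keeps nearly all 8 cores busy; the ~300% seen on the 4-thread box works out to about 75% by the same calculation.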
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on March 10, 2024, 06:10:32 PM
It has been some weeks and I got some new hardware and did also some additional tests that might be interesting to the community.

The logical setup stays the same, but now I used a Ryzen 5700G with an Intel E810 Quad Port SFP28 network card.

OPNsense WG performance
OPNsense 24.1.1 Wireguard performance: 1800MBit
--> So the Ryzen 5700G is 3x faster compared to the C3758R, although the CPU itself is 4.8x faster

pfSense+ WG performance
pfSense+ 23.09.1 Wireguard performance: 6000MBit
--> Out of interest, I did some tests with pfSense+, which uses hardware acceleration for ChaCha20-Poly1305, and it shows an impressive 6000MBit throughput while the CPU is still 75% idle

Is the difference in post 1 because different FreeBSD versions are used?
I doubt that. I did some tests with Wireguard on FreeBSD VMs with identical configuration, and the results are really close to each other. I used for the FreeBSD tests an old i7-7700K, each FreeBSD VM got 1vCPU assigned.

FreeBSD 13.2: 990MBit
FreeBSD 13.3: 1020MBit
FreeBSD 14.0: 980MBit

Conclusion
One core of the i7-7700K has about 10% of the processing power of a Ryzen 5700G, still it achieves 50% of the throughput of the Ryzen 5700G with OPNsense.
So for me it is clearly an issue related to OPNsense, and not FreeBSD / Kernel version in general.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: Monviech (Cedrik) on March 10, 2024, 07:22:35 PM
Tested it myself on OPNsense KVM VMs and I'm stuck at around 850 Mbit/s on a Ryzen 9 3900x.

So, this feature does all the magic in pfsense Plus? Intel CPU and IPsec Multi-Buffer (IPsec-MB, IIMB) Cryptographic Acceleration for ChaCha20-Poly1305 ?

https://docs.netgate.com/pfsense/en/latest/hardware/cryptographic-accelerators.html
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on March 10, 2024, 07:35:07 PM
Quote from: Monviech on March 10, 2024, 07:22:35 PM
So, this feature does all the magic in pfsense Plus? Intel CPU and IPsec Multi-Buffer (IPsec-MB, IIMB) Cryptographic Acceleration for ChaCha20-Poly1305 ?

https://docs.netgate.com/pfsense/en/latest/hardware/cryptographic-accelerators.html

To be honest, I tried to find this out, but I'm not a 'low level' expert unfortunately.
What I can say is, installing intel-ipsec-mb-1.5_1 and loading cryptodev.ko didn't make a difference on OPNsense.

Here the loaded modules on pfSense+ and OPNsense.
The iimb.ko on pfSense+ looks like the one we're missing...

pfSense+
[23.09.1-RELEASE][root@pfSense.home.arpa]/root: kldstat
Id Refs Address                Size Name
1   38 0xffffffff80200000  339f830 kernel
2    1 0xffffffff835a0000    abd98 ice_ddp.ko
3    1 0xffffffff8364c000     76f8 cryptodev.ko
4    1 0xffffffff83655000    1e2b0 opensolaris.ko
5    1 0xffffffff83674000   5d7790 zfs.ko
6    1 0xffffffff84710000     2220 cpuctl.ko
7    1 0xffffffff84713000     3210 intpm.ko
8    1 0xffffffff84717000     2178 smbus.ko
9    1 0xffffffff8471a000     9288 aesni.ko
10    1 0xffffffff84800000   666a08 iimb.ko
12    1 0xffffffff84753000     3158 amdtemp.ko
13    1 0xffffffff84757000     2130 amdsmn.ko
14    1 0xffffffff84724000    2e560 if_wg.ko


OPNsense
root@OPNsense:~ # kldstat
Id Refs Address                Size Name
1   86 0xffffffff80200000  216c2e0 kernel
2    1 0xffffffff8236d000     ab48 opensolaris.ko
3    1 0xffffffff82378000     4b58 if_enc.ko
4    3 0xffffffff8237d000    78aa0 pf.ko
5    1 0xffffffff823f6000     a458 cryptodev.ko
6    1 0xffffffff82401000    abc98 ice_ddp.ko
7    1 0xffffffff824ad000     f4c8 pfsync.ko
8    1 0xffffffff824bd000   59dfe0 zfs.ko
9    1 0xffffffff82a5b000     3b18 pflog.ko
10    1 0xffffffff82a5f000     f858 carp.ko
11    1 0xffffffff82a70000     aa70 if_gre.ko
12    1 0xffffffff82a7b000    16148 if_lagg.ko
13    2 0xffffffff82a92000     3538 if_infiniband.ko
14    1 0xffffffff82a96000     e8f8 if_bridge.ko
15    2 0xffffffff82aa5000     8958 bridgestp.ko
16    1 0xffffffff83010000     3378 acpi_wmi.ko
17    1 0xffffffff83014000     3218 intpm.ko
18    1 0xffffffff83018000     2180 smbus.ko
19    1 0xffffffff8301b000     3340 uhid.ko
20    1 0xffffffff8301f000     3380 usbhid.ko
21    1 0xffffffff83023000     31f8 hidbus.ko
22    1 0xffffffff83027000     3320 wmt.ko
23    1 0xffffffff8302b000     72a8 hifn.ko
24    1 0xffffffff83033000     2270 padlock.ko
25    1 0xffffffff83036000    15308 qat.ko
26    1 0xffffffff8304c000     43b0 safe.ko
27    1 0xffffffff83051000     3160 amdtemp.ko
28    1 0xffffffff83055000     2138 amdsmn.ko
29    1 0xffffffff83058000    2f560 if_wg.ko
30    1 0xffffffff83088000     4700 nullfs.ko
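
Instead of comparing the two listings by eye, the module names can be diffed. A minimal sketch, assuming each kldstat output has been saved to a text file first; the file names pfsense.txt and opnsense.txt and both helper names are made up for this example.

```shell
# kld_names FILE -> sorted module names (5th column) from saved kldstat output;
# the header line is skipped because its first field is not numeric.
kld_names() {
    awk '$1 ~ /^[0-9]+$/ { print $5 }' "$1" | sort
}

# only_in_first FILE_A FILE_B -> modules loaded in A but not in B
only_in_first() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    kld_names "$1" > "$tmp_a"
    kld_names "$2" > "$tmp_b"
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Usage, after running "kldstat > pfsense.txt" and "kldstat > opnsense.txt"
# on the respective boxes:
#   only_in_first pfsense.txt opnsense.txt
```

Applied to the two listings above, the modules present only on pfSense+ are aesni.ko, cpuctl.ko and iimb.ko.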
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on March 10, 2024, 11:42:51 PM
Of course the presence of the library and/or the kernel module does not make a difference in itself, unless you actually use those functions.

As I already wrote: The FreeBSD implementation is not the best and obviously, Netgate actually has done something special (at least for pfSense plus):

https://redmine.pfsense.org/issues/14291

So, this would best be addressed as a feature request for OpnSense, namely to add IPsec-MB support as an additional crypto acceleration technique. As far as I understand it, the acceleration is not strictly limited to Intel CPUs, but works when certain CPU features are available.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: Monviech (Cedrik) on March 11, 2024, 08:24:37 AM
It seems like there was another thread where tunables have been described:

https://forum.opnsense.org/index.php?topic=37808.0
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on March 11, 2024, 09:05:20 AM
Quote from: Monviech on March 11, 2024, 08:24:37 AM
It seems like there was another thread where tunables have been described:

https://forum.opnsense.org/index.php?topic=37808.0

Thank you for your reply, I added those tunables already without any change in WG performance.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: franco on March 11, 2024, 10:00:56 AM
iimb.ko is pretty interesting because it's not found anywhere in FreeBSD old and new.

Can you try to locate it on the plus install?

# find / -name "iimb.ko"

And then try to see if it belongs to a package or if it is part of the non-free plus sources?

# pkg which /path/to/iimb.ko


Cheers,
Franco
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: Fright on March 11, 2024, 10:44:59 AM
@franco
I bet it's the second
https://www.netgate.com/blog/presentation-of-boosting-ipsec-and-vpn-performance-in-pfsense-software-with-iimb-at-asiabsdcon-2023
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on March 11, 2024, 05:25:17 PM
Quote from: franco on March 11, 2024, 10:00:56 AM
Can you try to locate it on the plus install?
# find / -name "iimb.ko"


[23.09.1-RELEASE][root@pfSense.home.arpa]/root: find / -name "iimb.ko"
/boot/kernel/iimb.ko

Quote from: franco on March 11, 2024, 10:00:56 AM
And then try to see if it belongs to a package or if it is part of the non-free plus sources?
# pkg which /path/to/iimb.ko


[23.09.1-RELEASE][root@pfSense.home.arpa]/root: pkg which /boot/kernel/iimb.ko
/boot/kernel/iimb.ko was installed by package pfSense-kernel-pfSense-23.09.1


To summarize:
pfSense CE on bare metal C3758R, 1300MBit Wireguard throughput
OPNsense on bare metal C3758R, 630MBit Wireguard throughput (-51%)

pfSense+ on bare metal Ryzen 5700G, 6000MBit Wireguard throughput (with IIMB!) at only ~25% CPU load
OPNsense on bare metal Ryzen 5700G, 1800MBit (-70%)

FreeBSD 13.2 on vSphere VM, 1 Core i7-7700K, 990MBit throughput
FreeBSD 13.3 on vSphere VM, 1 Core i7-7700K, 1020MBit throughput
FreeBSD 14.0 on vSphere VM, 1 Core i7-7700K, 980MBit throughput

Unfortunately I was not able to do Wireguard tests on Ryzen 5700G with pfSense CE, as the network driver supplied there only supports one queue which could lead to measurement errors.

pfSense+ is clearly the leader with hardware acceleration of Wireguard.
But I still can't figure out, why OPNsense throughput is so much lower compared to pfSense CE or FreeBSD. There is clearly something unoptimized on OPNsense when using Wireguard.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on March 12, 2024, 12:13:37 AM
The last one is an interesting question. As far as I understand, pfSense uses the same crypto as OpnSense. What comes to mind is that some compiler optimization switches, CPU instruction sets or even the compiler itself (gcc vs. clang) could be different.

Actually, what I have found is that in kern.mk, there is a setting for amd64 which effectively disables some instructions:


#
# For AMD64, we explicitly prohibit the use of FPU, SSE and other SIMD
# operations inside the kernel itself.  These operations are exclusively
# reserved for user applications.
#
# gcc:
# Setting -mno-mmx implies -mno-3dnow
# Setting -mno-sse implies -mno-sse2, -mno-sse3, -mno-ssse3 and -mfpmath=387
#
# clang:
# Setting -mno-mmx implies -mno-3dnow and -mno-3dnowa
# Setting -mno-sse implies -mno-sse2, -mno-sse3, -mno-ssse3, -mno-sse41 and -mno-sse42
# (-mfpmath= is not supported)
#
.if ${MACHINE_CPUARCH} == "amd64"
CFLAGS.clang+= -mno-aes -mno-avx
CFLAGS+= -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float \
-fno-asynchronous-unwind-tables
INLINE_LIMIT?= 8000
.endif


The reason given here is clear: The kernel is to run on any amd64-capable platform, regardless of specific features. This partly explains why the chacha20-poly1305 code is kind of slow: Not only is this a piece of code that is not optimized for a specific CPU platform - being part of the kernel, it is compiled for maximum compatibility (and I can only guess: probably without '-O2').

Mind you: I do not know if pfSense CE and FreeBSD really compile this differently and I have no means to check. But this 100% improvement could be fairly easy to unlock.


As for the much faster iimb.ko module: What I found out so far is that there is a cryptography API which allows the use of a crypto driver that implements specific functions - like the Intel QAT engine(s). iimb.ko seems to be such a driver which implements the kernel functions for chacha20-poly1305 and others.

If I got it right, even the FreeBSD wireguard implementation does not use the native wireguard routines, but the kernel crypto functions instead. Thus, what is needed is a crypto driver using the Intel IIMB library. The API is rather arcane, though.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: franco on March 12, 2024, 06:00:54 AM
iimb.ko is a kernel-wrapped version of https://github.com/intel/intel-ipsec-mb as far as we can tell, which also speeds up WireGuard despite having 'IPsec' in the name.

The measurements here, however, are all over the place and even suggest modification of CE to an unknown degree.

To be frank at this point we can conclude please only compare FreeBSD and OPNsense.

And yet the measurements for FreeBSD and OPNsense given here are all over the place as well, so it suggests a low-effort out-of-the-box comparison without any factoring for sysctls and differing kernel versions. You could also load a FreeBSD kernel on OPNsense and vice versa. It should give you more consistent testing results to compare.

And if you want to use proprietary software please go ahead but let's stop this advertisement now. I find it interesting you go through all of the trouble to touch up product names with markup here. :)


Cheers,
Franco
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on March 17, 2024, 06:58:46 PM
@franco: Would you mind telling how to install a FreeBSD kernel beneath OpnSense? Is it possible to do that after the fact (i.e. install a FreeBSD kernel package)? I know that one could install OpnSense on top of FreeBSD, but it would be easier if one could just replace the kernel.

On a side note: It looks to me as if either Netgate actually has done something with the CE version as well or it is simply because of differences between FreeBSD 14 and 13.2.

Matter-of-fact, I have somewhat verified the "100% faster" claim: In my tests between two otherwise identical OpnSense and pfSense VM instances, they reached speeds of ~1.2 GBit/s in either direction (slow because of virtio networking). Whilst doing that, the OpnSense VM had ~80% load, whereas the pfSense VM only had 40%.

Therefore, I would like to check with a pure FreeBSD 13.2 (and 14) replacement kernel for OpnSense.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: franco on March 18, 2024, 03:48:10 PM
Before doing that I'd rather suggest the easier route and hide iimb.ko from the pfSense system so it cannot be loaded on boot (kldunload may work as well to some degree) in order to see the performance drop.

You could even take that file and move it to a compatible FreeBSD release to kldload it and see what difference it makes. The assumption is this is plug and play crypto, but be aware this load/unload could crash the kernel in mid-use.

Let me try to compile the steps to load a FreeBSD kernel for OPNsense and get back in a bit.


Cheers,
Franco
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on March 18, 2024, 04:47:01 PM
I only have pfSense CE and there is neither an iimb.ko module loaded nor even present:

[2.7.2-RELEASE][root@pfSense.mgsoft]/boot/kernel: kldstat
Id Refs Address                Size Name
1   32 0xffffffff80200000  339ce08 kernel
2    1 0xffffffff8359d000    1e2b0 opensolaris.ko
3    1 0xffffffff835bc000     76f8 cryptodev.ko
4    1 0xffffffff835c4000   5d7790 zfs.ko
5    1 0xffffffff84418000     2220 cpuctl.ko
6    1 0xffffffff8441b000     3210 intpm.ko
7    1 0xffffffff8441f000     2178 smbus.ko
9    1 0xffffffff84451000     9288 aesni.ko
10    1 0xffffffff8445b000     3158 amdtemp.ko
11    1 0xffffffff8445f000     2130 amdsmn.ko
12    1 0xffffffff84422000    2e560 if_wg.ko


I do not really know if it might be compiled statically, but if the speed results of @pfop are correct (i.e. OpnSense 24.1.3_1 = 100%, pfSense CE 2.7.2 = 200% and pfSense+ = 400%), it suggests Netgate really limits use of the integration to pfSense+.

Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on March 18, 2024, 06:49:25 PM
Quote from: meyergru on March 18, 2024, 04:47:01 PM
I do not really know if it might be compiled statically, but if the speed results of @pfop are correct (i.e. OpnSense 24.1.3_1 = 100%, pfSense CE 2.7.2 = 200% and pfSense+ = 400%), it suggests Netgate really limits use of the integration to pfSense+.
The CPU Crypto offloading/acceleration with iimb.ko is only available in pfSense+ and not available on pfSense CE.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on March 18, 2024, 07:24:33 PM
I know. I could see that the module is missing, but it still is theoretically possible that pfSense CE has some sort of "lightweight" module or static kernel part that does the 200% speed when compared to OpnSense.

I was rather referring to the implications I already suggested indirectly:

The "long way to go" would be to do the same as Netgate has done with iimb.ko in pfSense+ and integrate the (poorly - if at all - documented) FreeBSD kernel crypto API with the corresponding library functions to achieve the full improvement with a factor of 4.

I understand that is much work and with the advent of 14.1, probably it has to be done twice if those APIs changed. So, I would not expect that anytime soon, at least not before integration of 14.1.

However, since we can take it that pfSense CE does not use that approach (BTW: I have a direct confirmation of that fact) and still is faster, maybe there is a "quick win" for the oncoming FreeBSD 13.3-based version 24.7 that does not require as much effort but still doubles Wireguard performance.

My own setup clearly indicates that with typical current OpnSense hardware (like N5105, N100 and their likes), doubling WG performance would break the magical 1 Gbit/s barrier, which would be a decent improvement of the current situation.

Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: mimugmail on March 18, 2024, 08:02:25 PM
I did not follow the whole thread, but did you compare pfsense and OPN on KVM without Wireguard?
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on March 18, 2024, 11:48:48 PM
If you refer to raw network throughput, that does not help much with regard to this. The throughput over the unencrypted interfaces (vtnet0) on my Proxmox host was only ~1.6 Gbps anyway at very low CPU utilisation on both parts. So, when you combine one pfSense CE and one OpnSense on one encrypted connection, the speed is limited by the encryption speed and you will get this:

Quote from: meyergru on March 17, 2024, 06:58:46 PM
Matter-of-fact, I have somewhat verified the "100% faster" claim: In my tests between two otherwise identical OpnSense and pfSense VM instances, they reached speeds of ~1.2 GBit/s in either direction (slow because of virtio networking). Whilst doing that, the OpnSense VM had ~80% load, whereas the pfSense VM only had 40%.

Therefore, I would like to check with a pure FreeBSD 13.2 (and 14) replacement kernel for OpnSense.

BTW: I did check now with FreeBSD 13.3 as well as with FreeBSD 14.0 underneath OpnSense. FreeBSD 13.x kernels show the same speeds, but FreeBSD 14 has around double the speed of the OpnSense original kernel. If the speed is limited by other factors, CPU load halves as compared to the OpnSense 13.2 kernel.

P.S.: I have looked a little closer at pfSense now and find they have made progress in several aspects (like performance, GUI usability), but I still find it hard to use without the possibility for dynamic IPv6 aliases - that feature has been discussed since 2016 or 2018, but was never implemented. At least in Germany, you will only get dynamic IPv6 prefixes, with some ISPs only offering CGNAT for IPv4. Thus, if you aim to host services over IPv6, you will have a hard time doing that with pfSense. I know you can do it with DHCPv6 and hostnames, but I prefer SLAAC - and "NPt to the rescue" is a no-go, as well.

So it seems their target audience lives on another continent than I do.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: lewald on March 23, 2024, 03:25:27 PM
https://1826203.fs1.hubspotusercontent-na1.net/hubfs/1826203/Netgate%20Whitepaper%20-%20boosting-ipsec-perf-with-iimb.pdf

Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: veritigo on May 01, 2024, 09:26:43 PM
I have been reading this thread as I recently have been looking into the benefits of the implementation and wanted to share what I found. I see that another member already found the PDF document from the conference (see attached).

However, I wanted to share the information I have available with everyone including the GitHub location of the Intel® Multi-Buffer Crypto for IPsec Library (a.k.a. IPsec-MB or IIMB)
=====================================================================
Intel GitHub location of the library - https://github.com/intel/intel-ipsec-mb
=====================================================================

=====================================================================
FreeBSD Port that includes the Intel Multi-Buffer Crypto Library
=====================================================================
https://www.freshports.org/security/intel-ipsec-mb/
=====================================================================
Port details
=====================================================================
intel-ipsec-mb Intel(R) Multi-Buffer Crypto for IPsec Library
=====================================================================
Last Update: 2024-02-25 13:19:43
=====================================================================
Intel Multi-Buffer Crypto for IPsec Library is highly-optimized software implementations of the core cryptographic processing for IPsec, which provides industry-leading performance on a range of Intel(R) Processors.
=====================================================================
Other Intel Links and Articles that may be relevant

Intel - Fast Multi-buffer IPsec Implementations on Intel® Architecture Processors
*Older Article that appears to go over requirements to implement and utilize library*
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-multi-buffer-ipsec-implementations-ia-processors-paper.pdf


Get Started with IPsec Acceleration in the FD.io* VPP Project
https://www.intel.com/content/www/us/en/developer/articles/guide/get-started-with-ipsec-acceleration-in-the-fdio-vpp-project.html

Intel® Multi-buffer Crypto for IPsec on DPDK - Get Started
https://www.intel.com/content/www/us/en/developer/videos/intel-multi-buffer-crypto-for-ipsec-on-dpdk-get-started.html


=====================================================================
The kernel module for the Intel® Multi-Buffer Crypto for IPsec Library (a.k.a. IPsec-MB or IIMB)
IPsec-MB is not limited to accelerating IPsec, despite the name.
It leverages CPU SIMD instructions to accelerate anything using kernel crypto functions for AES-GCM-128, AES-GCM-256, AES-CBC-128, AES-CBC-256, SHA1, SHA2, and ChaCha20/Poly1305.
This includes IPsec, WireGuard, OpenVPN DCO and more.
=====================================================================

System tuning values that are directly relevant to Intel IIMB performance:
=====================================================================
kern.crypto.iimb.enable_aescbc="1" # default 1 - disabled 0 - Enables handling of AES-CBC. IIMB can be slower than QAT for CBC so this is a toggle to disable handling for AES-CBC while accelerating other algorithms so IPsec-MB and QAT can coexist in such environments. Supported on x86-64 only.
kern.crypto.iimb.enable_multiq="1" # default value determined by number of CPU cores, =<4 CPU cores = "1", 5-8 CPU cores = "2", 9+ CPU Cores = "4" Value represents number of job threads, multiple queues to handle encryption jobs, i.e. each session is bound to a job thread
kern.crypto.iimb.use_engine="1" # default 1 - disable 0 - used to enable and disable iimb feature
kern.crypto.iimb.use_task="1" # default 0 - enable 1 - used to run a separate task queue for running the encryption job completion callbacks.
=====================================================================
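
As a rough illustration of the core-count rule described for kern.crypto.iimb.enable_multiq above, the default could be sketched like this in Python. The function name is hypothetical, and the thresholds are taken only from the description in this post, not from the module's source.

```python
def default_multiq(cpu_cores: int) -> int:
    """Return the default number of job threads for a given core count.

    Hypothetical helper: it mirrors the thresholds quoted in this post
    (up to 4 cores -> 1, 5-8 cores -> 2, 9 or more cores -> 4).
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    if cpu_cores <= 4:
        return 1
    if cpu_cores <= 8:
        return 2
    return 4


# Example: the 8-core C3758R from the first post would fall into the "2" bucket.
for cores in (2, 4, 8, 16):
    print(cores, "cores ->", default_multiq(cores), "job thread(s)")
```

Since each session is bound to one job thread, a single tunnel would presumably still be handled by one thread regardless of this value.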


Additionally enabling kernel handling of TLS could further lower CPU utilization as TLS processes are handled directly by the kernel
=====================================================================
All three tuner values have to be enabled in order to enable TLS in kernel, TLS in Kernel is supported in FreeBSD 13.0 or newer releases
=====================================================================
kern.ipc.mb_use_ext_pgs="1"
kern.ipc.tls.enable="1"
kern.ipc.tls.ifnet.permitted="1"
=====================================================================
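
Since all three values have to be set together, a small check can help catch a missing one. This is only a sketch: the helper names are made up for illustration, and it assumes the tunables are available as plain name="value" text (for example copied from a config export), not read from the running kernel.

```python
REQUIRED_KTLS_TUNABLES = (
    "kern.ipc.mb_use_ext_pgs",
    "kern.ipc.tls.enable",
    "kern.ipc.tls.ifnet.permitted",
)


def parse_tunables(text: str) -> dict:
    """Parse lines of the form name="value"; '#' starts a comment."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        if not name:
            # Skip separator lines such as "=====".
            continue
        values[name] = value.strip().strip('"')
    return values


def missing_ktls_tunables(text: str) -> list:
    """Return the required tunables that are absent or not set to 1."""
    values = parse_tunables(text)
    return [n for n in REQUIRED_KTLS_TUNABLES if values.get(n) != "1"]


sample = '''
kern.ipc.mb_use_ext_pgs="1"
kern.ipc.tls.enable="1"
'''
# The third tunable is missing from the sample, so it is reported.
print(missing_ktls_tunables(sample))
```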
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on May 01, 2024, 10:04:12 PM
Yes, you are pointing out the obvious, leaving out a few vital parts (i.e. the part about the FreeBSD kernel module is applicable to pfSense+ only):

1. A library to use special vector instructions from Intel does exist and is free to use.

2. A FreeBSD kernel module to leverage that library has been developed by Netgate solely for their paid product, but it is closed source (it accelerates Wireguard by a factor of 4).

3. FreeBSD 14.x is around twice as fast as 13.1 with Wireguard even without special tweaking, which explains why the free Netgate product is faster than OpnSense at this time.

None of this is new information, so what is your point?
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: paulierco on May 14, 2024, 04:38:06 PM
I can confirm that upgrading the kernel from 13.2-RELEASE-p11 -> 14.1-BETA1 really improved the WireGuard performance a lot. CPU usage has been reduced by 50% and performance has increased by 100%.

https://imgur.com/H0bkKkY
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: kevindd992002 on June 18, 2024, 06:46:22 PM
@pfop did you have the same results?
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: pfop on June 18, 2024, 06:50:11 PM
Quote from: kevindd992002 on June 18, 2024, 06:46:22 PM
@pfop did you have the same results?

Hello, currently I've no OPNsense box to test with, sorry.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: mtxr on December 05, 2024, 11:23:18 PM
does anyone know if the freebsd intel driver will be incorporated into opnsense?

I really need the wireguard speeds to improve. it's currently running at 1/8th the speed of my fiber connection when i kick it on.  My hardware is decent to not be the bottleneck.

Can i install the port myself and implement it somehow?
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: Monviech (Cedrik) on December 06, 2024, 06:36:14 AM
I mean you could use IPsec to push the speed via VPN. It's designed to be customizable in terms of encryption to enhance throughput.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: mimugmail on December 06, 2024, 09:38:04 AM
In 2020 WireGuard was already faster than IPsec on OPNsense. What's your current speed with and without WireGuard, and which OPNsense version?
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: mtxr on December 06, 2024, 04:34:31 PM
I recently set up OPNsense on a 12th-gen i5 with 16GB of RAM, running bare metal on a mini PC that serves as my router. The system has Intel 2.5GbE NICs and is connected to a 2Gbps fiber connection. Intel QAT is also enabled.

When running speed tests through a nearby data center via a WireGuard tunnel, I'm only getting around 300Mbps. I've applied some tunables I found in the forums, but they haven't improved performance. No IDS/IPS or non-default plugins are enabled, except for WireGuard.

The system works fine otherwise, but the WireGuard speeds are the only issue I've encountered. I haven't tested with OpenVPN yet, but I assume it will be slower.

Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on December 06, 2024, 05:10:27 PM
Just to reiterate the situation with said Intel library and Wireguard:

Intel has made available a library that can be used to speed up some cryptographic operations, including those that are used within Wireguard.

However, the interface for Wireguard has only been implemented by "the other firewall" and not been made publicly available - it is also only contained in the business edition of that software. Up to now, there were no efforts on re-implementing a similar interface into OpnSense.

Wireguard on the other hand once had a fast X64 implementation for its cryptography like in Linux, but that has been dialed back by a FreeBSD maintainer for "reasons", AFAIR.

Since then, the original Wireguard performance on FreeBSD has about doubled, but indeed is way below 1 GBit/s. AFAIK, the implementation also still is inherently single-threaded.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: mtxr on December 07, 2024, 12:02:28 AM
Thanks for the summary, I definitely understand your post. I guess I will have to accept this won't be solved or looked into any time soon and find alternative solutions.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: km_ on December 11, 2024, 10:40:10 PM
Hello guys. I have tried almost everything in this topic, including microcode updates, playing with MTU and some recommended tunables... and I still get very bad speeds in one direction of tunnel.
Here's iperf3 in straight and reverse directions:

https://i.imgur.com/aUkYz9d.png

One direction is OK, max speed. In the reverse direction there seems to be a peak at 130-150Mbps and then the speed drops to a steady 60-80Mbps. CPU usage is low. The CPUs are pretty new Xeons, plenty of power.
When using iperf3 directly (no Wireguard), then the speeds are okay.

The configuration:
* 2 sites, 2 servers
* 250/250 optic uplink, latency 3-5ms
* Proxmox with OpnSense VM
* Virtio virtualized interfaces for both LAN and WAN (I haven't tried direct pass-through yet), Proxmox bridge networking (vmnet0, vmnet1). Underlying interfaces are 1Gbps

I am using Netflow and Insight, haven't tried disabling those.

How to properly troubleshoot this? I will still try to find the culprit, the speeds are terribly slow.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on December 11, 2024, 10:52:56 PM
Have you tried using more than one stream (iperf -P8)?
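
To compare a single stream against several parallel streams, the JSON report that iperf3 prints with -J can be summarized with a few lines of Python. This is a minimal sketch: the function name is made up, and it assumes the field layout of a TCP test report (end.streams[].receiver and end.sum_received), so adjust it if your iperf3 version reports something different.

```python
import json


def summarize_iperf3(json_text: str) -> dict:
    """Summarize receiver-side throughput from an iperf3 JSON report.

    Assumes a TCP test run with -J; field names follow the report layout
    described in the lead-in and may differ for UDP tests.
    """
    report = json.loads(json_text)
    end = report["end"]
    per_stream = [
        stream["receiver"]["bits_per_second"] / 1e6
        for stream in end.get("streams", [])
    ]
    total = end["sum_received"]["bits_per_second"] / 1e6
    return {
        "streams": len(per_stream),
        "per_stream_mbit": per_stream,
        "total_mbit": total,
    }


# Hand-written sample in the assumed layout, not real measurement data.
sample = json.dumps({
    "end": {
        "streams": [
            {"receiver": {"bits_per_second": 100e6}},
            {"receiver": {"bits_per_second": 50e6}},
        ],
        "sum_received": {"bits_per_second": 150e6},
    }
})
print(summarize_iperf3(sample))
```

If the total barely changes between one stream and eight, the cap is probably not per flow, which points away from a single-stream limit.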
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: km_ on December 12, 2024, 10:46:06 PM
Quote from: meyergru on December 11, 2024, 10:52:56 PM
Have you tried using more than one stream (iperf -P8)?

Yeah, same results. Currently - no matter what I tweak, the speed is still somewhere capped and test results come out identical. Now all systems are up to date, the problem persists.

I haven't checked any firewall rules, next thing I will probably try disabling rules one by one.
Maybe I should try super low MTU/MSS, I haven't gone below 1200.

If nothing fixes the problem, I will probably have to migrate to OpenVPN or IPsec. I hope I will find some solution :D
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: Patrick M. Hausen on December 12, 2024, 10:53:12 PM
Don't go below 1280. For one it's useless, second 1280 is the mandatory minimum MTU for IPv6. 1500 - 1280 leaves 220 bytes of possible encapsulation overhead to account for. You will never need that much.
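
To put numbers on that, here is a small Python sketch of the arithmetic. The function name is just for illustration; the header sizes are the standard ones (20 bytes IPv4, 40 bytes IPv6, 8 bytes UDP) plus 32 bytes for a WireGuard data packet, and IP options or extension headers are assumed to be absent.

```python
IPV6_MIN_MTU = 1280      # mandatory minimum MTU for IPv6
IPV4_HEADER = 20         # without options
IPV6_HEADER = 40         # without extension headers
UDP_HEADER = 8
WIREGUARD_OVERHEAD = 32  # 4 type/reserved + 4 receiver index + 8 counter + 16 auth tag


def wireguard_mtu(link_mtu: int = 1500, outer_ipv6: bool = True) -> int:
    """Largest tunnel MTU that fits into link_mtu without fragmentation."""
    outer_ip = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return link_mtu - outer_ip - UDP_HEADER - WIREGUARD_OVERHEAD


print(wireguard_mtu(1500, outer_ipv6=True))   # 1420
print(wireguard_mtu(1500, outer_ipv6=False))  # 1440
print(1500 - IPV6_MIN_MTU)                    # 220 bytes of headroom at 1280
```

So even with an IPv6 outer header, WireGuard only needs 80 of those 220 bytes.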
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: meyergru on December 12, 2024, 11:44:30 PM
Some comparison data:

I have OpnSense running baremetal on an N100, definitely slower than your 12th gen i5.

When running iperf3 against my datacenter Proxmox OpnSense with vtnet adapters, I get ~400 MBit/s both upstream and downstream with -P4, which I would expect from the N100. I can run MTU 1500 on my equipment on the WAN interfaces and I have MTU 1400 on my Wireguard instances.

The Proxmox in the datacenter is on an Core i5-13500 and I use "host" CPU type to enable AES-NI with 4 cores assigned.

My uplink is somewhat more than 400 MBit/s, so definitely limited by Wireguard performance here. Ping is 37ms over Wireguard.

P.S.: With "--bidir", speed substantially decreases to ~250 MBit/s up- and downstream with -P4. The tests were conducted on OpnSense itself acting as iperf server and client.
Title: Re: Wireguard performance 100% faster on pfSense than OPNsense
Post by: calibrae on January 22, 2025, 01:46:41 PM
Monitoring this thread, but I tried a few things.
My remote is a fedora VPS with 2 cores.
I have an Alpine VM on my network and of course OpnSense.

When iperf'ing from OPNsense through a WG tunnel to the VPS, I get between 200 and 350 Mbps.
When iperf'ing through another tunnel, directly from the Alpine VM to the VPS, I get results between 750 Mbps and 1.2 Gbps.
When iperf'ing through a GRE tunnel between OPNsense and the VPS, I get > 1.2 Gbps.

CPU usage is never above 30/40% whatever the case. So yeah, there's an issue with OPNSense. Never had any issue with PF, but will never ever go back.

I'm running opn in a qemu VM. Didn't fiddle yet with microcode or QAT virtual function. I'll try and report