Wireguard performance 100% faster on pfSense than OPNsense

Started by pfop, February 19, 2024, 05:04:59 PM

If you refer to raw network throughput, that does not help much with regard to this. The throughput over the unencrypted interfaces (vtnet0) on my Proxmox host was only ~1.6 Gbps anyway, at very low CPU utilisation on both ends. So, when you combine one pfSense CE and one OPNsense on one encrypted connection, the speed is limited by the encryption speed and you will get this:

Quote from: meyergru on March 17, 2024, 06:58:46 PM
As a matter of fact, I have somewhat verified the "100% faster" claim: In my tests between two otherwise identical OPNsense and pfSense VM instances, both reached speeds of ~1.2 GBit/s in either direction (slow because of virtio networking). While doing that, the OPNsense VM had ~80% load, whereas the pfSense VM had only 40%.

Therefore, I would like to check with a pure FreeBSD 13.2 (and 14) replacement kernel for OpnSense.

BTW: I have now checked with FreeBSD 13.3 as well as FreeBSD 14.0 underneath OPNsense. The FreeBSD 13.x kernels show the same speeds, but FreeBSD 14 has around double the speed of the original OPNsense kernel. Where the speed is limited by other factors, CPU load halves compared to the OPNsense 13.2 kernel.

P.S.: I have looked a little closer at pfSense now and find they have made progress in several aspects (like performance and GUI usability), but I still find it hard to use without the possibility of dynamic IPv6 aliases - that feature has been discussed since 2016 or 2018, but was never implemented. At least in Germany, you will only get dynamic IPv6 prefixes, with some ISPs only offering CGNAT for IPv4. Thus, if you aim to host services over IPv6, you will have a hard time doing that with pfSense. I know you can do it with DHCPv6 and hostnames, but I prefer SLAAC - and "NPt to the rescue" is a no-go as well.

So it seems their target audience lives on another continent than I do.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+


I have been reading this thread as I have recently been looking into the benefits of the implementation and wanted to share what I found. I see that another member already found the PDF document from the conference (see attached).

However, I wanted to share the information I have available with everyone, including the GitHub location of the Intel® Multi-Buffer Crypto for IPsec Library (a.k.a. IPsec-MB or IIMB):
=====================================================================
Intel GitHub location of the library:
https://github.com/intel/intel-ipsec-mb

FreeBSD port that includes the library:
https://www.freshports.org/security/intel-ipsec-mb/

Port details: intel-ipsec-mb - Intel(R) Multi-Buffer Crypto for IPsec Library
Last update: 2024-02-25 13:19:43

The Intel Multi-Buffer Crypto for IPsec Library is a highly optimized software implementation of the core cryptographic processing for IPsec, providing industry-leading performance on a range of Intel(R) processors.
=====================================================================
Other Intel Links and Articles that may be relevant

Intel - Fast Multi-buffer IPsec Implementations on Intel® Architecture Processors
*Older article that appears to cover the requirements to implement and use the library*
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-multi-buffer-ipsec-implementations-ia-processors-paper.pdf


Get Started with IPsec Acceleration in the FD.io* VPP Project
https://www.intel.com/content/www/us/en/developer/articles/guide/get-started-with-ipsec-acceleration-in-the-fdio-vpp-project.html

Intel® Multi-buffer Crypto for IPsec on DPDK - Get Started
https://www.intel.com/content/www/us/en/developer/videos/intel-multi-buffer-crypto-for-ipsec-on-dpdk-get-started.html


=====================================================================
The kernel module for the Intel® Multi-Buffer Crypto for IPsec Library (a.k.a. IPsec-MB or IIMB)
IPsec-MB is not limited to accelerating IPsec, despite the name.
It leverages CPU SIMD instructions to accelerate anything using kernel crypto functions for AES-GCM-128, AES-GCM-256, AES-CBC-128, AES-CBC-256, SHA1, SHA2, and ChaCha20/Poly1305.
This includes IPsec, WireGuard, OpenVPN DCO and more.
=====================================================================
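As a rough sketch of how one might check for the module: according to this thread the kernel module only ships with pfSense Plus, so on such a system something like the following could show whether it is present. The module name ("iimb") and sysctl tree are assumptions inferred from the kern.crypto.iimb.* tunables discussed here, not verified against the actual product.

```shell
# Hypothetical check on a system that ships the IPsec-MB kernel module.
# Module name and sysctl paths are inferred from the kern.crypto.iimb.*
# tunables in this thread -- adjust to whatever your system actually exposes.
kldstat | grep -i iimb                 # is the kernel module loaded?
sysctl kern.crypto.iimb 2>/dev/null    # dump its tunables, if the tree exists
sysctl -a | grep -i crypto | head -20  # general view of crypto-related knobs
```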

System tuning values that are directly relevant to Intel IIMB performance:
=====================================================================
kern.crypto.iimb.enable_aescbc="1" # default 1 (enabled), 0 disables. Enables handling of AES-CBC. IIMB can be slower than QAT for CBC, so this toggle lets you disable AES-CBC handling while still accelerating the other algorithms, so that IPsec-MB and QAT can coexist in such environments. Supported on x86-64 only.
kern.crypto.iimb.enable_multiq="1" # default determined by CPU core count: <=4 cores = "1", 5-8 cores = "2", 9+ cores = "4". Sets the number of job threads (multiple queues handling encryption jobs); each session is bound to one job thread.
kern.crypto.iimb.use_engine="1" # default 1 (enabled), 0 disables the IIMB feature entirely.
kern.crypto.iimb.use_task="1" # default 0 (disabled), 1 enables a separate task queue for running the encryption job completion callbacks.
=====================================================================
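For completeness, loader tunables like these are normally set in /boot/loader.conf.local on FreeBSD and take effect after a reboot. This is only a sketch: per the discussion below, the kern.crypto.iimb.* tree is not available on stock FreeBSD or OPNsense, and the values here are illustrative, not recommendations.

```shell
# /boot/loader.conf.local -- sketch only; kern.crypto.iimb.* exists only
# where the closed-source IPsec-MB kernel module is shipped.
kern.crypto.iimb.use_engine="1"     # enable the IIMB crypto engine
kern.crypto.iimb.enable_multiq="2"  # e.g. 2 job threads on a 5-8 core box
kern.crypto.iimb.enable_aescbc="0"  # example: leave AES-CBC to QAT, IIMB does the rest
kern.crypto.iimb.use_task="0"       # keep completion callbacks inline
```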


Additionally, enabling kernel handling of TLS can further lower CPU utilization, as TLS processing is then handled directly by the kernel.
=====================================================================
All three tunables below must be enabled to activate in-kernel TLS. Kernel TLS is supported on FreeBSD 13.0 and newer.
=====================================================================
kern.ipc.mb_use_ext_pgs="1"
kern.ipc.tls.enable="1"
kern.ipc.tls.ifnet.permitted="1"
=====================================================================
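As a quick sketch: after setting the three tunables above and rebooting, their state can be verified at runtime with sysctl. The stats query at the end is hedged because the exact statistics node may vary between FreeBSD releases.

```shell
# Verify kernel TLS state after setting the three tunables and rebooting.
sysctl kern.ipc.tls.enable           # should report 1
sysctl kern.ipc.mb_use_ext_pgs       # should report 1
sysctl kern.ipc.tls.ifnet.permitted  # should report 1
# KTLS counters (node name may differ by release; ignore errors if absent):
sysctl -a kern.ipc.tls 2>/dev/null | head
```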

Yes, you are pointing out the obvious, but leaving out a few vital parts (e.g. the part about the FreeBSD kernel module is applicable to pfSense Plus only):

1. A library to use special vector instructions from Intel does exist and is free to use.

2. A FreeBSD kernel module to leverage that library has been developed by Netgate solely for their paid product, and it is closed source (it accelerates WireGuard by a factor of 4).

3. FreeBSD 14.x is around twice as fast as 13.1 with Wireguard even without special tweaking, which explains why the free Netgate product is faster than OpnSense at this time.

Nothing of this is new information, so what is your point?
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

I can confirm that upgrading the kernel from 13.2-RELEASE-p11 -> 14.1-BETA1 really improved WireGuard performance a lot. CPU usage dropped by 50% and throughput increased by 100%.

https://imgur.com/H0bkKkY


Quote from: kevindd992002 on June 18, 2024, 06:46:22 PM
@pfop did you have the same results?

Hello, currently I've no OPNsense box to test with, sorry.
Firewall Specs: AMD Ryzen 5700G, 16GB DDR4 3200MHz RAM, Intel E810 Quad Port SFP28 NIC
Internet Specs: Init7 25GBit FTTH

Does anyone know if the FreeBSD Intel driver will be incorporated into OPNsense?

I really need the WireGuard speeds to improve. It currently runs at 1/8th the speed of my fiber connection when I turn it on, and my hardware is decent enough not to be the bottleneck.

Can I install the port myself and implement it somehow?

December 06, 2024, 06:36:14 AM #38 Last Edit: December 06, 2024, 08:33:21 AM by Monviech (Cedrik)
I mean, you could use IPsec to push the speed via VPN. It's designed to be customizable in terms of encryption to enhance throughput.
Hardware:
DEC740

In 2020, WireGuard was already faster than IPsec on OPNsense. What's your current speed with and without WireGuard, and which OPNsense version are you on?

I recently set up OPNsense on a 12th-gen i5 with 16GB of RAM, running bare metal on a mini PC that serves as my router. The system has Intel 2.5GbE NICs and is connected to a 2Gbps fiber connection. Intel QAT is also enabled.

When running speed tests through a nearby data center via a WireGuard tunnel, I'm only getting around 300Mbps. I've applied some tunables I found in the forums, but they haven't improved performance. No IDS/IPS or non-default plugins are enabled, except for WireGuard.

The system works fine otherwise, but the WireGuard speeds are the only issue I've encountered. I haven't tested with OpenVPN yet, but I assume it will be slower.


Just to reiterate the situation with said Intel library and Wireguard:

Intel has made available a library that can be used to speed up some cryptographic operations, including those that are used within Wireguard.

However, the interface for WireGuard has only been implemented by "the other firewall" and has not been made publicly available - it is also only contained in the business edition of that software. Up to now, there have been no efforts to re-implement a similar interface in OPNsense.

Wireguard on the other hand once had a fast X64 implementation for its cryptography like in Linux, but that has been dialed back by a FreeBSD maintainer for "reasons", AFAIR.

Since then, the original Wireguard performance on FreeBSD has about doubled, but indeed is way below 1 GBit/s. AFAIK, the implementation also still is inherently single-threaded.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

Thanks for the summary; I definitely understand your post. I guess I will have to accept that this won't be solved or looked into any time soon, and find alternative solutions.

Hello guys. I have tried almost everything in this topic, including microcode updates, playing with the MTU and some recommended tunables... and I still get very bad speeds in one direction of the tunnel.
Here's iperf3 in straight and reverse directions:

https://i.imgur.com/aUkYz9d.png

One direction is OK, max speed. In reverse, there seems to be a peak at 130-150 Mbps, and then the speed drops to a steady 60-80 Mbps. CPU usage is low. The CPUs are fairly new Xeons with plenty of power.
When using iperf3 directly (no Wireguard), then the speeds are okay.

The configuration:
* 2 sites, 2 servers
* 250/250 optic uplink, latency 3-5ms
* Proxmox with OpnSense VM
* Virtio virtualized interfaces for both LAN and WAN (I haven't tried direct pass-through yet), Proxmox bridge networking (vmnet0, vmnet1). Underlying interfaces are 1 Gbps

I am using Netflow and Insight, haven't tried disabling those.

How to properly troubleshoot this? I will still try to find the culprit, the speeds are terribly slow.

Have you tried using more than one stream (iperf3 -P 8)?
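The suggestion above could be run like this (10.0.0.1 is a placeholder for the iperf3 server on the far side of the tunnel). Since the WireGuard implementation on FreeBSD is, as noted earlier in the thread, still essentially single-threaded, comparing one stream against several parallel streams in both directions helps show whether a single flow is the bottleneck:

```shell
# Single stream vs. parallel streams through the WireGuard tunnel.
iperf3 -c 10.0.0.1 -t 30            # baseline, one stream
iperf3 -c 10.0.0.1 -t 30 -P 8      # eight parallel streams
iperf3 -c 10.0.0.1 -t 30 -P 8 -R   # same, reverse direction
```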
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+