AES-NI hardware acceleration of WireGuard through KVM?

Started by mattlach, March 12, 2024, 04:51:52 AM

Hi Everyone,

Is there any way I can confirm whether AES-NI is actually working in OPNsense?

I recently upgraded from consumer hardware (Core i3-7100) running pfSense bare metal to more Enterprise-y (technical term) hardware (Supermicro X12STL-F with Xeon E-2314) running Proxmox, with OPNsense as a guest in KVM with direct I/O forwarded NICs.

Previously I was running OpenVPN in pfSense and achieving approximately 600-700Mbps while only seeing relatively limited CPU load (~12% at most of the i3-7100).

With WireGuard on the new system (which should be much faster) I am seeing some 70% CPU load at its highest, which just doesn't seem right.

My suspicion is that maybe AES-NI isn't getting passed through right to the OPNSense guest.

My configuration is as follows:
Hardware:
- Xeon E-2314 (4C/4T) Rocket Lake, 2.8 GHz base, 4.5 GHz max turbo.
- 32GB DDR4-3200 ECC UDIMMs (4x8GB)
- Onboard dual Intel i210 Gigabit adapters.
- Dual Port x520 SFP+ 10Gbit discrete NIC

Configuration of the OPNsense guest is as follows:
- 3 cores assigned
- 8GB of RAM assigned
- 1x i210 Gigabit adapter direct I/O forwarded (used as WAN)
- 1x x520 adapter port direct I/O forwarded (used as LAN)

I left the CPU type as "host" (which exposes the host's actual CPU model and features to the guest). Additionally, I passed the flag flags=+aes to KVM to make sure AES-NI acceleration is made available.

By all rights, this hardware should be "TOTAL OVERKILL" for a router, yet I am still seeing some seriously surprising CPU load peaks that I can't quite make sense of.

So again,

I suspect that the host maybe isn't making AES-NI available to the guest (or the guest may otherwise not be using it properly). Can I test / check this somehow? I can't seem to find anything helpful in the GUI. The "Hardware Acceleration" section in System Settings has a few options, but none of them appear to be AES-NI; they appear to be discrete hardware acceleration solutions. Some googling suggested I should leave this as "none", which is what I did. Was this a mistake?

I am pretty comfortable with the Linux command line, but BSD is, well, a little bit different.  Any suggestions where I can poke around in the console to verify the system is actually seeing and using AES-NI?
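
One way to check this from the console is to look at the CPU feature lines the FreeBSD kernel prints at boot. Below is a minimal Python sketch that pulls the flag names out of such lines and looks for AESNI. The sample text is a made-up, abbreviated illustration of the format (not output from this machine), and the function name is hypothetical.

```python
import re

# Made-up, abbreviated sample of FreeBSD boot-time CPU feature lines.
# The hex values are placeholders and real lines list many more flags.
# On a live system the text would come from the boot messages instead
# (assumption: /var/run/dmesg.boot).
SAMPLE_BOOT_TEXT = """\
CPU: Example CPU (2800.00-MHz K8-class CPU)
  Features=0x1<FPU,SSE,SSE2>
  Features2=0x2<SSE3,PCLMULQDQ,AESNI,AVX>
  Structured Extended Features=0x3<AVX2,AVX512F>
"""


def cpu_flags(boot_text):
    """Return the set of flag names listed inside <...> on 'Features' lines."""
    flags = set()
    for line in boot_text.splitlines():
        if "Features" not in line:
            continue
        # Each feature word is printed as 0x<hex><FLAG1,FLAG2,...>
        for group in re.findall(r"<([^>]*)>", line):
            flags.update(name.strip().upper() for name in group.split(",") if name.strip())
    return flags


flags = cpu_flags(SAMPLE_BOOT_TEXT)
print("AESNI visible to guest:", "AESNI" in flags)
print("AVX512F visible to guest:", "AVX512F" in flags)
```

On the guest itself, the same check can be done by eye with something like grep -i aesni /var/run/dmesg.boot. Note that this only shows whether the flag is exposed to the guest, not whether a given VPN actually uses it.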

It could also be that I am barking up the wrong tree. Any pitfalls of using OPNsense under KVM I should be aware of? Any particular configuration recommendations?

I'd appreciate any suggestions anyone might have!

OPNsense running as a VM in KVM under Proxmox:
- Rocket Lake Xeon E-2314 in a Supermicro X12STL-F.
- IOMMU forwarded i210 Ethernet for WAN and x520 for LAN.
- Pi-hole running as separate LXC container on same server.
- Lots of VLANs and tricky firewall rules.

March 12, 2024, 06:03:46 AM #1 Last Edit: March 12, 2024, 06:10:17 AM by Monviech
I use OPNsense under KVM for dev purposes and the most I squeezed out of WireGuard was 850 Mbit/s with iperf3.

Though WireGuard doesn't use AES, it uses ChaCha20. Here's a good recent thread with benchmarks and a search for answers:

https://forum.opnsense.org/index.php?topic=38909.0

Also, it might be beneficial to enable CPU passthrough for the VM; if you use an emulated QEMU CPU model, CPU features will be masked. I found it easiest to configure using Cockpit (apt install cockpit cockpit-machines): https://cockpit-project.org/
Hardware:
DEC740

Thank you, I will have to read up on ChaCha20. Appreciate the link.

It still seems odd to me though that just a speed test through WireGuard at ~890 Mbit/s should pin 3 modern CPU cores at ~93% average.

That just can't be right...

Ah, well a link from within that link helps explain it.

Apparently OPNsense forces the Spectre and Meltdown mitigations on, even if the architecture is not affected by either. I'm not sure that explains ALL of the serious spikes in CPU use, but just fixing those two ought to make a huge difference.

The proper way of doing this would be to enable the mitigations only if running on affected hardware, but I digress...

March 13, 2024, 03:03:07 AM #4 Last Edit: March 13, 2024, 03:05:31 AM by mattlach
Quote from: mattlach on March 13, 2024, 02:39:33 AM
Ah, well a link from within that link helps explain it.

Apparently OPNsense forces the Spectre and Meltdown mitigations on, even if the architecture is not affected by either. I'm not sure that explains ALL of the serious spikes in CPU use, but just fixing those two ought to make a huge difference.

The proper way of doing this would be to enable the mitigations only if running on affected hardware, but I digress...

Well, that wasn't it. I set "vm.pmap.pti" to "0" and "hw.ibrs_disable" to "1" and rebooted.
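
To double-check that both tunables actually took effect after the reboot, the output of sysctl can be compared against the intended values. A minimal Python sketch, assuming the usual "name: value" output format; the sample text is illustrative rather than captured from a real system, and the helper names are hypothetical.

```python
# Illustrative sample of `sysctl vm.pmap.pti hw.ibrs_disable` output,
# assuming the usual "name: value" format. Not captured from a real system.
SAMPLE_SYSCTL_OUTPUT = """\
vm.pmap.pti: 0
hw.ibrs_disable: 1
"""

# Intended values as described above: PTI off, IBRS disabled.
WANTED = {"vm.pmap.pti": "0", "hw.ibrs_disable": "1"}


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def mismatches(values, wanted=WANTED):
    """Return {name: (actual, wanted)} for every tunable that differs or is missing."""
    return {n: (values.get(n), w) for n, w in wanted.items() if values.get(n) != w}


problems = mismatches(parse_sysctl(SAMPLE_SYSCTL_OUTPUT))
print(problems or "both tunables are set as intended")
```

An empty result means both values match; anything else lists the tunable together with its actual and intended value.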

A speed test using WireGuard still got the system to peak at 89% CPU, leveling off between 50 and 75% mid-test.

This is about 30-50 times higher than expected.


Quote from: Monviech on March 12, 2024, 06:03:46 AM
Also, it might be beneficial to enable CPU passthrough for the VM; if you use an emulated QEMU CPU model, CPU features will be masked. I found it easiest to configure using Cockpit (apt install cockpit cockpit-machines): https://cockpit-project.org/

Hmm. In the configuration I have the CPU set to "host", and my OPNsense guest reports the proper CPU model (Intel(R) Xeon(R) E-2314 CPU @ 2.80GHz (3 cores, 3 threads)) in the Dashboard. Is this what you mean?

Could it be that upstream FreeBSD just doesn't recognize Rocket Lake yet? I'm more of a Linux guy, so I don't have enough BSD experience to know how quickly they jump on current hardware.

Back in the day it used to take seemingly forever for the Linux kernel to properly support current hardware, but these days it is pretty quick. I know FreeBSD is more conservative, so I wonder if that could be a contributing factor. But Rocket Lake has been on the market now for - what - 2-3 years depending on the CPU (the first Rocket Lake chips hit in early Q1 2021, with these Xeons following in Q3 2021).

I wonder what would happen if I forced KVM to report the CPU as an older Xeon model?

March 13, 2024, 05:02:43 AM #5 Last Edit: March 13, 2024, 06:03:06 AM by mattlach
Quote from: Monviech on March 12, 2024, 06:03:46 AM
Though WireGuard doesn't use AES, it uses ChaCha20.

Which begs the question...

Why on earth would they use this cipher if every piece of hardware known to man made in the last 15 years accelerates AES?

Seems like a curious choice.
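
Part of the answer is probably that ChaCha20 needs no special instructions at all: its core operation is built only from 32-bit additions, XORs and rotations, so it runs reasonably fast on any CPU, including ones without AES hardware. A minimal Python sketch of the ChaCha quarter round as specified in RFC 8439, fed with the test vector from section 2.1.1 of that RFC (for illustration only; this is not how any kernel implements it):

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a, b, c, d):
    """One ChaCha quarter round: only add, xor and rotate."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Input words from the RFC 8439 section 2.1.1 test vector.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(word) for word in result])
# -> ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

The full cipher applies this quarter round repeatedly to a 16-word state. Because the operations are plain integer ones, they also map well onto SIMD units, which is where AVX-style optimizations come in.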

If I can't get this figured out, I may just move back to OpenVPN.

With OpenVPN I was able to push 700 Mbit/s, sometimes even higher, and it only used ~11% of one core of an 8-year-old Kaby Lake i3, while doing the equivalent in WireGuard now takes ~90% of three cores of Rocket Lake, a CPU four generations newer which should be beyond overkill for the task... :p
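
To put rough numbers on that comparison, the load can be normalized to cores consumed per Gbit/s of throughput. A back-of-the-envelope Python sketch using the approximate figures from this thread (700 Mbit/s at ~11% of one core for OpenVPN; ~890 Mbit/s at ~90% of three cores for WireGuard). These are eyeballed dashboard readings on different hardware, one of them inside a VM, so the result is only an order-of-magnitude indication. The function name is hypothetical.

```python
def cores_per_gbit(cpu_fraction, cores, mbit_per_s):
    """Cores' worth of CPU time consumed per Gbit/s of VPN throughput."""
    return (cpu_fraction * cores) / (mbit_per_s / 1000.0)


# Approximate figures quoted in this thread.
openvpn = cores_per_gbit(0.11, 1, 700)     # old i3-7100, bare metal
wireguard = cores_per_gbit(0.90, 3, 890)   # Xeon E-2314 guest, 3 vCPUs

print(f"OpenVPN:   {openvpn:.2f} cores per Gbit/s")
print(f"WireGuard: {wireguard:.2f} cores per Gbit/s")
print(f"Ratio:     {wireguard / openvpn:.0f}x")
```

With these inputs the WireGuard setup comes out at roughly 19 times the per-Gbit CPU cost of the old OpenVPN setup.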

And I was told WireGuard was supposed to be faster, more lightweight and efficient than OpenVPN... :p

Linux kernel 5.6 apparently implemented some form of AVX optimization to help accelerate ChaCha20-Poly1305, including use of AVX-512 if available. AVX-512 is available on my Rocket Lake chip. I wonder if FreeBSD can do the same...