CPU Extensions & Technologies (Instruction Sets) : AES-NI, AVX-512, etc.

Started by carly, September 29, 2025, 04:25:03 PM

Please forgive my poor websearching abilities.  Is there any benefit to having AVX-512 in the context of OPNsense?  Did I get that mixed up with AES-NI?

To the first question: possibly, but if you are, for instance, bench racing Arrow Lake vs. Zen 5, you probably wouldn't experience a meaningful difference between roughly cost-equivalent parts, unless you find a piece of software that heavily utilizes AVX-512 specifically, in which case it can make a significant difference. I could see using vector instructions for bulk data processing, e.g. an IPS, but it's not something I use.

As to the second, AES-NI is an SSE-era vector extension, so it's in the ballpark. It's been around for a while, with 128b and 256b implementations (no 512b that I've seen) and one or two available execution units (per core), depending on the processor. I have no idea how the FreeBSD kernel driver implements it. One of these days I'll get around to testing it, just for the heck of it, as I don't use VPNs either.

Why do you ask?

Quote from: pfry on September 30, 2025, 05:29:42 AM
Why do you ask?
I was just wondering if there were any functions that OPNsense used that benefited from AVX-512?  I remember that AES-NI is used, but I could not remember the other.

Quote from: carly on September 30, 2025, 04:13:02 PM
I was just wondering if there were any functions that OPNsense used that benefited from AVX-512?  I remember that AES-NI is used, but I could not remember the other.

Ya got me. I wouldn't consider AVX512 as a significant input into a purchasing decision for hardware to run OPNsense. It would generally vanish beneath cost, power, I/O options, noise, size, etc. But your choices may differ.

I think the context is askew.

Enabling available extensions (or support for them) allows more features to be utilized, but it also exposes some risk.

AVX512 is a set of 19 unique extensions. A CPU may have it, but it will be useless to a binary unless the code was written to utilize the extensions and the compiler emits them. I don't see OPNsense as computationally heavy on large data sets. I think the best OPNsense (fw) would get is a performance bump in crypto operations, but again, I don't think OPNsense (fw) is on that level of crypto processing.
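To make the "a CPU may have it" part concrete, here is a minimal sketch of checking which extensions a box reports, assuming you have the CPU feature lines that FreeBSD prints at boot (for example from /var/run/dmesg.boot). The sample text, its hex masks, and the helper names `parse_cpu_features` and `summarize` are made up for illustration. A flag being present only says the hardware offers it, not that any software on the firewall uses it.

```python
import re

# Illustrative sample only: shaped like FreeBSD boot-time CPU feature lines.
# The hex masks and flag lists are placeholders, not taken from a real machine.
SAMPLE_BOOT_TEXT = """\
  Features2=0x0<SSE3,PCLMULQDQ,SSSE3,SSE4.1,SSE4.2,AESNI,XSAVE,AVX>
  Structured Extended Features=0x0<FSGSBASE,AVX2,AVX512F,AVX512DQ,AVX512BW,AVX512VL>
  Structured Extended Features2=0x0<VAES,VPCLMULQDQ>
"""


def parse_cpu_features(boot_text):
    """Collect every flag name listed inside <...> on a line mentioning 'Features'."""
    flags = set()
    for line in boot_text.splitlines():
        if "Features" not in line:
            continue
        match = re.search(r"<([^>]*)>", line)
        if match:
            for name in match.group(1).split(","):
                name = name.strip()
                if name:
                    flags.add(name)
    return flags


def summarize(flags):
    """Report the flags this thread is about: AES-NI, VAES, and AVX-512 subsets."""
    return {
        "aesni": "AESNI" in flags,
        "vaes": "VAES" in flags,
        "avx512_subsets": sorted(f for f in flags if f.startswith("AVX512")),
    }


if __name__ == "__main__":
    print(summarize(parse_cpu_features(SAMPLE_BOOT_TEXT)))
```

On a real system you would replace `SAMPLE_BOOT_TEXT` with the actual boot messages. The exact line labels may differ between FreeBSD versions, so treat the parsing as a starting point.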



Mini-pc N150 i226v x520, FREEDOM

Quote from: carly on September 29, 2025, 04:25:03 PM
Please forgive my poor websearching abilities.  Is there any benefit to having AVX-512 in the context of OPNsense?  Did I get that mixed up with AES-NI?
I just read through the page for the 8300 model of the otherSense hardware, and they write:

"This security gateway boasts a formidable 2.0 GHz, 8-core, 16-thread Intel® Xeon® D-1733NT processor with Intel AVX-512 for exceptional Firewall and VPN performance. The Netgate 8300 harnesses the power of Intel AVX-512 to accelerate VPN operations"

I have no practical experience, but it may warrant a deeper look into what AVX-512 can do to accelerate VPN, and for what type of VPN. Or whether it's just marketing blabla.
Deciso DEC740

Quote from: patient0 on October 01, 2025, 06:54:26 PM
I have no practical experience, but it may warrant a deeper look into what AVX-512 can do to accelerate VPN, and for what type of VPN. Or whether it's just marketing blabla.
Well, with 512-bit extensions and the ability to handle hashing and such in parallel (computational tasks), you can get increased performance from that CPU as it relates to crypto functions. Make the extensions 640-bit or 768-bit and you'd get a further bump in performance. I think it's a combination of extension width and the actual instructions in the CPU. I think if OPNsense wanted to tout how fast its VPN stuff is, it would need to exploit instruction sets specifically made for that use case, such as AVX-512. Obviously there are caveats to that marketing: you must first have a CPU that has such instructions, or the vendor's own hardware has to meet all those requirements, etc. I'm just not sure what the advantage is to any coder in terms of competition, because they all have the ability to exploit those CPU instruction sets. Maybe it's a "we have it, you don't, so ours is better"?

Here's Intel's spin on it
Quote
The Intel® Multi-Buffer Crypto for IPSec library is a family of highly optimized software implementations of the symmetric cryptographic algorithms. With the rich and easy-to-use APIs provided by the Intel Multi-Buffer Crypto for IPSec library, you can easily make full use of the latest cryptographic accelerations provided by Intel CPUs, including the new vAES and vPCLMUL instructions. These Intel AVX-512-accelerated instructions allow processing up to four 128-bit AES blocks in parallel, getting theoretically up to four times better performance than the 3rd Gen Intel Xeon Scalable processor. Moreover, the Intel Multi-Buffer Crypto for IPSec library hides all implementation details to accommodate different CPU flags (SSE, AVX, AVX2, AVX512) behind the APIs, which ensures highly optimized cryptographic operation results for all Intel® CPUs in the market and provides the user seamless transition of their code into 3rd Gen Intel Xeon Scalable processor CPU based systems.
For more detailed information about the Intel Multi-Buffer Crypto for IPSec library and Intel vAES and vPCLMUL instructions, refer to:
• Fast Multi-buffer IPsec Implementations on Intel® Architecture Processors White Paper
• Crypto Acceleration: Enabling a Path to the Future of Computing
• Intel® Multi-Buffer Crypto for IPSec
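The "hides all implementation details to accommodate different CPU flags" part of the quote describes runtime dispatch: pick the widest code path the CPU supports and fall back otherwise. A toy sketch of that idea follows. It is not the Intel library's API; the tier names are just the four flags listed in the quote, and `pick_implementation` is a hypothetical helper.

```python
# Widest tier first. The names mirror the flag list in the quote above
# (SSE, AVX, AVX2, AVX512); they are labels for this sketch, not library symbols.
PREFERENCE = ["AVX512", "AVX2", "AVX", "SSE"]


def pick_implementation(cpu_flags):
    """Return the widest tier present in cpu_flags, or 'generic' if none match."""
    for tier in PREFERENCE:
        if tier in cpu_flags:
            return tier
    return "generic"


if __name__ == "__main__":
    # A CPU without AVX-512 still gets an optimized path, just a narrower one.
    print(pick_implementation({"SSE", "AVX", "AVX2"}))
    print(pick_implementation({"SSE", "AVX", "AVX2", "AVX512"}))
```

The point for the thread is that the same binary runs on both kinds of CPU, and AVX-512 only changes which internal path gets selected.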
Mini-pc N150 i226v x520, FREEDOM

Quote from: BrandyWine on October 01, 2025, 07:10:30 PM
[...] These Intel AVX-512-accelerated instructions allow processing up to four 128-bit AES blocks in parallel [...]

Huh. Instruction set references are not consistent here - some indicate XMM/YMM (128/256b) encodings, while some also seem to allow a ZMM (512b) encoding. For most AVX-512-capable processors it wouldn't matter much, as they only have throughput for 2x256b or 1x512b ops per cycle per core. Chelsio had some AES benchmarks where they hit >100Gb/s on a single ~3.5GHz Ivy Bridge-E core (local, not network), which (as far as I know) is limited to 1 x 256b op per cycle. So throughput may not be a big issue for the average OPNsense user. At any rate, specifying "AVX-512" for the feature is a bit misleading.
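The "wouldn't matter much" point is just arithmetic on register width. A back-of-the-envelope sketch, assuming an idealized core that issues the stated number of vector AES ops every cycle (it ignores AES round counts, latency, and memory bandwidth, and the function names are made up for illustration):

```python
AES_BLOCK_BITS = 128  # AES always works on 128-bit blocks, whatever the key size


def blocks_per_register(register_bits):
    """How many AES blocks fit side by side in one vector register."""
    return register_bits // AES_BLOCK_BITS


def blocks_per_cycle(ops_per_cycle, register_bits):
    """Idealized AES blocks touched per cycle: ops issued times blocks per register."""
    return ops_per_cycle * blocks_per_register(register_bits)


if __name__ == "__main__":
    for label, ops, width in [("2 x 256b", 2, 256), ("1 x 512b", 1, 512)]:
        print(label, "->", blocks_per_cycle(ops, width), "AES blocks per cycle")
```

Both configurations come out the same under these assumptions, which is why the 512b encoding alone does not buy extra throughput on such a core.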