IPsec tunnel unstable after change of hardware - solved, but why?

Started by Patrick M. Hausen, April 28, 2021, 07:31:41 PM

Hi all,

I swapped my PCengines APU4D4 for a Supermicro A2SDi-4C-HLN4F left over from a TrueNAS project. Great little board. It does need active cooling, but now I have two identical SATA disks in a ZFS mirror, yeah!

I saved the config, then searched the XML config file for igb0 and igb1 and replaced them with ix0 and ix1. The new firewall came up, I switched the cables, and at first there seemed to be absolutely no problem.

But: the single IPsec tunnel I run to my company's office stopped forwarding packets every hour or so, probably related to the phase 2 lifetime of 3600 seconds. The tunnel was shown as active, but I could not reach the other network.
Eventually the situation would resolve itself by "magic", only to fail again shortly thereafter. Alternatively, restarting IPsec fixed things for a while.
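If the drops line up with the phase 2 lifetime, they most likely happen at rekey time, which comes somewhat before the full hour. A minimal sketch of that rekey window, assuming the defaults documented for strongSwan's ipsec.conf (margintime = 9m, rekeyfuzz = 100%); whether OPNsense passes different values is an assumption not checked here, and the function names are hypothetical:

```python
import random


def rekey_window(lifetime_s, margintime_s=540, rekeyfuzz_pct=100):
    """Return (earliest, latest) rekey time in seconds after the SA was set up.

    Assumption: strongSwan ipsec.conf semantics, where rekeying starts
    margintime before expiry and margintime is randomly increased by up
    to rekeyfuzz percent.
    """
    max_margin = margintime_s + margintime_s * rekeyfuzz_pct // 100
    if max_margin > lifetime_s:
        raise ValueError("margin (after fuzz) must not exceed the lifetime")
    return lifetime_s - max_margin, lifetime_s - margintime_s


def sample_rekey_time(lifetime_s, margintime_s=540, rekeyfuzz_pct=100, rng=random):
    """Pick one randomized rekey time inside the window, in seconds."""
    earliest, latest = rekey_window(lifetime_s, margintime_s, rekeyfuzz_pct)
    return rng.randint(earliest, latest)


if __name__ == "__main__":
    earliest, latest = rekey_window(3600)  # phase 2 lifetime from the post
    print(f"rekey between minute {earliest // 60} and minute {latest // 60}")
    # prints "rekey between minute 42 and minute 51"
```

Under those assumed defaults a one-hour child SA is renegotiated somewhere between minute 42 and minute 51, which would fit "every hour or so".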

Now, the hardware change did come with a change of CPU, from an AMD-based embedded system to an Intel Atom. So, in a spirit of "let's see what happens ...", I changed the crypto hardware acceleration setting from "AES/NI" to "None".
The tunnel has been stable ever since.

But ... WHY?

Thanks,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Technically, AES-NI is only supported on selected Intel and AMD CPUs. Maybe your processor doesn't support the instruction set.
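One way to check this on FreeBSD/OPNsense is to look for AESNI in the CPU feature flags printed at boot. A minimal sketch, assuming the usual FreeBSD dmesg format "Features2=0x...<FLAG,FLAG,...>"; the function name and the sample line are illustrative, not real output from this machine:

```python
import re

# Matches FreeBSD-style CPU feature lines of the form
#   Features2=0x<hex mask><SSE3,PCLMULQDQ,...,AESNI,...>
# (the hex mask and the flag list vary by CPU)
FEATURES2_RE = re.compile(r"Features2=0x[0-9a-fA-F]+<([^>]*)>")


def cpu_has_aesni(dmesg_text):
    """Return True if any Features2 line in the boot messages lists AESNI."""
    for match in FEATURES2_RE.finditer(dmesg_text):
        flags = match.group(1).split(",")
        if "AESNI" in flags:
            return True
    return False


if __name__ == "__main__":
    # Illustrative sample line; on the firewall, feed in /var/run/dmesg.boot instead.
    sample = "  Features2=0x1<SSE3,PCLMULQDQ,SSSE3,AESNI,XSAVE>"
    print(cpu_has_aesni(sample))  # prints True
```

A False result only means no matching flag line was found in the text passed in, so it is worth confirming that the boot messages were actually captured.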

Can you please run "kldstat" and check whether the aesni module is loaded?

Something like this:

15    1 0xffffffff82bed000     8d50 aesni.ko
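The same check can be scripted by parsing the kldstat output. A minimal sketch, assuming the default column layout (Id, Refs, Address, Size, Name) with the file name in the last column; the helper names are hypothetical. kldstat lists loaded files, so if aesni were compiled into the kernel it would not show up as a separate aesni.ko, and a negative result would not be conclusive:

```python
def loaded_files(kldstat_output):
    """Return the set of file names from kldstat output (last column)."""
    names = set()
    for line in kldstat_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric Id; this skips the header and blank lines.
        if len(fields) >= 5 and fields[0].isdigit():
            names.add(fields[-1])
    return names


def aesni_loaded(kldstat_output):
    """True if aesni.ko appears among the loaded kernel files."""
    return "aesni.ko" in loaded_files(kldstat_output)


if __name__ == "__main__":
    sample = (
        "Id Refs Address                Size Name\n"
        "15    1 0xffffffff82bed000     8d50 aesni.ko\n"
    )
    print(aesni_loaded(sample))  # prints True
    # On the firewall itself, the text could come from:
    #   subprocess.run(["kldstat"], capture_output=True, text=True).stdout
```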

IIRC, the AES-NI crypto on these Atom platforms has some quirks that can lead to exactly this situation. Nobody really knows why, I guess. It works fine with Xeon CPUs, for example.

To fix it, you'll probably need someone with a deep understanding of the kernel's inner workings, hardware to test with, as well as the time and an actual desire to fix this. Probably the best you can hope for is that some vendor selling Atom-based routers/firewalls fixes it for the sake of their own products. Then again, the enterprise customers who generate the money to make that worthwhile will probably not be affected, because they use higher-end gear. I don't know, but I think you'll just have to live with non-accelerated AES on an Atom board.

As far as I can tell from the documentation, the IPsec implementation will use whatever the CPU offers regardless of this setting, and the same goes for OpenSSL. So my question is rather: "What precisely does this setting do, and why does it matter?"

I don't need a fix for a problem, because I consider it fixed, with everything now working as it should. At 50 Mbit/s down and 10 Mbit/s up, performance is not really an issue.

Seems to be an issue with AES-NI and/or SHA256.

https://forum.opnsense.org/index.php?topic=18918.msg104535#msg104535

Interestingly, I am not having these issues on my Supermicro A2SDi-4C-HLN4F. I'm currently running multiple IPsec tunnels with AES-NI and SHA256.

Have you ever tried AES-GCM?