IPSEC traffic stalling after 20.7.1 upgrade

Started by Andreas_, September 01, 2020, 03:52:20 PM

November 03, 2020, 09:33:09 AM #45 Last Edit: November 03, 2020, 09:43:36 AM by proctor
I use route-based IPsec in my setup, but the issue may also depend on the update. We tested two devices, one with 20.1.9 (A) and one with 20.7.3 (B). Each device was initially configured with that version. For verification we used a third, non-OPNsense device (C).

For testing we used 2 tunnels:
A --> B
A --> C

The setup ran with a permanent ping from B|C --> A for a couple of days with no issue. After updating A to 20.7.4, both tunnels show the issue. Roughly every 2 hours (that is the lifetime for phase 2) the tunnels start to run fine for some time (see attached screenshot).

Some additional information on this issue.

Now I have a simple working configuration with version 20.7.3 (without update and without configuration import), where it seems to be possible to reproduce at least a similar kind of issue.

IPsec tunnel with route-based ESP (no gateway defined for the tunnel IP): ping runs over 48 h with 0.02% loss. After defining a gateway (far gateway with gateway monitoring), it took less than an hour for the first break. After 4 h I have 2% lost pings.
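
To put the two loss figures above into absolute numbers, here is a rough sketch. It assumes one ping per second, which the post does not state, and the function name is only illustrative.

```python
def lost_pings(duration_hours: float, loss_percent: float, interval_s: float = 1.0) -> int:
    """Estimate how many echo requests went unanswered.

    interval_s = 1.0 is an assumption (a common default ping interval);
    the post does not state the actual interval.
    """
    sent = duration_hours * 3600 / interval_s
    return round(sent * loss_percent / 100)


# Without a gateway on the tunnel IP: 48 h at 0.02% loss
baseline = lost_pings(48, 0.02)
# With a far gateway and gateway monitoring: 4 h at 2% loss
with_gateway = lost_pings(4, 2)
print(baseline, with_gateway)
```

Under that assumption, this works out to about 35 lost pings in two days without the gateway versus 288 in four hours with it.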

Any ideas are welcome.

Started seeing this yesterday too; it ran fine for weeks before that. Is it worth testing this on stock FreeBSD/HBSD at this point, possibly on multiple versions?

I had this twice last week on one of my OPNsense installs (fresh 20.7). IPv6-only IKEv1 between Sophos UTM and OPNsense. Phase 1 and 2 were still active, but no traffic passed. After restarting the IPsec service on OPNsense, traffic started flowing again.

Next time it stalls, I will try to get as much information as possible out of it.

Please don't mix your problems with the one fraenki has posted.

Fraenki's problem only happens when running on 20.7 and only after upgrade. If reverted to 20.1 it works again.

If you (both) encountered a timeout, stall, or whatever, please open a new thread and post as many details as you have, and don't hang on to this one (unless you can reproduce that it works perfectly with 20.1).

If anyone is still affected by IPsec instability, please test the following:

Change the following setting...
System: Settings: Miscellaneous -> Hardware acceleration
...from "AES-NI CPU-based" to "none" and save the change. Be sure to reboot the firewall afterwards.

Please report back.


Thanks
- Frank

Quote from: fraenki on December 13, 2020, 09:45:04 PM

Thanks a lot, Frank, that did the trick and the tunnel is finally stable again.

Quote from: juan.syad on December 20, 2020, 09:06:35 PM
Quote from: fraenki on December 13, 2020, 09:45:04 PM
If anyone is still affected by IPsec instability, please test the following:
Please report back.

Hello Frank, we have the exact same problem with a newly installed 21.1. Disabling hardware acceleration doesn't help us. We also tried running the VM with an e1000 card instead of a vmxnet3 VMware card; nothing helps.

The setup works properly with EAP-RADIUS and Windows 10 IKEv2 clients, but after transmitting 200-250 MB of data the tunnel stalls.

Any ideas?


Quote from: fraenki on December 13, 2020, 09:45:04 PM

Hi Frank,

What made you believe this was the fault of the AES-NI acceleration?

Quote from: Ricardo on February 07, 2021, 01:50:39 PM
What made you believe this was the fault of the AES-NI acceleration?

Extensive testing. It fixes the issue for me; it's 100% reproducible.
If it does not fix the issue for you, then you're likely affected by a different issue.


Regards
- Frank

The problem is fixed! The cause was having PFS enabled. The Windows 10 client does not pick up this setting unless it is explicitly set via PowerShell, which led to exactly this error pattern.

Quote from: tomiboy on February 07, 2021, 07:30:09 PM
The problem is fixed! The cause was having PFS enabled. The Windows 10 client does not pick up this setting unless it is explicitly set via PowerShell, which led to exactly this error pattern.

Can you share some details on how you figured this out and what the resolution was?

February 07, 2021, 10:35:46 PM #57 Last Edit: February 08, 2021, 01:53:30 PM by tomiboy
The Windows 10 IPsec client does not enable PFS by default.

I had activated PFS under "VPN: IPsec: Mobile Clients -> Phase 2 PFS Group". Windows 10 silently establishes a connection without errors. The connection dies after approx. 200-300 MB of data has been transferred.

To solve this, the connection must be created via PowerShell so that the correct PFS parameters, among others, can be set. This is not possible in the GUI.

PS C:\> Add-VpnConnection -Name "Contoso" -ServerAddress 176.16.1.2 -TunnelType "Ikev2"
PS C:\> Set-VpnConnectionIPsecConfiguration -ConnectionName "Contoso" -AuthenticationTransformConstants None -CipherTransformConstants AES256 -EncryptionMethod AES256 -IntegrityCheckMethod SHA384 -PfsGroup ECP384 -DHGroup ECP384 -PassThru -Force

Quote from: fraenki on December 13, 2020, 09:45:04 PM

Hi Frank,

disabling AES-NI worked for me, too.

IPsecv2 EAP-MS-Chapv2, Scope7 1510 Fiber, OPNsense 21.1

Just one little problem remains: with hardware acceleration, the VPN gives me about 500 Mbit/s (for about 3.5 seconds, then the packets stop flowing; measured locally via iperf3).

Without it, it's about 60 Mbit/s.

In a production environment, that's a serious problem.
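
For a sense of scale, the figures in this post can be turned into a data volume and a slowdown factor. This is only back-of-the-envelope arithmetic on the approximate numbers quoted above (decimal megabytes, 8 bits per byte); the function name is illustrative.

```python
def megabytes_transferred(rate_mbit_s: float, seconds: float) -> float:
    """Convert a sustained rate in Mbit/s over a duration into megabytes."""
    return rate_mbit_s * seconds / 8


# About 500 Mbit/s for about 3.5 s before the packets stop flowing
burst_mb = megabytes_transferred(500, 3.5)

# Throughput ratio with AES-NI (about 500 Mbit/s) vs. without (about 60 Mbit/s)
slowdown = 500 / 60

print(f"{burst_mb:.2f} MB before the stall, {slowdown:.1f}x slower without AES-NI")
```

That is roughly 219 MB before the stall and a factor of about 8 in throughput. The volume happens to be in the same range as the 200-300 MB reported earlier in this thread for the Windows 10 client; whether the two are related is not established here.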

March 14, 2021, 09:26:28 PM #59 Last Edit: March 14, 2021, 09:32:35 PM by Cerberus
I found something in the pfSense forums about issues with AES-NI and SHA-256 hardware acceleration. Their workarounds for now are: use QAT (which OPNsense doesn't have and which requires certain hardware), disable AES-NI, avoid the SHA-256 hash, or switch to AES-GCM, which doesn't need a separate hash. Any of the last three solves the issue for me.
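
The workarounds in this post can be read as a simple rule: the stall is reported for the combination of AES-NI acceleration and a SHA-256 hash. A minimal sketch of that rule follows; the class and field names are hypothetical and are not OPNsense configuration keys, and the rule only encodes what is reported here, not a confirmed root cause.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TunnelSettings:
    # All names and values here are illustrative, not real config keys.
    hw_accel: str             # "aesni" or "none"
    cipher: str               # e.g. "aes256-cbc" or "aes256-gcm"
    integrity: Optional[str]  # e.g. "sha256"; None for AES-GCM (no separate hash)


def likely_affected(s: TunnelSettings) -> bool:
    """True if the settings match the reported AES-NI + SHA-256 combination."""
    return s.hw_accel == "aesni" and s.integrity == "sha256"


reported = TunnelSettings("aesni", "aes256-cbc", "sha256")
workarounds = [
    TunnelSettings("none", "aes256-cbc", "sha256"),   # disable AES-NI
    TunnelSettings("aesni", "aes256-cbc", "sha512"),  # avoid the SHA-256 hash
    TunnelSettings("aesni", "aes256-gcm", None),      # AES-GCM, no separate hash
]
print(likely_affected(reported), [likely_affected(w) for w in workarounds])
```

Each of the three workaround entries changes exactly one side of the combination, which is why any one of them is enough under this reading.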