OPNsense Forum » English Forums » Virtual private networks » IPSec Connections and HA (High Availability) Problems **POSSIBLY SOLVED**
Topic: IPSec Connections and HA (High Availability) Problems **POSSIBLY SOLVED** (Read 1376 times)
anomaly0617 (Jr. Member, Posts: 50, Karma: 0)
IPSec Connections and HA (High Availability) Problems **POSSIBLY SOLVED**
on: March 07, 2024, 08:27:02 pm
Edit: It occurred to us that we neglected to mention the version of OPNsense we're working with here. Every device is running OPNsense 24.1.2 or newer. /edit
Hi all,
Well, we thought we had this problem resolved (see my previous post if you're confused) but it turns out, maybe not.
We're testing the new Strongswan IPSec Connections before we roll them out to all of our partners.
We have a handful of test sites that are single ISP, single OPNsense firewall locations. The IPSec VPN tunnels between these sites seem to work beautifully. Generally no issues.
We colocate in a datacenter site that has multiple ISPs. They are backed by 60+ carriers, and they use some form of OSPF/BGP/RIP advertisements to switch us all dynamically across routes as necessary. There are some large Fortune 500 companies in the same datacenter. They do not go down. Ever. OK, maybe ALMOST never. But it hasn't happened in 4+ years of having servers there. And we've never experienced any issues with IPSec or OpenVPN tunnels thus far. So I doubt their routes have anything to do with this problem.
At this datacenter site, we have two OPNsense firewalls running on identical Dell PowerEdge R240 servers. They are in a high availability (HA) cluster.
We're having trouble keeping the satellite locations' IPSec VPN tunnels to the datacenter's High Availability cluster alive. Every 4 hours Phase 1 seems to rekey/renegotiate successfully, but Phase 2 often seems "broken": a Phase 2 connection appears to be established, but traffic only flows one way. The satellite location records "bytes out", but we see no matching "bytes in" at the datacenter (HA) site.
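One way to spot this one-way Phase 2 is to compare the byte counters for the same tunnel from both ends. This is a minimal sketch that assumes you have copied the "bytes in"/"bytes out" values for one Phase 2 (from the IPsec status page, or `swanctl --list-sas`) into a hypothetical `ChildSaCounters` record for each side; the names are illustrative, not an OPNsense API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChildSaCounters:
    """Hypothetical snapshot of one Phase 2 SA as seen from one firewall."""
    tunnel: str
    bytes_in: int
    bytes_out: int


def one_way_problems(satellite: ChildSaCounters, datacenter: ChildSaCounters) -> List[str]:
    """Describe traffic that leaves one end but never shows up at the other."""
    problems = []
    if satellite.bytes_out > 0 and datacenter.bytes_in == 0:
        problems.append(
            f"{satellite.tunnel}: satellite sent {satellite.bytes_out} bytes, datacenter received none"
        )
    if datacenter.bytes_out > 0 and satellite.bytes_in == 0:
        problems.append(
            f"{datacenter.tunnel}: datacenter sent {datacenter.bytes_out} bytes, satellite received none"
        )
    return problems


# Example matching the symptom above: the satellite sends, the datacenter sees nothing.
sat = ChildSaCounters("site-a", bytes_in=0, bytes_out=5120)
dc = ChildSaCounters("site-a", bytes_in=0, bytes_out=0)
print(one_way_problems(sat, dc))
```

Counters that are zero on both ends are not flagged, since an idle tunnel is not necessarily a broken one.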
So far, we've tried a bunch of the suggestions we've seen, such as:
On the Phase 1 side:
We've disabled MOBIKE in the Phase 1 for all sites that connect to this HA cluster
We've "dumbed it down" so that the only IP listed for each tunnel at that site on the local side is the primary CARP Virtual IP, let's call it ".146".
We've "dumbed down" the satellite sites so that the only IP they connect to is ".146"
We've switched to IP addresses for all local and remote endpoints, so no name resolution is required.
We've played around with the DPD value on the tunnels going to that site. In general DPD is set to 1, but Franco and I had a discussion ages ago about DPD and its negative effects on Voice over IP traffic, so we've been wary of DPD ever since. We've tried setting it to 0 and to 86400 (a day). Neither seems to make a difference.
We set a continuous ping on each firewall, pinging the other one to see if it would keep the tunnel alive. It didn't.
We made sure under Firewall -> Rules -> WAN we have three rules: one for ESP, one for ISAKMP, and one for NAT-T. Originally we had them limited by source and destination, but the most recent configuration has them set to Any/Any for Source/Destination. This is what got tunnels up last time (more on that in a minute). We'd like to lock this down; leaving it this way makes us twitchy.
We're using EC521 certificate keypairs, not pre-shared keys
On the Phase 2 side:
We've tried it with and without Policies checked. Thing is, we were never able to get a tunnel online with Policies unchecked, so we've been leaving it checked.
Start action is Trap + Start
Close action is Start
DPD action is Start
We've vacillated between ESP Proposals. Ideally we want to use AES256-SHA512-ECP521, but we've had to switch to AES256-SHA512-MODP2048 on multiple occasions. Phase 1 is consistently AES256-SHA512-ECP521.
Local Subnets only include /24 subnets that are at the datacenter location. No exceptions.
Remote Subnets only include /24 subnets that are at the given satellite location. No exceptions.
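Since the WAN rules are currently Any/Any, here is a minimal sketch of what a locked-down version would check: only ESP, ISAKMP (UDP 500) and NAT-T (UDP 4500), only from known satellite peers, and only to the CARP VIP. The addresses are RFC 5737 documentation placeholders standing in for ".146" and a satellite peer; this models the rule logic and is not OPNsense configuration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional

ESP = 50      # IP protocol number for ESP
UDP = 17      # IP protocol number for UDP
ISAKMP = 500  # IKE
NAT_T = 4500  # IKE/ESP encapsulated in UDP (NAT traversal)


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: int
    dport: Optional[int] = None


def wan_allows(pkt: Packet, peers: Iterable[str], carp_vip: str) -> bool:
    """Return True if one of the three IPsec pass rules would match."""
    src = ipaddress.ip_address(pkt.src)
    if ipaddress.ip_address(pkt.dst) != ipaddress.ip_address(carp_vip):
        return False  # only traffic aimed at the CARP VIP
    if not any(src in ipaddress.ip_network(p) for p in peers):
        return False  # only known satellite peers
    if pkt.proto == ESP:
        return True
    return pkt.proto == UDP and pkt.dport in (ISAKMP, NAT_T)


# Placeholders: 198.51.100.146 = CARP VIP (".146"), 203.0.113.92 = a satellite.
PEERS = ["203.0.113.92/32"]
VIP = "198.51.100.146"
print(wan_allows(Packet("203.0.113.92", VIP, UDP, NAT_T), PEERS, VIP))
```

Note that with the destination locked to the VIP, packets addressed to a node's real WAN address (".157"/".158") would be dropped, which is exactly the traffic that shows up in the logs below.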
But then:
We dug into the logs. Remember how everything should be coming in and going out from ".146"? What we noticed was that traffic exiting the firewall is going out with its non-CARP (real WAN) IP address, ".158" in this case.
11[NET] <4496d3d2-82a6-4b82-bf9c-3d0b78a3096a|375> sending packet: from xxx.xxx.xxx.158[4500] to xxx.xxx.xxx.92[4500] (96 bytes)
And it would appear that the satellite firewalls are responding to that traffic in kind, because we have log entries that look like this:
12[NET] <4496d3d2-82a6-4b82-bf9c-3d0b78a3096a|375> received packet: from xxx.xxx.xxx.92[4500] to xxx.xxx.xxx.158[4500] (96 bytes)
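To see how often this happens, a small script can pull the firewall's own endpoint out of each charon "sending packet"/"received packet" line and flag any that isn't the CARP VIP. This is a sketch that assumes the log format shown above; the sample lines use RFC 5737 placeholder addresses instead of the redacted ones, and the shortened connection names are made up.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Matches e.g. "sending packet: from 198.51.100.158[4500] to 203.0.113.92[4500]"
PACKET_RE = re.compile(
    r"(?P<dir>sending|received) packet: "
    r"from (?P<src>[^\[\s]+)\[(?P<sport>\d+)\] "
    r"to (?P<dst>[^\[\s]+)\[(?P<dport>\d+)\]"
)

Entry = Tuple[str, str, int]  # (direction, local address, local port)


def local_endpoints(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield the firewall's own address/port for every packet log line."""
    for line in lines:
        m = PACKET_RE.search(line)
        if m is None:
            continue
        if m["dir"] == "sending":  # we are the source
            yield m["dir"], m["src"], int(m["sport"])
        else:  # we are the destination
            yield m["dir"], m["dst"], int(m["dport"])


def not_from_vip(lines: Iterable[str], carp_vip: str) -> List[Entry]:
    """Return every packet whose local endpoint is not the CARP VIP."""
    return [e for e in local_endpoints(lines) if e[1] != carp_vip]


sample = [
    "11[NET] <conn-a|375> sending packet: from 198.51.100.158[4500] to 203.0.113.92[4500] (96 bytes)",
    "12[NET] <conn-a|375> received packet: from 203.0.113.92[4500] to 198.51.100.158[4500] (96 bytes)",
    "11[NET] <conn-b|376> sending packet: from 198.51.100.146[4500] to 203.0.113.92[4500] (96 bytes)",
]
for entry in not_from_vip(sample, "198.51.100.146"):
    print(entry)
```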
So, this is just NAT, right? We should be able to redirect that NAT traffic using an outbound NAT rule, I would assume. Just like we tell the firewall to send traffic from a server inside the datacenter out using a different IP, say, ".152", we should be able to tell the firewall to take any traffic from strongswan and route it out the door using ".146".
But is the source for that rule the LAN IP address or the WAN IP address? I could argue for either.
We thought we'd try it just by specifying IPSec as the interface, but that did not work.
And, that might be a red herring. We may be barking up the wrong tree on the fact that it's entering/exiting from the real interface instead of appearing to enter/exit from the Virtual IP/CARP interface.
Any advice on HA IPSec configurations would be welcome. We've got a lot of HA setups across the world, and more are coming as we go multi-ISP and multi-firewall for sites.
We're happy to send screen captures to someone privately, but I don't want to post them publicly. There are so many things we'd have to redact that I suspect it would be redundant to do so.
Thanks, in advance!
Last Edit: March 09, 2024, 08:19:00 pm by anomaly0617
anomaly0617 (Jr. Member, Posts: 50, Karma: 0)
Re: IPSec Connections and HA (High Availability) Problems
Reply #1 on: March 08, 2024, 03:40:32 am
Update: We suspect we've found the cause of this and the resolution.
The fix is likely a Manual Outbound NAT rule that says: Interface=WAN, Source=IPSec Net, NAT Address=(the CARP WAN IP that you want). Be sure to position the rule sensibly: a lot of traffic is going to pass through it, and since rules are processed sequentially from top to bottom, you don't want that traffic checked against 20 rules before it finds a match every time.
The cause seems to be that High Availability is cycling between the two OPNsense firewalls. When this happens and the rule above is not in place, the source IP address of our firewall changes, which throws off the firewall on the other end in a major way. Once the rule above was in place AND we restarted the Strongswan service to reset all the tunnels, the problem (so far) has disappeared. Only time will tell if it stays gone.
mimugmail (Hero Member, Posts: 6766, Karma: 494)
Re: IPSec Connections and HA (High Availability) Problems
Reply #2 on: March 08, 2024, 08:07:45 am
Yes, if you see port 4500, there's NAT involved. Often the reason is that you set "any" as the source instead of your internal networks, so your locally initiated packets get NATed.
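As a rough rule of thumb for reading these logs: IKE starts on UDP 500, and IKE peers move to UDP 4500 (NAT-T, which also carries ESP wrapped in UDP) once they detect NAT between them; plain ESP is IP protocol 50 and has no ports. strongSwan can also switch to 4500 without any NAT when MOBIKE is enabled, so treat this as a hint rather than proof. A minimal sketch of that classification:

```python
from typing import Optional

ESP_PROTO = 50  # Encapsulating Security Payload
UDP_PROTO = 17


def classify(proto: int, sport: Optional[int] = None, dport: Optional[int] = None) -> str:
    """Rough guess at what kind of IPsec traffic a packet is."""
    if proto == ESP_PROTO:
        return "ESP (no NAT traversal)"
    if proto == UDP_PROTO:
        ports = {sport, dport}
        if 4500 in ports:
            return "NAT-T (IKE or ESP-in-UDP on 4500)"
        if 500 in ports:
            return "IKE on 500"
    return "not IPsec"


# The log lines in the first post use port 4500 on both ends.
print(classify(UDP_PROTO, 4500, 4500))
```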
anomaly0617 (Jr. Member, Posts: 50, Karma: 0)
Re: IPSec Connections and HA (High Availability) Problems
Reply #3 on: March 08, 2024, 02:55:39 pm
Update 2: Still not working properly.
As of this morning we have this in the logs from my satellite office:
2024-03-08T08:40:13-05:00 Informational charon 09[NET] <353> sending packet: from xxx.xxx.xxx.185[500] to xxx.xxx.xxx.157[47289] (36 bytes)
2024-03-08T08:40:13-05:00 Informational charon 09[IKE] <353> no IKE config found for xxx.xxx.xxx.xxx...xxx.xxx.xxx.157, sending NO_PROPOSAL_CHOSEN
.157 and .158 are the real WAN addresses of the individual firewalls. They should never appear here; the traffic should always come from the CARP address, .146.
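The "no IKE config found ... sending NO_PROPOSAL_CHOSEN" line fits that picture: the satellite's tunnel is defined against the CARP VIP, so an IKE request arriving from a node's real address matches nothing. The sketch below is a deliberately simplified model of that address lookup (strongSwan's real matching also ranks partial and wildcard matches and is not this code); `IkeConfig`, the `%any` handling and the placeholder addresses are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IkeConfig:
    """Hypothetical stand-in for one Phase 1 definition."""
    name: str
    local: str   # local address, or "%any"
    remote: str  # expected peer address, or "%any"


def _addr_matches(configured: str, actual: str) -> bool:
    return configured == "%any" or configured == actual


def find_ike_config(configs: List[IkeConfig], local: str, remote: str) -> Optional[IkeConfig]:
    """Return the first config whose local and remote addresses both match."""
    for cfg in configs:
        if _addr_matches(cfg.local, local) and _addr_matches(cfg.remote, remote):
            return cfg
    return None


def on_ike_sa_init(configs: List[IkeConfig], local: str, remote: str) -> str:
    cfg = find_ike_config(configs, local, remote)
    if cfg is None:
        return f"no IKE config found for {local}...{remote}, sending NO_PROPOSAL_CHOSEN"
    return f"matched {cfg.name}"


# Satellite (203.0.113.185) only knows the datacenter by its CARP VIP (198.51.100.146).
satellite = [IkeConfig("datacenter", "%any", "198.51.100.146")]
print(on_ike_sa_init(satellite, "203.0.113.185", "198.51.100.157"))
print(on_ike_sa_init(satellite, "203.0.113.185", "198.51.100.146"))
```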
Any ideas? I'm fresh out of them.
anomaly0617 (Jr. Member, Posts: 50, Karma: 0)
Re: IPSec Connections and HA (High Availability) Problems
Reply #4 on: March 09, 2024, 08:18:29 pm
Update 3: Got it!
Here's what has been working since 17:31 yesterday (it's now 14:00 here).
Under Firewall >> NAT >> Outbound, create a rule:
Interface: WAN
TCP/IP Version: IPv4
Protocol: ESP
Source Address: This Firewall (*This seems to be the REALLY important part!)
Destination Address: any
Translation/Target: [Your CARP Virtual IP WAN Address you want to use for VPN]
Description: IPSec ESP Traffic Out
Save it.
Move this rule to the top of your manual rule stack.
Clone it. For this one, here are the parts that change:
Protocol: UDP
Destination Port: ISAKMP
Description: IPSec ISAKMP Traffic Out
Save it.
This rule lands second from the top by default, so there's no need to move it.
Now clone the ISAKMP rule (3rd rule). Only these change:
Destination Port: NAT-T
Description: IPSec NAT-T Traffic Out
Save it.
This rule should now be third in the stack, so there's no need to move it.
Now:
Apply Changes.
Sync your HA servers!
Restart your IPSec services on the HA firewall.
Verify that all tunnels come back up.
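Here is a rough model of why the three rules help, assuming outbound NAT rules are evaluated top to bottom with the first match winning: the firewall's own ESP, ISAKMP and NAT-T packets match a rule whose Source is This Firewall and leave with the CARP VIP as their source, while anything unmatched keeps the node's real address. The `origin` labels, field names and placeholder addresses are hypothetical, and this is not how pf is implemented. It also suggests why a rule with Source = IPSec Net (as in Reply #1) might not catch IKE packets the firewall sends itself, which would fit the note that the Source Address is the important part.

```python
from dataclasses import dataclass
from typing import List, Optional

ESP, UDP = 50, 17             # IP protocol numbers
ISAKMP, NAT_T = 500, 4500     # UDP destination ports
CARP_VIP = "198.51.100.146"   # placeholder for ".146"
REAL_WAN = "198.51.100.158"   # placeholder for this node's own ".158"


@dataclass(frozen=True)
class NatRule:
    description: str
    interface: str
    proto: int
    source: str            # "this_firewall" or "ipsec_net" (hypothetical labels)
    dport: Optional[int]   # None = any port
    target: str            # translated source address


@dataclass(frozen=True)
class Packet:
    interface: str
    proto: int
    origin: str            # where the packet came from, same labels as NatRule.source
    dport: Optional[int] = None


RULES = [
    NatRule("IPSec ESP Traffic Out", "WAN", ESP, "this_firewall", None, CARP_VIP),
    NatRule("IPSec ISAKMP Traffic Out", "WAN", UDP, "this_firewall", ISAKMP, CARP_VIP),
    NatRule("IPSec NAT-T Traffic Out", "WAN", UDP, "this_firewall", NAT_T, CARP_VIP),
]


def outbound_source(pkt: Packet, rules: List[NatRule], untranslated: str) -> str:
    """First matching rule wins; with no match the source is left as-is."""
    for rule in rules:
        if (rule.interface == pkt.interface
                and rule.proto == pkt.proto
                and rule.source == pkt.origin
                and (rule.dport is None or rule.dport == pkt.dport)):
            return rule.target
    return untranslated


print(outbound_source(Packet("WAN", UDP, "this_firewall", NAT_T), RULES, REAL_WAN))
```

Without these rules the same packets keep the untranslated address, which after a failover flips between ".157" and ".158"; that matches the symptoms in Updates 1 and 2.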