OPNsense Forum

English Forums => Virtual private networks => Topic started by: anomaly0617 on March 07, 2024, 08:27:02 pm

Title: IPSec Connections and HA (High Availability) Problems **POSSIBLY SOLVED**
Post by: anomaly0617 on March 07, 2024, 08:27:02 pm
Edit: It occurred to us that we neglected to mention the version of OPNsense we're working with here. Every device is running OPNsense 24.1.2 or newer. /edit

Hi all,

Well, we thought we had this problem resolved (see my previous post if you're confused) but it turns out, maybe not.

We're testing the new Strongswan IPSec Connections before we roll them out to all of our partners.

We have a handful of test sites that are single ISP, single OPNsense firewall locations. The IPSec VPN tunnels between these sites seem to work beautifully. Generally no issues.

We colocate in a datacenter site that has multiple ISPs. They are backed by 60+ carriers, and they use some form of OSPF/BGP/RIP advertisements to switch us all dynamically across routes as necessary. There are some large Fortune 500 companies in the same datacenter. They do not go down. Ever. OK, maybe ALMOST never. But it hasn't happened in 4+ years of having servers there. And we've never experienced any issues with IPSec or OpenVPN tunnels thus far. So I doubt their routes have anything to do with this problem.

At this datacenter site, we have two OPNsense firewalls running on identical Dell PowerEdge R240 servers. They are in a high availability (HA) cluster.

We're having challenges with the satellite locations' IPSec VPN tunnels staying alive to the datacenter site with the High Availability Cluster. Every 4 hours the Phase 1 seems to rekey/renegotiate successfully, but the Phase 2 often seems "broken." Like, it appears that there is a Phase 2 connection being made, but the traffic is only one way. No "bytes in" at the High Availability location. The satellite location records "bytes out", but we don't see them reflected at the datacenter site.

So far, we've tried a bunch of the suggestions we've seen, such as:
On the Phase 1 side:
On the Phase 2 side:
But then:

We dug into the logs. Remember how everything should be coming in and going out from ".146"? What we noticed was that traffic exiting the firewall is going out with its non-CARP (real WAN) IP address, ".158" in this case.

11[NET] <4496d3d2-82a6-4b82-bf9c-3d0b78a3096a|375> sending packet: from xxx.xxx.xxx.158[4500] to xxx.xxx.xxx.92[4500] (96 bytes)

And it would appear that the satellite firewalls are responding to that traffic in kind, because we have log traffic that looks like this:

12[NET] <4496d3d2-82a6-4b82-bf9c-3d0b78a3096a|375> received packet: from xxx.xxx.xxx.92[4500] to xxx.xxx.xxx.158[4500] (96 bytes)
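For anyone who wants to scan a larger log for the same symptom, here is a minimal sketch in Python. It assumes the charon [NET] line format shown in the two lines above, and it treats the CARP address as a simple suffix (".146") because the full addresses are redacted here. The function names are made up for illustration; this is not part of OPNsense or strongSwan.

```python
import re

# Matches charon [NET] lines of the form shown above, e.g.
#   11[NET] <conn|375> sending packet: from a.b.c.158[4500] to a.b.c.92[4500] (96 bytes)
NET_LINE = re.compile(
    r"\[NET\].*?(?P<direction>sending|received) packet: "
    r"from (?P<src>\S+?)\[(?P<sport>\d+)\] "
    r"to (?P<dst>\S+?)\[(?P<dport>\d+)\]"
)


def local_endpoint(line):
    """Return (direction, local_address) for a charon [NET] line, else None."""
    match = NET_LINE.search(line)
    if match is None:
        return None
    direction = match.group("direction")
    # When sending, the local address is the source;
    # when receiving, it is the destination.
    local = match.group("src") if direction == "sending" else match.group("dst")
    return direction, local


def off_carp_lines(lines, carp_suffix):
    """Return the [NET] lines whose local address does not end with carp_suffix.

    carp_suffix is an assumption for illustration (e.g. ".146"); with
    unredacted logs you would compare against the full CARP address.
    """
    flagged = []
    for line in lines:
        endpoint = local_endpoint(line)
        if endpoint is not None and not endpoint[1].endswith(carp_suffix):
            flagged.append(line)
    return flagged
```

Feeding the log lines to off_carp_lines(lines, ".146") would return both of the lines quoted above, since each has ".158" as the local address.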

So, this is just NAT, right? We should be able to redirect that NAT traffic using an outbound NAT rule, I would assume. Just like we tell the firewall to send traffic from a server inside the datacenter out using a different IP, say, ".152", we should be able to tell the firewall to take any traffic from strongswan and route it out the door using ".146".

But is the source IP on that the LAN IP address, or the WAN IP address? I could make arguments for it being both.

We thought we'd try it just by specifying IPSec as the interface, but that did not work.

And, that might be a red herring. We may be barking up the wrong tree on the fact that it's entering/exiting from the real interface instead of appearing to enter/exit from the Virtual IP/CARP interface.

Any advice on HA IPSec configurations would be welcome. We've got a lot of HA setups across the world, and more are coming as we go multi-ISP and multi-firewall for sites.

We're happy to send screen captures to someone privately, but I don't want to post them publicly. There are so many things we'd have to redact that I suspect it would be redundant to do so.

Thanks, in advance!
Title: Re: IPSec Connections and HA (High Availability) Problems
Post by: anomaly0617 on March 08, 2024, 03:40:32 am
Update: We suspect we've found the cause of this and the resolution.

The fix is likely to put a Manual Outbound NAT rule in place that says "Interface=WAN, Source=IPSec Net, NAT Address=(CARP WAN IP that you want)". Be sure to position the rule sensibly: a lot of traffic is going to match it, and if the rules are processed sequentially from top to bottom, you don't want every packet going through 20 rules before it finds a match.

The cause seems to be that High Availability is cycling between the two OPNsense firewalls. When this happens AND there isn't a rule in place as mentioned above, the IP address of the firewall changes, which throws the firewall it connects to off in a major way. Once the rule above was in place AND we cycled the Strongswan service to reset all the tunnels, the problem (so far) has disappeared. Only time will tell if it remains gone.
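To illustrate why rule position matters, here is a toy model in Python of top-to-bottom, first-match outbound NAT evaluation. It is only a sketch: the class, the function, and every address in it are hypothetical placeholders (203.0.113.x is a documentation range standing in for the redacted WAN addresses, and 10.0.0.0/24 stands in for whatever "IPSec Net" covers). It does not reflect how OPNsense stores or evaluates its rules internally.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class OutboundNatRule:
    interface: str    # e.g. "WAN"
    source: str       # source network in CIDR notation (placeholder values below)
    nat_address: str  # address the traffic should appear to come from


def translate(rules, interface, src_ip):
    """Return the NAT address of the first rule that matches, or None."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if rule.interface == interface and addr in ipaddress.ip_network(rule.source):
            return rule.nat_address  # first match wins; later rules are ignored
    return None


# Placeholder rule stack: the specific rule sits above the broad one.
RULES = [
    OutboundNatRule("WAN", "10.0.0.0/24", "203.0.113.146"),  # stand-in for the CARP WAN IP
    OutboundNatRule("WAN", "10.0.0.0/8", "203.0.113.158"),   # stand-in for a real WAN IP
]
```

With this ordering, translate(RULES, "WAN", "10.0.0.5") picks the ".146" rule. Reverse the list and the broad rule matches first, so the same packet leaves from ".158" instead.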
Title: Re: IPSec Connections and HA (High Availability) Problems
Post by: mimugmail on March 08, 2024, 08:07:45 am
Yes, if you see port 4500 there's NAT involved. Many times the reason is that you set "any" as the source instead of the internal networks, which causes your locally initiated packets to be NATed.
Title: Re: IPSec Connections and HA (High Availability) Problems
Post by: anomaly0617 on March 08, 2024, 02:55:39 pm
Update 2: Still not working properly. As of this morning we have this in the logs from my satellite office:

2024-03-08T08:40:13-05:00   Informational   charon   09[NET] <353> sending packet: from xxx.xxx.xxx.185[500] to xxx.xxx.xxx.157[47289] (36 bytes)   
2024-03-08T08:40:13-05:00   Informational   charon   09[IKE] <353> no IKE config found for xxx.xxx.xxx.xxx...xxx.xxx.xxx.157, sending NO_PROPOSAL_CHOSEN

.157 and .158 are the actual WAN addresses for the individual firewalls. They should never appear. This should always say the traffic is coming from the CARP address, .146.

Any ideas? I'm fresh out of them.
Title: Re: IPSec Connections and HA (High Availability) Problems
Post by: anomaly0617 on March 09, 2024, 08:18:29 pm
Update 3: Got it!

Here's what has been working since 17:31 yesterday (it's now 14:00 here).

Under Firewall >> NAT >> Outbound, create a rule:
Move this rule to the top of your manual rule stack.

Clone it. For this one, here are the parts that change:
This rule defaults to the second from the top, so no need to move it.

Now Clone it (3rd rule).
This rule should now be the third down in the stack, so no need to move it.

Now: