IPsec - Policy vs Route based Differences in Phase 2 Rekey

Started by illogik, February 01, 2024, 04:45:59 AM

Hi everyone,

I'm creating a route based VPN between one of my locations and AWS and am experiencing some weird behavior when it comes time for the Phase 2 rekey.

For reference (and to add to the confusion), I've configured a policy based VPN to AWS with the exact same settings as the route based VPN (using the OPNsense default for all proposals and matching the lifetime/rekey times to what AWS wants), and the policy based VPN works without any issues.

Essentially the VPN starts fine, runs for 1 hour, then fails during the rekey period. I'm looking through the logs and am seeing that I'm getting a "no acceptable proposal found" message even though it appears that things should work as there are matching proposals.

An example from the logs where I've changed the IP addresses:
Quote
11[ENC] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> parsed CREATE_CHILD_SA request 316 [ N(REKEY_SA) SA No KE TSi TSr ]
11[CFG] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> configured proposals: ESP:AES_CBC_128/AES_CBC_192/AES_CBC_256/HMAC_SHA2_256_128/HMAC_SHA2_384_192/HMAC_SHA2_512_256/HMAC_SHA1_96/AES_XCBC_96/NO_EXT_SEQ, ESP:AES_GCM_16_128/AES_GCM_16_192/AES_GCM_16_256/NO_EXT_SEQ
11[ENC] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> generating CREATE_CHILD_SA response 316 [ N(NO_PROP) ]
11[NET] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> received packet: from 10.0.0.1[4500] to 10.100.0.1[4500] (764 bytes)
11[KNL] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> querying policy 192.168.1.0/24 === 192.168.2.0/24 in failed, not found
11[CFG] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> received proposals: ESP:AES_CBC_128/AES_CBC_256/HMAC_SHA1_96/HMAC_SHA2_256_128/HMAC_SHA2_384_192/HMAC_SHA2_512_256/MODP_2048/MODP_1024/MODP_3072/MODP_4096/MODP_6144/MODP_8192/MODP_1024_160/MODP_2048_224/MODP_2048_256/MODP_1536/ECP_256/ECP_384/ECP_521/NO_EXT_SEQ, ESP:AES_GCM_16_128/AES_GCM_16_256/MODP_2048/MODP_1024/MODP_3072/MODP_4096/MODP_6144/MODP_8192/MODP_1024_160/MODP_2048_224/MODP_2048_256/MODP_1536/ECP_256/ECP_384/ECP_521/NO_EXT_SEQ
11[IKE] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> failed to establish CHILD_SA, keeping IKE_SA
11[IKE] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> no acceptable proposal found

It keeps trying to rekey and failing like this for a while, until eventually I see:
Quote
14[IKE] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> CHILD_SA closed
10[CFG] trap not found, unable to acquire reqid 3
...
14[IKE] <8d241d13-f558-4dfb-bc38-b223d034e82d|2> closing CHILD_SA d2b12213-cfc6-4add-bb19-40189425785b{2} with SPIs c698e5dd_i (144508 bytes) c1cbdfef_o (407048 bytes) and TS 192.168.1.0/24 === 196.168.0.0/24

And at this point the far end of the VTI stops responding and traffic stops flowing.

I've left out some other logs related to the teardown. I'm not sure they provide any other useful information, but I can post them if you'd like.

Also of note is that I have 2x VTI running off of the same outside IP for the same networks (typical for an AWS VPN), and each of them has a different reqid. I experience the same behavior on both of them at the same time.

One thing of note is that I'm peering via BGP, and I've noticed that the BGP networks are showing up in the VPN logs as opposed to what I would expect to be 0.0.0.0/0. I'm not sure if this is expected or not.

When I dug further into the differences between how the route based and policy based VPNs are running, I found something interesting. I'm posting the relevant output of "ipsec statusall" below.

Route based w/ IPs changed throughout:
Quote
Connections:
aaf171bd-3651-4058-a4a5-173f74170a88:  10.0.0.1...10.100.0.1  IKEv1/2, dpddelay=10s
aaf171bd-3651-4058-a4a5-173f74170a88:   local:  [10.0.0.1] uses pre-shared key authentication
aaf171bd-3651-4058-a4a5-173f74170a88:   remote: [10.100.0.1] uses pre-shared key authentication
df0fa5da-9de6-456c-aec0-5d5f92dd56f2:   child:  0.0.0.0/0 === 0.0.0.0/0 TUNNEL, dpdaction=none
8d241d13-f558-4dfb-bc38-b223d034e82d:  10.0.0.1...10.2.0.1  IKEv1/2, dpddelay=10s
8d241d13-f558-4dfb-bc38-b223d034e82d:   local:  [10.0.0.1] uses pre-shared key authentication
8d241d13-f558-4dfb-bc38-b223d034e82d:   remote: [10.2.0.1] uses pre-shared key authentication
d2b12213-cfc6-4add-bb19-40189425785b:   child:  0.0.0.0/0 === 0.0.0.0/0 TUNNEL, dpdaction=none
Security Associations (2 up, 0 connecting):
8d241d13-f558-4dfb-bc38-b223d034e82d[2]: ESTABLISHED 54 minutes ago, 10.0.0.1[10.0.0.1]...10.2.0.1[10.2.0.1]
8d241d13-f558-4dfb-bc38-b223d034e82d[2]: IKEv2 SPIs: 79b71f8158877bd3_i* 6ee5444982fa6228_r, rekeying in 2 hours
8d241d13-f558-4dfb-bc38-b223d034e82d[2]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
d2b12213-cfc6-4add-bb19-40189425785b{2}:  INSTALLED, TUNNEL, reqid 4, ESP in UDP SPIs: c698e5dd_i c1cbdfef_o
d2b12213-cfc6-4add-bb19-40189425785b{2}:  AES_CBC_128/HMAC_SHA1_96, 131461 bytes_i, 370264 bytes_o, rekeying active
d2b12213-cfc6-4add-bb19-40189425785b{2}:   192.168.1.0/24 === 192.168.0.0/24
aaf171bd-3651-4058-a4a5-173f74170a88[1]: ESTABLISHED 54 minutes ago, 10.0.0.1[10.0.0.1]...10.100.0.1[10.100.0.1]
aaf171bd-3651-4058-a4a5-173f74170a88[1]: IKEv2 SPIs: 462cfcc566cdae50_i* c5e5d1a11af57d2a_r, rekeying in 3 hours
aaf171bd-3651-4058-a4a5-173f74170a88[1]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
df0fa5da-9de6-456c-aec0-5d5f92dd56f2{1}:  INSTALLED, TUNNEL, reqid 3, ESP in UDP SPIs: cf8b5207_i c210e95d_o
df0fa5da-9de6-456c-aec0-5d5f92dd56f2{1}:  AES_CBC_128/HMAC_SHA1_96, 1640080 bytes_i, 2461144 bytes_o, rekeying active
df0fa5da-9de6-456c-aec0-5d5f92dd56f2{1}:   192.168.1.0/24 === 192.168.0.0/24

Policy based w/ IPs changed throughout:
Quote
Connections:
8f79a251-2d3d-4d61-b7f5-6c0dcfbf0891:  10.0.0.1...10.100.0.2  IKEv1/2, dpddelay=10s
8f79a251-2d3d-4d61-b7f5-6c0dcfbf0891:   local:  [10.0.0.1] uses pre-shared key authentication
8f79a251-2d3d-4d61-b7f5-6c0dcfbf0891:   remote: [10.100.0.2] uses pre-shared key authentication
00e4d642-ed80-4ef0-9832-be100e3ad0ce:   child:  192.168.1.0/24 === 192.168.0.0/24 TUNNEL, dpdaction=start
a810a0e2-1d15-42d6-b69e-4fd9ede3b3cd:  10.0.0.1...10.100.0.1  IKEv1/2, dpddelay=10s
a810a0e2-1d15-42d6-b69e-4fd9ede3b3cd:   local:  [10.0.0.1] uses pre-shared key authentication
a810a0e2-1d15-42d6-b69e-4fd9ede3b3cd:   remote: [10.100.0.1] uses pre-shared key authentication
03b4295e-377c-40dd-a218-2f7fa0c507bb:   child:  192.168.1.0/24 === 192.168.0.0/24 TUNNEL, dpdaction=start
Security Associations (2 up, 0 connecting):
a810a0e2-1d15-42d6-b69e-4fd9ede3b3cd[508]: ESTABLISHED 3 hours ago, 10.0.0.1[10.0.0.1]...10.100.0.1[10.100.0.1]
a810a0e2-1d15-42d6-b69e-4fd9ede3b3cd[508]: IKEv2 SPIs: 1c440b692b07a25e_i* c63fe1a1b59f769b_r, rekeying in 15 minutes
a810a0e2-1d15-42d6-b69e-4fd9ede3b3cd[508]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
03b4295e-377c-40dd-a218-2f7fa0c507bb{2197}:  INSTALLED, TUNNEL, reqid 4, ESP in UDP SPIs: c593b369_i c0d42ecf_o
03b4295e-377c-40dd-a218-2f7fa0c507bb{2197}:  AES_CBC_128/HMAC_SHA1_96/MODP_1024_160, 0 bytes_i (0 pkts, 4s ago), 938144 bytes_o (5592 pkts, 1s ago), rekeying in 9 minutes
03b4295e-377c-40dd-a218-2f7fa0c507bb{2197}:   192.168.1.0/24 === 192.168.0.0/24
8f79a251-2d3d-4d61-b7f5-6c0dcfbf0891[509]: ESTABLISHED 2 hours ago, 10.0.0.1[10.0.0.1]...10.100.0.2[10.100.0.2]
8f79a251-2d3d-4d61-b7f5-6c0dcfbf0891[509]: IKEv2 SPIs: bb4d910f8dd3264d_i* cf6183813ad2d065_r, rekeying in 83 minutes
8f79a251-2d3d-4d61-b7f5-6c0dcfbf0891[509]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
00e4d642-ed80-4ef0-9832-be100e3ad0ce{2198}:  INSTALLED, TUNNEL, reqid 4, ESP in UDP SPIs: c8ded20a_i cbba5c3c_o
00e4d642-ed80-4ef0-9832-be100e3ad0ce{2198}:  AES_CBC_128/HMAC_SHA1_96/MODP_1024_160, 639729 bytes_i (5275 pkts, 1s ago), 904216 bytes_o (5499 pkts, 1s ago), rekeying in 33 minutes
00e4d642-ed80-4ef0-9832-be100e3ad0ce{2198}:   192.168.1.0/24 === 192.168.0.0/24

Something looks different about phase 2. With the route based VPN the CHILD_SA shows "AES_CBC_128/HMAC_SHA1_96", and with the policy based VPN it shows "AES_CBC_128/HMAC_SHA1_96/MODP_1024_160". I'm not entirely knowledgeable on how IPsec works, but it is my understanding that with IKEv2 the keys for the CHILD_SA created with the IKE_AUTH exchange are always derived from the IKE exchange, so a disagreement over the DH group isn't discovered until the first CHILD_SA is rekeyed (exactly what I'm experiencing). What I'm completely lost on is why this is only happening for the route based VPN when all of the settings are configured the same.

Can anyone point me in the right direction of where to look, or share any solutions you've found when working with AWS? I imagine that IPsec with AWS is a pretty common scenario, so there must be known solutions for this.

If you've read this far, thank you for the help!



I should have also mentioned that I've made all of these VPNs under the new "Connections" feature. I also don't know if it's relevant but I also have a number of policy based VPNs set up (oddly enough using the networks that I'm seeing in the logs) which are disabled.

As a workaround for this I've configured Monit to bounce IPsec when I can't ping one of the VTI interface far ends, but that's obviously not a great solution.

I've read the strongSwan documentation more thoroughly (https://docs.strongswan.org/docs/5.9/config/rekeying.html#_ipsec_sas) and it appears to explain the difference in the proposals I mentioned earlier: the status output doesn't show a DH group until the SA has been rekeyed.

The question still remains though - why does the policy based VPN work while route based doesn't while using the same settings since they should be using the same proposals...

I've done some more digging and things get more interesting...

Here are the phase 2 snippets from /usr/local/etc/swanctl/swanctl.conf for both VPN setups.

Route based:
Quote
        children {
            df0fa5da-9de6-456c-aec0-5d5f92dd56f2 {
                reqid = 3
                esp_proposals = default
                sha256_96 = no
                start_action = start
                close_action = none
                dpd_action = clear
                mode = tunnel
                policies = no
                local_ts = 0.0.0.0/0
                remote_ts = 0.0.0.0/0
                rekey_time = 3600
                updown = /usr/local/opnsense/scripts/ipsec/updown_event.py --connection_child df0fa5da-9de6-456c-aec0-5d5f92dd56f2
            }
        }

Policy based:
Quote
        children {
            00e4d642-ed80-4ef0-9832-be100e3ad0ce {
                esp_proposals = aes128-sha1-modp1024s160
                sha256_96 = no
                start_action = start
                close_action = none
                dpd_action = start
                mode = tunnel
                policies = yes
                local_ts = 10.3.0.0/19
                remote_ts = 172.31.0.0/16
                rekey_time = 3600
                updown = /usr/local/opnsense/scripts/ipsec/updown_event.py --connection_child 00e4d642-ed80-4ef0-9832-be100e3ad0ce
            }
        }

Notice that the proposals are different. Even though I have "default" set for the proposals, it looks like the policy based one is using "aes128-sha1-modp1024s160" (DH group 22). I'm assuming this isn't in the "default" in OPNsense and is matching with what AWS wants.

If there are any devs here - is this a potential bug with how OPNsense propagates phase 2 settings in strongSwan?

In case anyone reads this - it looks like simply defining an accepted AWS proposal worked fine. Using "default" on the OPNsense side (even though there are matching proposals from AWS) doesn't seem to work correctly.
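
To illustrate why an explicit proposal helps, here is a minimal Python sketch that compares the proposal strings from the logs above. It is a simplified model, not strongSwan's actual selection code: the function names (classify, parse, without_dh, first_mismatch) are made up for this example, and the received proposal is shortened. The assumption, based on my reading of the rekeying documentation linked earlier, is that DH groups are ignored for the CHILD_SA created during IKE_AUTH but must match during a CREATE_CHILD_SA rekey.

```python
# Simplified model of ESP proposal matching; NOT strongSwan's real code.
# All function names here are hypothetical and exist only for illustration.

def classify(token):
    """Map a transform name from the log to its transform type."""
    if token.startswith(("MODP_", "ECP_", "CURVE_")):
        return "dh"
    if token in ("NO_EXT_SEQ", "EXT_SEQ"):
        return "esn"
    if token.startswith(("HMAC_", "AES_XCBC_", "AES_CMAC_")):
        return "integ"
    return "encr"


def parse(proposal):
    """Turn 'ESP:ALG/ALG/...' into a dict of sets keyed by transform type."""
    _proto, _, algs = proposal.strip().partition(":")
    groups = {"encr": set(), "integ": set(), "dh": set(), "esn": set()}
    for token in algs.split("/"):
        groups[classify(token)].add(token)
    return groups


def without_dh(groups):
    """Model the IKE_AUTH case, where DH groups are assumed to be ignored."""
    stripped = dict(groups)
    stripped["dh"] = set()
    return stripped


def first_mismatch(local, remote):
    """Return the first transform type with no common algorithm, or None."""
    for kind in ("encr", "integ", "dh", "esn"):
        if not local[kind] and not remote[kind]:
            continue  # neither side uses this transform type
        if not (local[kind] & remote[kind]):
            return kind
    return None


# "configured proposals" from the log (first proposal, OPNsense "default").
LOCAL_DEFAULT = ("ESP:AES_CBC_128/AES_CBC_192/AES_CBC_256/HMAC_SHA2_256_128/"
                 "HMAC_SHA2_384_192/HMAC_SHA2_512_256/HMAC_SHA1_96/"
                 "AES_XCBC_96/NO_EXT_SEQ")
# "received proposals" from the log, shortened for readability.
REMOTE_REKEY = ("ESP:AES_CBC_128/AES_CBC_256/HMAC_SHA1_96/HMAC_SHA2_256_128/"
                "MODP_2048/MODP_1024/MODP_1024_160/ECP_256/NO_EXT_SEQ")
# What aes128-sha1-modp1024s160 looks like in the same notation.
LOCAL_EXPLICIT = "ESP:AES_CBC_128/HMAC_SHA1_96/MODP_1024_160/NO_EXT_SEQ"

local = parse(LOCAL_DEFAULT)
remote = parse(REMOTE_REKEY)
print("initial SA:", first_mismatch(local, without_dh(remote)))
print("rekey, default:", first_mismatch(local, remote))
print("rekey, explicit:", first_mismatch(parse(LOCAL_EXPLICIT), remote))
```

Under this model the initial SA comes up, the rekey with "default" fails on the DH group because the configured proposals contain none while every received proposal carries several, and the rekey with the explicit proposal finds a match. That lines up with the tunnel running for an hour and then logging "no acceptable proposal found".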


I'm using "OPNsense 23.10.1_2-amd64" which is the latest business release I believe (we have appliances bought from Deciso).

For the route based setup I've attached some screenshots.

routed_1.png - P1 setup
routed_2.png - P2 Setup
routed_statusall.png - Relevant output of "ipsec statusall"

Let me know if you want me to post screenshots of the policy based setup. I've since torn it down but can bring it back up if that helps.