25.4.2 and new "connection" IPsec tunnel - standby node shows phase 1 active

Started by Patrick M. Hausen, August 11, 2025, 06:35:29 PM

Hi all,

I just moved one customer IPsec tunnel from the old legacy framework to the new "connections" framework - all up and connected.

But this is an HA setup: the local endpoint for the tunnel is the CARP address, and CARP is working and has been for years.
The dashboard widget on the standby nevertheless shows the tunnel as active (green), with the additional note "Phase2 disconnected".

This was not the case, and still isn't, for connections that use the legacy method. For them the standby shows disconnected (red).

Any ideas? Thanks!
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)


root@****:/usr/home/**** # swanctl --list-sas
no files found matching '/usr/local/etc/strongswan.opnsense.d/*.conf'
1d1dcd23-7ec3-498a-bb44-15bb8e20935c: #1, CONNECTING, IKEv1, 116a39afbfb06598_i* 0000000000000000_r
  local  '%any' @ <my IP>[500]
  remote '%any' @ <peer IP>[500]
  queued:  QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE QUICK_MODE
  active:  ISAKMP_VENDOR ISAKMP_CERT_PRE MAIN_MODE ISAKMP_CERT_POST ISAKMP_NATD
Deciso DEC750

My assumption is that the backup held the CARP IP used as "local" in the SA at some point, and strongSwan tried to establish phase 1 during that (maybe brief) timeframe. Now it is just stuck trying to connect.

Does it still try to connect if you disable and re-enable the IPsec service on the backup?
Hardware:
DEC740

Yep. Stopping the IPsec service leads to the widget showing no connection at all but just "phase 1 not configured".

After re-enabling the service the widget shows 4 connections - the legacy ones "red" and the new one "green".
Deciso DEC750

I just tested it, and the minimal way to reproduce this is to create a Connection with a local IP address that OPNsense cannot bind to:

root@opn-dev-01:/src/git/core # swanctl --list-sas
no files found matching '/usr/local/etc/strongswan.opnsense.d/*.conf'
1f464073-5838-4257-83ff-d380e51b3ef0: #4, CONNECTING, IKEv2, 52b58b71a1a741db_i* 0000000000000000_r
  local  '%any' @ 10.20.30.1[500]
  remote '%any' @ 192.168.2.3[500]
  active:  IKE_VENDOR IKE_INIT IKE_NATD IKE_CERT_PRE IKE_AUTH IKE_CERT_POST IKE_CONFIG IKE_AUTH_LIFETIME IKE_MOBIKE IKE_ESTABLISH CHILD_CREATE

I don't have "10.20.30.1" on a firewall interface, but strongswan tries to connect anyway.

It uses the current WAN interface:

WAN hn1   2025-08-12 09:20:33.926844   00:15:5d:00:ad:06   f4:90:ea:00:d9:f4   IPv4, length 1014: 172.16.1.110.39503 > 192.168.2.3.500: UDP, length 972

It looks like even if there is no bindable address it will fall back to the default gateway and send packets out.

Mobike does not influence this. So no clue yet, but wanted to share.
Hardware:
DEC740

I agree with your analysis. Shall I open an issue on GitHub?
Deciso DEC750

You can open an issue, but generally it's explainable:

Charon binds to:

root@opn-dev-01:/var/lib/php/tmp # sockstat -l | grep 500
root     charon     41968 13  udp4   *:500                 *:*
root     charon     41968 14  udp4   *:4500                *:*
root     charon     41968 15  udp6   *:500                 *:*
root     charon     41968 16  udp6   *:4500                *:*

Which means it listens on the wildcard address, i.e. on all interfaces.

Then it uses these defaults to determine where to send traffic:

https://github.com/strongswan/strongswan/blob/master/conf/plugins/socket-default.opt

This here is enabled:

charon.plugins.socket-default.set_source = yes
   Set source address on outbound packets, if possible.

This means it will use the IP address in "local" if it exists on the system, otherwise it falls back to the current routing table, thus using the current default gateway route (usually WAN).
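
To make that fallback concrete, here is a rough Python sketch of the same decision. It is not strongSwan's code: the function names are made up for illustration, it handles IPv4 only, and it only models the behaviour described above (use "local" if the address exists on the system, otherwise let the routing table pick the source).

```python
import socket


def is_bindable(addr):
    """Return True if the kernel allows binding a UDP socket to addr,
    i.e. the address is configured on this system."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.bind((addr, 0))  # port 0: any free port, we only test the address
            return True
        except OSError:
            return False


def route_source(peer, port=500):
    """Return the source address the routing table selects for peer.
    connect() on a UDP socket sends no packet, it only fixes the route."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer, port))
        return s.getsockname()[0]


def effective_source(local, peer):
    """Hypothetical model of the fallback: prefer the configured local
    address, otherwise use whatever the routing table chooses."""
    return local if is_bindable(local) else route_source(peer)
```

On the test box above, effective_source("10.20.30.1", "192.168.2.3") would be expected to return the WAN address, since 10.20.30.1 is not bindable there and the route to the peer goes via WAN.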
Hardware:
DEC740

When I run a tcpdump on WAN ports 500 and 4500 and restart the IPsec service I cannot see any packets leaving the standby.

I guess it's a bug in the widget that interprets "CONNECTING" as active and should not.
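
For a check that does not depend on the widget, the state can be read straight from the header line of each SA in the swanctl --list-sas output. A minimal Python sketch, assuming the header format shown earlier in this thread; the function names are made up for illustration, and ESTABLISHED is assumed to be the state string of a completed phase 1:

```python
import re

# Header line format as seen in this thread:
#   <name>: #<id>, <STATE>, IKEv1|IKEv2, <spi_i> <spi_r>
SA_HEADER = re.compile(
    r"^(?P<name>[^:\s]+): #(?P<uid>\d+), (?P<state>[A-Z_]+), (?P<ike>IKEv[12])",
    re.MULTILINE,
)


def sa_states(list_sas_output):
    """Return (connection name, state) for every SA header line found."""
    return [(m["name"], m["state"]) for m in SA_HEADER.finditer(list_sas_output)]


def established(list_sas_output):
    """Return only the names whose state is ESTABLISHED (assumed state name)."""
    return [name for name, state in sa_states(list_sas_output)
            if state == "ESTABLISHED"]


SAMPLE = """\
no files found matching '/usr/local/etc/strongswan.opnsense.d/*.conf'
1d1dcd23-7ec3-498a-bb44-15bb8e20935c: #1, CONNECTING, IKEv1, 116a39afbfb06598_i* 0000000000000000_r
  local  '%any' @ <my IP>[500]
  remote '%any' @ <peer IP>[500]
"""

print(sa_states(SAMPLE))
print(established(SAMPLE))
```

With the standby output from above, the first call lists the tunnel as CONNECTING and the second returns an empty list, which is what one would expect the widget to reflect.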
Deciso DEC750

Sure, it sounds a little weird. Best to track it in an issue.
Hardware:
DEC740

Just changed my last post - I had a config error in my CARP setup. Something changed in FreeBSD 14 that triggered a different behaviour, it seems.

But now it's ok - the CARP addresses are not active on the standby and I do not see any outbound IPsec packets. So next best guess: a widget bug caused by a changed status message?
Deciso DEC750

The widget shows the same status as:

/ui/ipsec/sessions

Which calls

/api/ipsec/sessions/search_phase1

And in there if:

"connected":true

Then it shows Phase1 as green.

Maybe it could be refined. With a response like

"connected":true,"install-time":null

it could check that install-time is not null as well?
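
As a sketch of that refinement (in Python for brevity, not the actual widget code): only the two field names "connected" and "install-time" come from the API response quoted above; the function name and the second sample row are made up for illustration.

```python
def phase1_is_up(row):
    """Treat phase 1 as up only if the row says connected AND carries a
    non-null install-time (the refinement suggested above)."""
    return bool(row.get("connected")) and row.get("install-time") is not None


rows = [
    {"connected": True, "install-time": None},   # the case quoted above
    {"connected": True, "install-time": 1234},   # hypothetical established row
    {"connected": False, "install-time": None},  # hypothetical disconnected row
]

for row in rows:
    print(row, "->", "green" if phase1_is_up(row) else "red")
```

Only the second row would be shown as green with this check.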


Hardware:
DEC740

src/opnsense/mvc/app/controllers/OPNsense/IPsec/Api/SessionsController.php seems to rely on this source to get the state(s):
private function list_status()
{
    return json_decode((new Backend())->configdRun('ipsec list status'), true);
}

When I run:

configctl ipsec list status | jq

- I do get all four customer VPN connections
- only the new "connections" tunnel has got a "state" entry, the "legacy" ones don't
- the state of that tunnel is "CONNECTING", not "CONNECTED"

root@kagate2:~ # configctl ipsec list status | jq | grep state
        "state": "CONNECTING",

Some pattern matching somewhere in configd is too loose, so it returns

"connected": true

when it should not. The output that shows a VPN connection is actually established looks like this (checked on the primary, valid for all four VPNs):

"state": "ESTABLISHED",
Deciso DEC750
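
A stricter mapping could look roughly like the Python sketch below. It does not reproduce the configd code; the JSON layout in SAMPLE and the names iter_states and connected are assumptions for illustration. Only the "state" key and the values CONNECTING and ESTABLISHED are taken from the output above.

```python
import json


def iter_states(node, path=()):
    """Walk arbitrary decoded JSON and yield (path, value) for every
    string-valued "state" key, wherever it is nested."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "state" and isinstance(value, str):
                yield path, value
            else:
                yield from iter_states(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_states(value, path + (str(index),))


def connected(state):
    """Exact comparison: CONNECTING must not count as connected."""
    return state == "ESTABLISHED"


# Hypothetical layout, NOT the real structure of `configctl ipsec list status`.
SAMPLE = json.loads(
    '{"con-new": {"sas": [{"state": "CONNECTING"}]}, "con-legacy": {}}'
)

for path, state in iter_states(SAMPLE):
    print("/".join(path), state, connected(state))
```

The point is the exact comparison in connected(): a substring or prefix match on "CONNECT" would accept CONNECTING as well, which would be consistent with the behaviour seen on the standby. Nested entries may carry other state values, so the path is kept alongside each state.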

I think that makes sense. It should not return true while it is still connecting and not yet connected.

Best to file that as an issue on GitHub if you can.
Hardware:
DEC740

https://github.com/opnsense/core/issues/9082

And I guess I found the root cause, although I do not know where in the code that happens.
Deciso DEC750