IPsec TI gateway: CHILD_SA / Phase 2 disappears while IKE/DPD stays up

Started by RES217AIII, March 26, 2026, 05:49:21 AM

Hello everyone,

I am seeing a recurring issue with a production TI gateway IPsec tunnel on OPNsense 26.1 and would like to understand whether this is a known behavior and what the cleanest way to solve it would be.

Environment:
- OPNsense 26.1
- Site-to-site IPsec tunnel to a TI gateway
- WAN with dynamic public IP
- Phase 2 traffic selector: 0.0.0.0/0 === 0.0.0.0/0
- DPD enabled
- I have already tested different combinations of start_action / close_action / trap

Observed behavior:
- Phase 1 / IKE SA stays up
- DPD continues to run and the peer responds
- At the same time, Phase 2 / CHILD_SA or the related policy disappears
- The log then repeatedly shows messages like:
  "querying policy 0.0.0.0/0 === 0.0.0.0/0 in/out failed, not found"
- No reliable automatic rebuild of the CHILD_SA happens afterwards
- Functionally, the tunnel is dead even though IKE is still alive

Important points:
- I do not see a PPPoE / WAN reconnect in the relevant time window
- From the remote side, the tunnel may still be shown as UP
- A manual or explicitly triggered reconnect restores functionality

My current interpretation:
This does not look like a full tunnel outage:
- IKE / DPD still alive
- CHILD_SA / policy missing
- no automatic rebuild

Questions:
1. Is this a known strongSwan / OPNsense behavior with this kind of tunnel, especially with 0.0.0.0/0 selectors?
2. Is there a native and clean way in OPNsense 26.1 to detect and recover from exactly this state?
3. Would you approach this via built-in OPNsense mechanisms such as IPsec API / sessions / service control / Monit, or does this point more to TI peer-side behavior?
4. Has anyone managed to make this kind of tunnel fully stable without an external watchdog?
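To make questions 2 and 4 concrete, here is a minimal sketch of the decision logic a watchdog could use for the state described above (IKE up, no CHILD_SA). The names TunnelSample and HalfDeadDetector are hypothetical, and the threshold of three consecutive samples is an assumption. How the IKE and CHILD_SA state is obtained, and how the reconnect is triggered, is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class TunnelSample:
    """One observation of the tunnel (hypothetical structure)."""
    ike_established: bool  # IKE SA is up / DPD is answered
    child_sa_count: int    # number of installed CHILD_SAs


class HalfDeadDetector:
    """Signals recovery only after several consecutive 'IKE up, no CHILD_SA' samples."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # assumed value, tune to the polling interval
        self.streak = 0

    def observe(self, sample: TunnelSample) -> bool:
        half_dead = sample.ike_established and sample.child_sa_count == 0
        # Any healthy or fully-down sample resets the streak.
        self.streak = self.streak + 1 if half_dead else 0
        if self.streak >= self.threshold:
            self.streak = 0  # start counting again after signalling
            return True
        return False
```

Requiring several consecutive samples avoids reacting to the short gap during a normal rekey. A fully down tunnel (IKE also gone) is intentionally not treated as this state.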

Example from the log:
- DPD continues successfully
- at the same time:
  "querying policy 0.0.0.0/0 === 0.0.0.0/0 out failed, not found"
  "querying policy 0.0.0.0/0 === 0.0.0.0/0 in failed, not found"
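For anyone who wants to check their own logs for the same pattern, the messages quoted above can be matched with a small script. This is only a sketch: the function name find_policy_misses is made up, and the pattern is built solely from the message text quoted in this post, so the rest of the log line format is not assumed.

```python
import re

# Matches the message text quoted above; "in/out" is accepted as well
# because the summary earlier in this post uses that shorthand.
POLICY_MISS = re.compile(
    r"querying policy (?P<local>\S+) === (?P<remote>\S+) "
    r"(?P<direction>in/out|in|out) failed, not found"
)


def find_policy_misses(lines):
    """Return (local_ts, remote_ts, direction) for every matching log line."""
    hits = []
    for line in lines:
        match = POLICY_MISS.search(line)
        if match:
            hits.append(
                (match.group("local"), match.group("remote"), match.group("direction"))
            )
    return hits
```

Feeding it the lines of the IPsec log and counting the hits per time window would show whether the policy is missing permanently or only briefly around a rekey.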



Thanks a lot.
Supermicro M11SDV-4C-LN4F AMD EPYC 3151 4x 2.7GHz RAM 8GB DDR4-2666 SSD 250GB

Quote from: RES217AIII on March 26, 2026, 05:49:21 AM
- Phase 2 traffic selector: 0.0.0.0/0 === 0.0.0.0/0

What is the point of using 0.0.0.0/0 on both sides?

Normally the local and remote networks should not overlap, otherwise the tunnel cannot work properly.

Thank you for your reply.

The reason is the strict requirements of the Telematics Infrastructure (TI) in the medical sector, which dictate the configuration.

As far as my research indicates, the problem is the changing WAN IP address. The OPNsense kernel keeps its state tied to the old IP address, so it neither detects the change nor initiates a new connection. The remote end, in turn, cannot establish a connection with the new IP address, and Phase 2 fails.
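To test this theory, one could compare the local address the IKE SA is bound to with the address currently assigned to the WAN interface. A minimal sketch of that comparison follows. The function name is hypothetical, and where the two addresses come from (session listing, interface status) is an assumption left to the reader. The addresses in the usage line are documentation examples, not real ones.

```python
import ipaddress


def ike_uses_stale_address(ike_local_addr: str, current_wan_addr: str) -> bool:
    """True if the IKE SA is still bound to an address the WAN no longer has."""
    # Parsing normalises the textual form and rejects malformed input
    # with a ValueError instead of silently comparing strings.
    return ipaddress.ip_address(ike_local_addr) != ipaddress.ip_address(current_wan_addr)


# Example with documentation addresses (RFC 5737):
stale = ike_uses_stale_address("192.0.2.10", "198.51.100.7")
```

If this never reports a mismatch while the CHILD_SA is missing, the changing IP address is probably not the cause.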

In two weeks the line will be switched to fiber, which also comes with a static IP address. I hope the problem will then be resolved, assuming the changing IP address really is the cause.