HA + CARP + WireGuard + BGP – Different Failback Behaviour between 24.7 and 26.1

Started by ahmed.dabouni, Today at 01:59:58 PM

Hello Community,

We are running two sites with identical logical HA designs but different OPNsense versions, and we are observing different behaviour during CARP failback.

For simplicity, we call them Site A and Site B.

Site A runs OPNsense 26.1.x (FreeBSD 14.3-RELEASE-p8).
Site B runs OPNsense 24.7.x (FreeBSD 14.1-RELEASE-p6).

Both sites use the same architecture:

  • 2 firewalls per site (HA cluster)
  • CARP for VIP handling
  • pfsync for state synchronization
  • 2 WireGuard interfaces per firewall
  • BGP running over WireGuard for routing
  • "Disable routes" enabled in the WireGuard instances
  • "Depend on CARP" set to the WAN VIP

The design goal is full redundancy:
If one firewall or one WireGuard tunnel fails, Site-to-Site connectivity remains intact and BGP reconverges.

At Site B (24.7) everything works as expected.
If we run a continuous ping from Site B to Site A and force a CARP failover, we lose 2 packets and the same ping process continues seamlessly. Failback to the original MASTER also works without interruption. It appears as if the states are properly migrated to the active node.
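For reference, this is roughly how we drive the test. The addresses and the demotion value are placeholders; raising the CARP demotion counter on the MASTER is equivalent to the GUI's "Enter Persistent CARP Maintenance Mode":

```shell
# Continuous ping across the site-to-site tunnel (placeholder address)
ping 10.0.2.1

# On the current MASTER, force a failover by raising the CARP demotion
# counter; the peer takes over all VIPs
sysctl net.inet.carp.demotion=+240

# Failback: lower the counter again so the original node reclaims MASTER
sysctl net.inet.carp.demotion=-240
```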

At Site A (26.1.x) the behaviour is different.
When we run a continuous ping from Site A to Site B:

  • First failover (FW1 → FW2): 2 packets lost, ping continues.
  • However, the ICMP state afterwards appears to exist on both nodes (a copy rather than a migration).
  • The real issue happens during failback (FW2 → FW1).

After failback, the existing ping stops working. New pings work immediately, but the already established ICMP session never recovers. Inspecting the state table on the restored MASTER shows that the connection is being routed out via the public WAN interface and NATed, instead of leaving via the internal WireGuard interface as expected. As soon as we delete that single state, the original ping continues normally.
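For completeness, the state inspection and deletion are done along these lines (all addresses are placeholders for our tunnel endpoints):

```shell
# Inspect the stuck ICMP state on the restored MASTER; with -vv, pf also
# shows the matching rule and the interface the state is bound to
pfctl -ss -vv | grep -A 3 10.0.2.1

# Kill just that one state pair (source -> destination); the original
# ping then resumes immediately
pfctl -k 10.0.1.50 -k 10.0.2.1
```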

So the problem only affects existing states that survive the failover and then the failback.

New connections after failback work without issues.

This behaviour is reproducible and only happens on Site A (26.1 / FreeBSD 14.3). Site B (24.7 / FreeBSD 14.1) behaves correctly in both failover and failback scenarios.

We would like to understand whether:

  • pf/pfsync state handling has changed between 24.7 and 26.1
  • This is related to routing re-evaluation during CARP transitions
  • Or if this indicates a regression in state synchronization during failback
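If it helps anyone reproduce or narrow this down, pfsync traffic during the failback can be watched directly; a minimal capture sketch (the sync interface name em2 is a placeholder; pfsync is IP protocol 240):

```shell
# Watch state updates flowing to the restored MASTER during failback
tcpdump -nvi em2 ip proto 240
```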

Has anyone observed similar behaviour with HA + WireGuard + BGP on 26.1?

Any guidance or insights would be greatly appreciated.

Thank you.