HA + CARP + WireGuard + BGP – Different Failback Behaviour between 24.7 and 26.1

Started by ahmed.dabouni, Today at 01:59:58 PM

Hello Community,

We are running two sites with identical logical HA designs but different OPNsense versions, and we are observing different behaviour during CARP failback.

For simplicity, we call them Site A and Site B.

Site A runs OPNsense 26.1.x (FreeBSD 14.3-RELEASE-p8).
Site B runs OPNsense 24.7.x (FreeBSD 14.1-RELEASE-p6).

Both sites use the same architecture:

  • 2 firewalls per site (HA cluster)
  • CARP for VIP handling
  • pfsync for state synchronization
  • 2 WireGuard interfaces per firewall
  • BGP running over WireGuard for routing
  • "Disable routes" enabled in the WireGuard instances
  • "Depend on CARP" set to the WAN VIP

The design goal is full redundancy:
If one firewall or one WireGuard tunnel fails, Site-to-Site connectivity remains intact and BGP reconverges.

At Site B (24.7) everything works as expected.
If we run a continuous ping from Site B to Site A and force a CARP failover, we lose 2 packets and the same ping process continues seamlessly. Failback to the original MASTER also works without interruption. It appears as if the states are properly migrated to the active node.
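For reference, this is roughly how we drive the test. The addresses and the demotion value are placeholders; raising the CARP demotion counter on the MASTER is equivalent to the GUI's "Enter Persistent CARP Maintenance Mode":

```shell
# Continuous ping across the site-to-site tunnel (placeholder address)
ping 10.0.2.1

# On the current MASTER, force a failover by raising the CARP demotion
# counter; the peer takes over all VIPs
sysctl net.inet.carp.demotion=+240

# Failback: lower the counter again so the original node reclaims MASTER
sysctl net.inet.carp.demotion=-240
```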

At Site A (26.1.x) the behaviour is different.
When we run a continuous ping from Site A to Site B:

  • First failover (FW1 → FW2): 2 packets lost, ping continues.
  • However, the ICMP state afterwards appears to exist on both nodes (a copy rather than a migration).
  • The real issue happens during failback (FW2 → FW1).

After failback, the existing ping stops working. New pings work immediately, but the already established ICMP session never recovers. Inspecting the state table on the restored MASTER shows that the connection is being routed out via the public WAN interface and NATed, instead of leaving via the internal WireGuard interface as expected. As soon as we delete that single state, the original ping continues normally.
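For completeness, the state inspection and deletion are done along these lines (all addresses are placeholders for our tunnel endpoints):

```shell
# Inspect the stuck ICMP state on the restored MASTER; with -vv, pf also
# shows the matching rule and the interface the state is bound to
pfctl -ss -vv | grep -A 3 10.0.2.1

# Kill just that one state pair (source -> destination); the original
# ping then resumes immediately
pfctl -k 10.0.1.50 -k 10.0.2.1
```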

So the problem only affects existing states that survive the failover and then the failback.

New connections after failback work without issues.

This behaviour is reproducible and only happens on Site A (26.1 / FreeBSD 14.3). Site B (24.7 / FreeBSD 14.1) behaves correctly in both failover and failback scenarios.

We would like to understand whether:

  • pf/pfsync state handling has changed between 24.7 and 26.1
  • This is related to routing re-evaluation during CARP transitions
  • Or if this indicates a regression in state synchronization during failback
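If it helps anyone reproduce or narrow this down, pfsync traffic during the failback can be watched directly; a minimal capture sketch (the sync interface name em2 is a placeholder; pfsync is IP protocol 240):

```shell
# Watch state updates flowing to the restored MASTER during failback
tcpdump -nvi em2 ip proto 240
```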

Has anyone observed similar behaviour with HA + WireGuard + BGP on 26.1?

Any guidance or insights would be greatly appreciated.

Thank you.