BGP (FRR) drops all LAN routes when adding WAN Virtual IP (CARP) - HA Cluster

Started by l.ansaloni, February 18, 2026, 04:28:09 PM

Hi everyone,

I am facing a critical issue with my OPNsense HA cluster: adding or removing a Virtual IP (Alias/CARP) on the WAN interface flushes the entire BGP (FRR) routing table learned on the LAN side, causing several minutes of downtime.

My Environment:
  • Setup: 2x OPNsense instances in High Availability (Master/Slave).
  • BGP Plugin: os-frr (BGP) enabled.
  • Backend: A Kubernetes cluster using MetalLB in BGP mode.
  • Logic:
    • MetalLB advertises internal private IPs (e.g., 192.168.9.x/32) via BGP to the OPNsense LAN/VLAN interfaces.
    • OPNsense learns these routes and knows exactly which K8s node to send traffic to.
    • I own a public /22 range. I manually assign specific Public IPs from this range as Virtual IP Aliases on the OPNsense WAN.
    • I use Port Forward (NAT) to map the Public WAN IP to the Private BGP-learned IP.

The Problem:
Whenever I add or remove a Public IP on the WAN interface, I follow the standard CARP procedure (disable CARP on secondary -> add VIP -> add on primary -> re-enable CARP on secondary). The moment I click Apply Changes on the primary unit:

  • The BGP table is completely flushed.
  • The sessions with the K8s neighbors (LAN side) seem to flap or restart.
  • It takes 3 to 5 minutes for the routes to be relearned and the traffic to flow again.
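
To put a number on the outage, the session state can be polled while the VIP change is applied. The following is a minimal sketch, not a tested tool: it assumes vtysh is on the PATH of the firewall and that `show bgp summary json` returns peers under `ipv4Unicast` -> `peers` with a `state` field, which should be verified against the installed FRR version. The function names `down_peers` and `poll` are made up for this example.

```python
import json
import subprocess
import time


def down_peers(summary, afi="ipv4Unicast"):
    """Return the sorted peer IPs whose session is not Established.

    `summary` is the parsed output of `show bgp summary json`.
    The key layout (afi -> "peers" -> ip -> "state") is an assumption.
    """
    peers = summary.get(afi, {}).get("peers", {})
    return sorted(ip for ip, info in peers.items()
                  if info.get("state") != "Established")


def poll(interval=1.0):
    """Print a timestamped list of down peers until interrupted (Ctrl-C)."""
    while True:
        result = subprocess.run(
            ["vtysh", "-c", "show bgp summary json"],
            capture_output=True, text=True,
        )
        stamp = time.strftime("%H:%M:%S")
        if result.returncode != 0 or not result.stdout.strip():
            # vtysh fails while the daemons are stopped; record that too.
            print(stamp, "vtysh unavailable (FRR not running?)")
        else:
            down = down_peers(json.loads(result.stdout))
            print(stamp, "down:", ", ".join(down) if down else "none")
        time.sleep(interval)
```

Running `poll()` in a shell on the primary before clicking Apply Changes would show when the neighbors leave and return to Established, and whether FRR itself disappears in between.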

Since the WAN VIPs and the LAN BGP sessions are on completely different interfaces, I wouldn't expect a change on the WAN to trigger a full re-initialization of the FRR routing table or LAN-side sessions.

Logs:
I captured the logs during the event (newest entries first). The FRR service appears to be stopped and restarted completely.
Note the frr_carp message "no frr deamons active." (spelling as logged) and the transition from BGP_Stop to BGP_Start.

2026-02-18T15:52:09 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.14 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:51:53 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.11 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:49:53 Error frr_carp no frr deamons active.
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.16 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.15 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
...
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.20.1.13 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: Update source change
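
The excerpt above can be summarized per neighbor with a short script. This is a sketch under assumptions: the regular expression is fitted only to the line layout shown in this post (timestamp, severity, process, two bracketed tags, peer IP, message), and the names `summarize` and `log_span` are hypothetical. Elided lines ("...") and non-bgpd lines are skipped rather than guessed at.

```python
import re
from datetime import datetime

# Log lines copied from the excerpt above; "..." marks elided entries.
SAMPLE = """\
2026-02-18T15:52:09 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.14 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:51:53 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.11 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:49:53 Error frr_carp no frr deamons active.
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.16 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.15 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
...
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.20.1.13 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: Update source change
"""

LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+\S+\s+bgpd\s+"
    r"\[[^\]]+\]\[EC \d+\]\s+(?P<peer>\d+\.\d+\.\d+\.\d+)\s+(?P<msg>.*)$"
)
RESET_RE = re.compile(r"last reset: (?P<reason>.+)$")


def summarize(lines):
    """Map each peer IP to its first/last timestamp and last-reset reasons."""
    peers = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # elided lines, frr_carp lines, anything unrecognized
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S")
        entry = peers.setdefault(
            match["peer"], {"first": ts, "last": ts, "reasons": set()}
        )
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
        reset = RESET_RE.search(match["msg"])
        if reset:
            entry["reasons"].add(reset["reason"])
    return peers


def log_span(peers):
    """Time between the earliest and latest bgpd entry across all peers."""
    first = min(p["first"] for p in peers.values())
    last = max(p["last"] for p in peers.values())
    return last - first


peers = summarize(SAMPLE.splitlines())
for ip, info in sorted(peers.items()):
    print(ip, sorted(info["reasons"]) or "no reset reason logged")
print("span covered by excerpt:", log_span(peers))
```

Grouping by peer makes it easier to see that the reset reason is not the same for every neighbor in the excerpt, which may help narrow down what the reload actually changed.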

Configuration Details:
  • OPNsense AS: 64512 / K8s AS: 64514.
  • BGP Neighbors configured with: Next-Hop-Self, Multi-Hop (nodes are on a different VLAN), and BFD.
  • The issue happens exactly when the "Interface/VIP" configuration is reloaded by the OS.

Questions:
  • Is it expected behavior for FRR to restart or drop routes when any interface configuration (even an unrelated WAN Alias) is modified?
  • The log line "frr_carp: no frr deamons active." suggests the CARP hook script is either forcing a restart or finding the service dead. Is there a way to prevent this for WAN-only changes?
  • Is there a way to make the FRR process "immune" to interface reloads that don't involve the BGP-facing interfaces?

I need to be able to manage my Public IP pool without taking down the internal routing for the whole cluster. Any advice is welcome!