Subject: BGP (FRR) drops all LAN routes when adding WAN Virtual IP (CARP) - HA Cluster
Hi everyone,
I am facing a critical issue with my OPNsense HA cluster: adding or removing a Virtual IP (Alias/CARP) on the WAN interface causes FRR to flush the entire BGP routing table learned on the LAN side, which results in several minutes of downtime.
My Environment:
- Setup: 2x OPNsense instances in High Availability (Master/Slave).
- BGP Plugin: os-frr (BGP) enabled.
- Backend: A Kubernetes cluster using MetalLB in BGP mode.
- Logic:
  - MetalLB advertises internal private IPs (e.g., 192.168.9.x/32) via BGP to the OPNsense LAN/VLAN interfaces.
  - OPNsense learns these routes and knows exactly which K8s node to send traffic to.
  - I own a public /22 range. I manually assign specific public IPs from this range as Virtual IP Aliases on the OPNsense WAN.
  - I use Port Forward (NAT) to map each public WAN IP to a private, BGP-learned IP.
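To make the dependency explicit: the port forward only rewrites the destination address, and delivery then depends entirely on the /32 that MetalLB advertised over BGP. The Python sketch below is a toy model of that two-step decision, not OPNsense or FRR code. The addresses (203.0.113.10 as the public VIP, 192.168.9.10 as the service IP, 10.21.1.14 as the node) are hypothetical, and it assumes no default or connected route happens to cover the service IPs.

```python
import ipaddress

# Hypothetical port-forward rule: public WAN VIP -> private MetalLB service IP.
NAT_RULES = {
    ipaddress.ip_address("203.0.113.10"): ipaddress.ip_address("192.168.9.10"),
}


def lookup_next_hop(rib, dst):
    """Longest-prefix match of dst against a {network: next_hop} table."""
    matches = [net for net in rib if dst in net]
    if not matches:
        return None  # no route: the packet cannot be delivered
    best = max(matches, key=lambda net: net.prefixlen)
    return rib[best]


def forward(public_dst, rib):
    """Apply the port forward, then route the translated address."""
    private_dst = NAT_RULES.get(public_dst)
    if private_dst is None:
        return None  # no NAT rule for this public IP
    return lookup_next_hop(rib, private_dst)


vip = ipaddress.ip_address("203.0.113.10")

# While the BGP sessions are up, MetalLB's /32 points at a K8s node.
rib_up = {ipaddress.ip_network("192.168.9.10/32"): "10.21.1.14"}
# After FRR flushes the BGP table, the /32 is gone.
rib_flushed = {}

print("sessions up:", forward(vip, rib_up))
print("table flushed:", forward(vip, rib_flushed))
```

In this model the NAT rule stays valid the whole time; what breaks is the route lookup for the translated address, which is why a BGP flush takes every published service down at once.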
The Problem:
Whenever I need to add or remove a public IP on the WAN interface, I follow the standard CARP procedure (https://docs.opnsense.org/manual/how-tos/carp.html#example-adding-a-virtual-ip-to-an-active-vhid-group): disable CARP on the secondary -> add the VIP -> add it on the primary -> re-enable CARP on the secondary. The moment I hit Apply Changes on the primary unit:
- The BGP table is completely flushed.
- The sessions with the K8s neighbors (LAN side) seem to flap or restart.
- It takes 3 to 5 minutes for the routes to be relearned and the traffic to flow again.
Since the WAN VIPs and the LAN BGP sessions are on completely different interfaces, I wouldn't expect a change on the WAN to trigger a full re-initialization of the FRR routing table or LAN-side sessions.
Logs:
I captured the logs during the event. It looks like the FRR service is being stopped or restarted completely.
Note the "frr_carp: no frr deamons active" message and the transition from BGP_Stop to BGP_Start.
2026-02-18T15:52:09 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.14 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:51:53 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.11 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:49:53 Error frr_carp no frr deamons active.
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.16 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.15 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
...
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.20.1.13 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: Update source change
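To see at a glance which neighbors reset and for what reason, the excerpt can be grouped per peer. This is a minimal Python sketch that assumes only the line format visible above (timestamp, severity, process, message); the regexes are written for these sample lines and are not a general FRR log parser. The "..." elisions are skipped, so the computed window is only the span covered by the pasted lines, not the full outage.

```python
import re
from collections import defaultdict
from datetime import datetime

# Lines copied from the excerpt above, including the "..." elisions.
LOG = """\
2026-02-18T15:52:09 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.14 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:51:53 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.11 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:49:53 Error frr_carp no frr deamons active.
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.16 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.15 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
...
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.20.1.13 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: Update source change
"""

LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<sev>\S+) (?P<proc>\S+) (?P<msg>.*)$")
PEER_RE = re.compile(r"\]\s+(?P<peer>\d{1,3}(?:\.\d{1,3}){3})\s+\[")
REASON_RE = re.compile(r"(?:last reset|bgp_read_packet error): (?P<reason>.+)$")


def summarize(lines):
    """Group bgpd errors by peer and collect frr_carp messages."""
    by_peer = defaultdict(list)
    carp_events = []
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips "..." elisions and blank lines
        ts = datetime.fromisoformat(m["ts"])
        stamps.append(ts)
        if m["proc"] == "frr_carp":
            carp_events.append((ts, m["msg"]))
            continue
        peer = PEER_RE.search(m["msg"])
        reason = REASON_RE.search(m["msg"])
        if m["proc"] == "bgpd" and peer and reason:
            by_peer[peer["peer"]].append((ts, reason["reason"]))
    window = max(stamps) - min(stamps) if stamps else None
    return dict(by_peer), carp_events, window


by_peer, carp, window = summarize(LOG.splitlines())
for peer, events in sorted(by_peer.items()):
    for ts, reason in events:
        print(ts.time(), peer, reason)
print("frr_carp:", carp)
print("span covered by the excerpt:", window)
```

Grouped this way, the "last reset" reasons differ per peer (e.g. "Update source change" for 10.20.1.13 vs. "No AFI/SAFI activated for peer" for 10.21.1.15/16), which may be worth comparing against each neighbor's configuration.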
Configuration Details:
- AS OPNsense: 64512 / AS K8s: 64514.
- BGP Neighbors configured with: Next-Hop-Self, Multi-Hop (nodes are on a different VLAN), and BFD.
- The issue happens exactly when the "Interface/VIP" configuration is reloaded by the OS.
Questions:
- Is it expected behavior for FRR to restart or drop routes when any interface configuration (even an unrelated WAN Alias) is modified?
- The log message "frr_carp: no frr deamons active." suggests that the CARP hook script is either forcing a restart or finding the service dead. Is there a way to prevent this for WAN-only changes?
- Is there a way to make the FRR process "immune" to interface reloads that don't involve the BGP-facing interfaces?
I need to be able to manage my Public IP pool without taking down the internal routing for the whole cluster. Any advice is welcome!