Messages - l.ansaloni

#1
Hi everyone,

I am facing a critical issue with my OPNsense HA cluster where adding or removing a Virtual IP (Alias/CARP) on the WAN interface causes the entire BGP (FRR) routing table on the LAN side to be dropped/flushed, causing downtime for several minutes.

My Environment:
  • Setup: 2x OPNsense instances in High Availability (Master/Slave).
  • BGP Plugin: os-frr (BGP) enabled.
  • Backend: A Kubernetes cluster using MetalLB in BGP mode.
  • Logic:
    • MetalLB advertises internal private IPs (e.g., 192.168.9.x/32) via BGP to the OPNsense LAN/VLAN interfaces.
    • OPNsense learns these routes and knows exactly which K8s node to send traffic to.
    • I own a public /22 range. I manually assign specific Public IPs from this range as Virtual IP Aliases on the OPNsense WAN.
    • I use Port Forward (NAT) to map the Public WAN IP to the Private BGP-learned IP.
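To show why losing the BGP table breaks the forwarded traffic, here is a minimal Python sketch of the lookup chain described above. All concrete addresses and pairings are hypothetical and chosen only for illustration: the public IP 203.0.113.10, the two 192.168.9.x/32 targets, and which K8s node is the next hop for each. The names `bgp_routes`, `port_forwards` and `resolve` are made up too; this is a model of the logic, not how OPNsense or FRR implement it.

```python
import ipaddress

# Hypothetical BGP-learned /32 routes: prefix -> next hop (a K8s node).
bgp_routes = {
    ipaddress.ip_network("192.168.9.10/32"): ipaddress.ip_address("10.21.1.11"),
    ipaddress.ip_network("192.168.9.11/32"): ipaddress.ip_address("10.21.1.14"),
}

# Hypothetical port forward: public WAN VIP -> private BGP-learned IP.
port_forwards = {
    ipaddress.ip_address("203.0.113.10"): ipaddress.ip_address("192.168.9.10"),
}


def resolve(public_ip):
    """Return (private target, next hop), or None if no BGP route covers it."""
    target = port_forwards.get(ipaddress.ip_address(public_ip))
    if target is None:
        return None  # no NAT rule for this public IP
    matches = [net for net in bgp_routes if target in net]
    if not matches:
        return None  # NAT rule exists, but no BGP-learned route for the target
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return target, bgp_routes[best]


print(resolve("203.0.113.10"))
```

In this model, emptying `bgp_routes` makes `resolve` return `None` even though the port forward is untouched, which matches the symptom: the NAT rules survive, but the path to the K8s node is gone until the routes are relearned.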

The Problem:
Whenever I need to add or remove a Public IP from the WAN interface (following the standard CARP procedure: disable CARP on secondary -> add VIP -> add on primary -> re-enable CARP on secondary), the moment I Apply Changes on the primary unit:

  • The BGP table is completely flushed.
  • The sessions with the K8s neighbors (LAN side) seem to flap or restart.
  • It takes 3 to 5 minutes for the routes to be relearned and the traffic to flow again.

Since the WAN VIPs and the LAN BGP sessions are on completely different interfaces, I wouldn't expect a change on the WAN to trigger a full re-initialization of the FRR routing table or LAN-side sessions.

Logs:
I have captured the logs during the event. It seems the FRR service is being stopped/restarted completely.
Notice the frr_carp: no frr deamons active and the transition from BGP_Stop to BGP_Start.

2026-02-18T15:52:09 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.14 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:51:53 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.11 [Error] bgp_read_packet error: Connection reset by peer
...
2026-02-18T15:49:53 Error frr_carp no frr deamons active.
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.16 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.15 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
...
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.20.1.13 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: Update source change
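To make the timeline easier to read, here is a small Python sketch that parses a subset of the lines above, sorts them chronologically, and pulls out the peer address and the "last reset" reason. The regular expressions are my assumption about the line layout, based only on these samples, and the helper name `parse` is made up.

```python
import re
from datetime import datetime

# A subset of the log lines above (the GUI lists them newest first).
LOG = """\
2026-02-18T15:52:09 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.14 [Error] bgp_read_packet error: Connection reset by peer
2026-02-18T15:51:53 Error bgpd [H4B4J-DCW2R][EC 33554455] 10.21.1.11 [Error] bgp_read_packet error: Connection reset by peer
2026-02-18T15:49:53 Error frr_carp no frr deamons active.
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.21.1.16 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: No AFI/SAFI activated for peer
2026-02-18T15:49:53 Error bgpd [J9K4Q-T8STY][EC 33554466] 10.20.1.13 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, (null), fd -1, last reset: Update source change
"""

# Assumed layout: timestamp, severity, process name, free-form message.
LINE_RE = re.compile(r"^(\S+)\s+(\w+)\s+(\S+)\s+(.*)$")
# bgpd messages put the peer address right after the "[id][EC code]" prefix.
PEER_RE = re.compile(r"\]\s+(\d+\.\d+\.\d+\.\d+)\s")
RESET_RE = re.compile(r"last reset: (.+)$")


def parse(log):
    """Return the log events as dicts, oldest first."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, _severity, process, msg = m.groups()
        peer = PEER_RE.search(msg)
        reset = RESET_RE.search(msg)
        events.append({
            "time": datetime.fromisoformat(ts),
            "process": process,
            "peer": peer.group(1) if peer else None,
            "last_reset": reset.group(1) if reset else None,
        })
    return sorted(events, key=lambda e: e["time"])


events = parse(LOG)
restart = next(e["time"] for e in events if e["process"] == "frr_carp")
gap_seconds = (events[-1]["time"] - restart).total_seconds()
reset_reasons = {e["peer"]: e["last_reset"] for e in events if e["last_reset"]}

print(gap_seconds)    # 136.0
print(reset_reasons)
```

The 136 seconds is only the span between the frr_carp message and the last connection reset in this excerpt, not a measurement of the full outage. The "Update source change" reason on 10.20.1.13 is the one entry that points directly at an address change on the box.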

Configuration Details:
  • AS OPNsense: 64512 / AS K8s: 64514.
  • BGP Neighbors configured with: Next-Hop-Self, Multi-Hop (nodes are on a different VLAN), and BFD.
  • The issue happens exactly when the "Interface/VIP" configuration is reloaded by the OS.

Questions:
  • Is it expected behavior for FRR to restart or drop routes when any interface configuration (even an unrelated WAN Alias) is modified?
  • The log line frr_carp: no frr deamons active suggests that the CARP hook script is either forcing a restart or finding the service dead. Is there a way to prevent this for WAN-only changes?
  • Is there a way to make the FRR process "immune" to interface reloads that don't involve the BGP-facing interfaces?

I need to be able to manage my Public IP pool without taking down the internal routing for the whole cluster. Any advice is welcome!
#2
I updated to version 21.1:

OPNsense 21.1-amd64
FreeBSD 12.1-RELEASE-p12-HBSD
OpenSSL 1.1.1i 8 Dec 2020


but the problem persists.
#3
20.7 Legacy Series / LDAP users can't login from GUI
February 26, 2021, 08:54:51 AM
I am using this version:
OPNsense 20.1.8_1-amd64
FreeBSD 11.2-RELEASE-p20-HBSD
LibreSSL 3.0.2

I set up the LDAP server for OPNsense Web GUI login by following the steps in the documentation:
https://docs.opnsense.org/manual/how-tos/user-ldap.html
with the Read properties and Synchronize groups options enabled in the LDAP server settings.

The user was imported into the local user database successfully.
The user was assigned to the local admins group successfully.
From the console:
root@firewall:~ # cat /etc/group
...
admins:*:1999:root,l.ansaloni
...


When I test the user's authentication in System\Access\Tester, everything works with no errors.
I get this result message:
This user is a member of these groups:
admins

When I try to log in to the Web GUI, I am looped back to the login page and the user is kicked out of the admins group.
From the console:
root@firewall:~ # cat /etc/group
...
admins:*:1999:root
...
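For anyone who wants to check this on their own box, here is a minimal Python sketch that compares two /etc/group lines and reports which users disappeared. It assumes the standard `name:passwd:gid:members` layout; the function names `members` and `dropped` are made up for this example.

```python
def members(group_line):
    """Split one /etc/group line into (group name, set of member names)."""
    name, _passwd, _gid, member_list = group_line.strip().split(":", 3)
    return name, {m for m in member_list.split(",") if m}


def dropped(before, after):
    """Return the users that were in the group before but are missing after."""
    name_before, set_before = members(before)
    name_after, set_after = members(after)
    if name_before != name_after:
        raise ValueError("lines describe different groups")
    return sorted(set_before - set_after)


# The two states shown in the console output above.
before_login = "admins:*:1999:root,l.ansaloni"
after_login = "admins:*:1999:root"

print(dropped(before_login, after_login))  # ['l.ansaloni']
```

Capturing the admins line before and after a GUI login attempt and feeding both into `dropped` confirms whether the login itself is what removes the user.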


Does anyone have the same problem?
#4
20.1 Legacy Series / Re: Web GUI ldap users login error
February 26, 2021, 08:45:51 AM
I have the same problem with version:
OPNsense 20.7.8_4-amd64
FreeBSD 12.1-RELEASE-p12-HBSD
OpenSSL 1.1.1i 8 Dec 2020

To add to what dleung01 said, from the console:
root@firewall:~ # cat /etc/group
...
admins:*:1999:root,DomainAdmins
...


When I add the user l.ansaloni to the admins group, I see this:
root@firewall:~ # cat /etc/group
...
admins:*:1999:root,DomainAdmins,l.ansaloni
...


If I try to log in with the l.ansaloni user, I am looped back to the login page and the user is kicked out of the admins group:
root@firewall:~ # cat /etc/group
...
admins:*:1999:root,DomainAdmins
...

#5
18.1 Legacy Series / Re: Multi Wan- switching GW
February 08, 2018, 04:47:16 PM
Hi,
I confirm the same behavior.
Lorenzo