CARP OS-FRR timeout after upgrade to rel 25.7.10

rkam · January 12, 2026, 10:59:33 AM

Hello,

After updating several devices from 24.7.12 to 25.7.10, the following error occurs with the os-frr plugin:

After failover to the slave, it takes approximately 2 minutes until the connection to the endpoints via WireGuard and OpenVPN is restored. Oddly, the IPsec tunnels are not affected. Without activating the os-frr plugin, everything works perfectly. Simply activating os-frr is enough to trigger the error; BGP doesn't even need to be enabled.

The same problem occurs when reverting to the master server.

According to the log:

After BACKUP -> MASTER, os-frr (zebra) starts, and then a configd call fails with a timeout after approximately 2 minutes. Only after that are the remaining CARP interfaces handled in /usr/local/etc/rc.syshook.d/carp/20-openvpn.
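
The "approximately 2 minutes" can be checked against the timestamps in the log excerpt from this post: the kernel reports BACKUP -> MASTER at 08:58:14 and configctl reports the timeout at 09:00:14. A minimal Python sketch of that arithmetic (the helper name `gap_seconds` is made up for illustration; the two timestamps are copied from the log):

```python
from datetime import datetime


def gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


# Timestamps copied from the log: CARP transition, then the configctl timeout.
carp_to_master = "2026-01-12T08:58:14"
configctl_timeout = "2026-01-12T09:00:14"

print(gap_seconds(carp_to_master, configctl_timeout))  # prints 120.0
```

A gap of exactly 120 seconds would be consistent with a fixed client-side timeout rather than a variable delay, but that reading is an assumption; the log itself does not state the timeout value.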

What could be causing this error? I haven't found anything relevant in the log!

Hardware used: Deciso

Logs (newest entry first):

2026-01-12T09:00:14  Notice  opnsense  /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "CARP WAN FW PORT 102 (185.120.61.102) (102@ax1)" has resumed the state "MASTER" for vhid 102
2026-01-12T09:00:14  Error  configctl  error in configd communication Traceback (most recent call last): File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out
2026-01-12T08:58:15  Notice  watchfrr  [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
2026-01-12T08:58:15  Notice  watchfrr  [QDG3Y-BY5TN] zebra state -> up : connect succeeded
2026-01-12T08:58:15  Notice  watchfrr  [QDG3Y-BY5TN] mgmtd state -> up : connect succeeded
2026-01-12T08:58:15  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (execute task : openvpn_refresh_crls(1))
2026-01-12T08:58:15  Notice  watchfrr  [T83RR-8SM5G] watchfrr 10.5.0 starting: vty@0
2026-01-12T08:58:14  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (execute task : core_trust_crl(1))
2026-01-12T08:58:14  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (1)
2026-01-12T08:58:14  Notice  kernel  <6>[144370] carp: 110@vlan02: BACKUP -> MASTER (preempting a slower master)

Monviech (Cedrik) · January 12, 2026, 11:46:03 AM

What would be a minimum configuration to reproduce?

Is os-frr enabled, with or without its CARP failover option activated?
At least one WireGuard tunnel, also with "Depend on CARP" activated?

And the symptom is that the WireGuard tunnel takes 2 minutes to fail over?

rkam · January 12, 2026, 01:31:10 PM

WireGuard and OpenVPN legacy have "Depend on CARP" activated, os-frr as well.

More facts:

(Each pair is master:slave.)

Tested on various devices with CARP; the behavior is the same.

Pair 1, frr not activated: failover OK.

IPsec site-to-site tunnel
OpenVPN legacy site-to-site client
WireGuard site-to-site tunnel

Pair 1, only frr activated: failover times out.

IPsec site-to-site tunnel: no timeout
OpenVPN legacy site-to-site client: timeout
WireGuard site-to-site tunnel: timeout

With OpenVPN legacy it is very clear: the connection status shows no information about the tunnels until the timeout has passed ( File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out). After that, the information is visible and the remote side can be pinged.

WireGuard status: after 2 minutes the remote side can be pinged.

** Pair 2 **

Pair 2, frr not activated: failover OK.

IPsec site-to-site tunnel
OpenVPN instance server, site-to-site, TAP L2 bridge (tunnel moved from legacy to instance for testing, see comment below ******)
WireGuard site-to-site tunnel

Pair 2, only frr activated: failover times out.

IPsec site-to-site tunnel: no timeout
OpenVPN instance server, site-to-site, TAP L2 bridge: timeout (tunnel moved from legacy to instance for testing)
WireGuard site-to-site tunnel: timeout

Same error in the logs: ( File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out)

I have the problem with 16 pairs (master:slave). I have rolled all of them back to 24.7.12, except for 2 pairs that stay on 25.7.10 for further investigation.

*******
OpenVPN instance, TAP L2 bridge (without FRR)

After switching the OpenVPN tunnel (server) from legacy to instance TAP L2 with interface and bridge, the failover only works partially. After switching to the slave, no connection is established, even after a longer waiting time. It's not possible to connect to the deactivated master, but if you kill it on the master, you can see that the client reconnects to the slave. Even when the master is activated again, this doesn't always work immediately.

With legacy, this runs without any trouble.

*********

Monviech (Cedrik) · January 12, 2026, 02:00:42 PM

Can you be precise with this:

24.7.12 to 25.7.10: there are two major upgrades here (24.7 -> 25.1 -> 25.7).

If that is really true, it's very hard to find the exact version where it stopped working.

To bisect this, you can do incremental updates by going to:
- "System - Firmware - Settings"
- enable "advanced mode"
- Flavour "(custom)"

25.7/MINT/25.7.x/latest

Here slowly increment the versions.

25.1/MINT/25.1.1/latest
25.1/MINT/25.1.2/latest
...

You don't need every minor upgrade, just try to bisect where it happens, that would help a lot.
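
The custom flavour strings above follow a visible pattern: release series, then MINT, then the full version, then latest. A small Python sketch that builds such a string from a version number, assuming that pattern holds for every release (the function name `flavour_for` is hypothetical, and the pattern is inferred only from the examples in this post):

```python
def flavour_for(version: str) -> str:
    """Build a custom flavour string such as '25.1/MINT/25.1.2/latest'.

    Assumes the 'series/MINT/version/latest' pattern seen in the examples;
    the series is taken to be the first two components of the version.
    """
    parts = version.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected a version like 25.1.2, got {version!r}")
    series = ".".join(parts[:2])
    return f"{series}/MINT/{version}/latest"


for v in ["25.1.1", "25.1.2"]:
    print(flavour_for(v))
```

The two printed lines match the two examples given above; whether a given version is actually available on the mirror still has to be checked in the firmware settings.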

rkam · January 12, 2026, 02:49:16 PM

Okay, I understand.

I need to find a time slot where I can downgrade to 24.7.12 and then go step by step to the higher versions.
Unfortunately, some changes have already been made to the configuration, as changes were also made at the remote site.
I'll get back to you with more information.

rkam · January 14, 2026, 08:35:22 AM

Short info,

config:

IPsec legacy site-to-site tunnel
OpenVPN legacy site-to-site tunnel, TAP L2 bridge
WireGuard site-to-site tunnel

os-frr activated, BGP not activated

Upgrade from 24.7.12 to 25.1.1: failover behavior OK, no error message

Upgrade from 25.1.1 to 25.1.4: failover behavior OK, no error message

Upgrade from 25.1.4 to 25.1.12: failover behavior OK, no error message

Next step: go to 25.7.1

One more question: how many intermediate steps should I take starting with 25.7.x?

Monviech (Cedrik) · January 14, 2026, 08:40:06 AM

Just go all the way to the last available minor update, if you don't have an issue continue, if you have an issue roll back and go half the distance. That's how I bisect if there are issues.
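
The "go half the distance" approach is a binary search over an ordered list of versions. A minimal Python sketch, assuming the first version in the list is known good, the last is known bad, and the problem does not disappear again once it has appeared (the function name `first_bad` and the placeholder labels are made up for illustration):

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first version for which is_bad() is true.

    Assumes versions are sorted oldest to newest, versions[0] is known good
    and versions[-1] is known bad. Only the versions in between are tested.
    """
    if len(versions) < 2:
        raise ValueError("need at least one known-good and one known-bad version")
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # problem already present: look at older versions
        else:
            lo = mid  # still fine: look at newer versions
    return versions[hi]


# Demo with a stand-in predicate. The labels and the "first bad" step below
# are arbitrary placeholders, not findings from this thread.
candidates = ["good-0", "step-1", "step-2", "step-3", "step-4", "bad-5"]
simulated_bad = {"step-3", "step-4", "bad-5"}
print(first_bad(candidates, lambda v: v in simulated_bad))  # prints step-3
```

In practice `is_bad` is a manual step: install the version, trigger a CARP failover, and check whether the timeout appears. The sketch only shows why a handful of installs is enough even with many minor releases in between.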

rkam · January 14, 2026, 09:07:46 AM

Okay, now:

After updating from 25.1.12 to 25.7.1 (I updated only the slave, for easier rollback), the previously described timeout error occurs, as can be clearly seen here.

2026-01-14T08:58:08  Notice  opnsense  /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "CARP Vlan_206 (10.10.21.4) (110@vlan02)" has resumed the state "MASTER" for vhid 110
2026-01-14T08:58:08  Error  configctl  error in configd communication Traceback (most recent call last): File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out
2026-01-14T08:56:09  Notice  watchfrr  [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
2026-01-14T08:56:09  Notice  watchfrr  [QDG3Y-BY5TN] zebra state -> up : connect succeeded
2026-01-14T08:56:09  Notice  watchfrr  [QDG3Y-BY5TN] mgmtd state -> up : connect succeeded
2026-01-14T08:56:08  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (execute task : openvpn_refresh_crls(1))
2026-01-14T08:56:08  Notice  watchfrr  [T83RR-8SM5G] watchfrr 10.4 starting: vty@0
2026-01-14T08:56:08  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (execute task : core_trust_crl(1))
2026-01-14T08:56:08  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (1)
2026-01-14T08:56:08  Notice  kernel  <6>[433] carp: 100@ax1: BACKUP -> MASTER (preempting a slower master)

Monviech (Cedrik) · January 14, 2026, 03:08:40 PM

That would be the exact point where the switch from frr 8 to frr 10 was made:

https://forum.opnsense.org/index.php?topic=48072.0
https://github.com/opnsense/plugins/blob/3af383e1e05b3a6831f7ed1f3d75ed0b17a77756/net/frr/pkg-descr#L45-L51

Unsure what could be the cause though. This has been in production use in CARP setups for a while now, and I know of no other currently open issues.

It could be a rare issue specific to your configuration (i.e. having multiple VPN implementations activated and depending on CARP at the same time, combined with the dynamic routing plugin, even if BGP is not activated).

rkam · January 14, 2026, 04:08:24 PM

I need the BGP; I only mentioned it because it had no effect with or without BGP. I was trying to narrow down the error that way.

How do we proceed from here, and will there be a solution?

Monviech (Cedrik) · January 14, 2026, 04:24:11 PM

I need the exact configd call that timed out.

Can you search for that in the ssh shell via:

opnsense-log configd

after triggering that issue?
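
For readers unfamiliar with the traceback quoted in this thread: it only shows that the client gave up waiting for a reply on its socket. It does not say which configd action was slow, which is why the configd log is needed. The following Python sketch illustrates that failure mode with a plain socket pair. It is not the configctl source; the function name `recv_reply` and the short timeouts are assumptions made for the demo.

```python
import socket


def recv_reply(sock, timeout_s, bufsize=65536):
    """Wait up to timeout_s seconds for a reply; return None on timeout."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(bufsize).decode()
    except socket.timeout:  # same class as TimeoutError on Python 3.10+
        return None


# A connected socket pair stands in for the client and the service.
client, service = socket.socketpair()

print(recv_reply(client, 0.2))        # service stays silent -> prints None
service.sendall(b"OK\n")
print(repr(recv_reply(client, 0.2)))  # service answered -> prints 'OK\n'

client.close()
service.close()
```

From the client's point of view a stalled service and a slow one look the same: no bytes arrive before the deadline.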

Monviech (Cedrik) · January 15, 2026, 03:03:48 PM

It looks like on your affected device this configd call stalls:

request ifconfig

It's this action: https://github.com/opnsense/core/blob/55f34d8feb7a1b2b9af1e24ed46e6029fdaf3455/src/opnsense/service/conf/actions.d/actions_interface.conf#L95

Can you execute this manually?

configctl interface list ifconfig

If this hangs, also try a normal ifconfig:

ifconfig

Try with the frr plugin enabled, and disabled, see if it makes a difference.
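
To make "hangs" measurable when comparing runs with the frr plugin enabled and disabled, the command can be run under a deadline and timed. A minimal Python sketch using only the standard library (the helper name `run_with_timeout` and the 10 second deadline are arbitrary choices for illustration):

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, elapsed_seconds).

    finished is False if the command was still running after timeout_s
    seconds; subprocess.run() kills the child before raising in that case.
    """
    start = time.monotonic()
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       timeout=timeout_s, check=False)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.monotonic() - start


if __name__ == "__main__":
    # On the firewall, the commands from this post would go here, e.g.
    # ["configctl", "interface", "list", "ifconfig"] or ["ifconfig"].
    # A harmless stand-in is used so the sketch runs anywhere.
    ok, elapsed = run_with_timeout([sys.executable, "-c", "pass"], timeout_s=10)
    print(f"finished={ok} after {elapsed:.1f}s")
```

Running it once right after a CARP failover and once in steady state would show whether the call is only slow during the transition.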

rkam · January 15, 2026, 04:53:00 PM

configctl interface list ifconfig worked.

No change in behavior.

Monviech (Cedrik) · January 19, 2026, 10:30:42 AM

We will continue looking into this when 26.1 is out, because if it's somehow fixed there, we don't need to chase it right now.

rkam · January 20, 2026, 10:53:19 AM

Thanks for the info, then we'll wait for version 26.x
