Hello,
After updating several devices from 24.7.12 to 25.7.10, the following error occurs with the os-frr plugin:
After failover to the slave, it takes approximately 2 minutes until the connections to the endpoints via WireGuard and OpenVPN are restored. Oddly, the IPsec tunnels are not affected. Without the os-frr plugin activated, everything works perfectly. Simply activating os-frr is enough to trigger the error; BGP doesn't even need to be enabled.
The same problem occurs when reverting to the master server.
According to the log:
After BACKUP -> MASTER, os-frr (zebra) starts, and then there is a configd error with a timeout of approximately 2 minutes. Only after that are the remaining CARP interfaces activated in /usr/local/etc/rc.syshook.d/carp/20-openvpn.
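For reference, a minimal way to look at the pieces involved from the shell (the hook path is the one from the log; the grep filter is only an illustration):
ls /usr/local/etc/rc.syshook.d/carp/
opnsense-log configd | grep -i timeout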
What could be causing this error? I haven't found anything relevant in the log!
Hardware used: Deciso
Logs:
2026-01-12T09:00:14  Notice  opnsense  /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "CARP WAN FW PORT 102 (185.120.61.102) (102@ax1)" has resumed the state "MASTER" for vhid 102
2026-01-12T09:00:14  Error  configctl  error in configd communication Traceback (most recent call last): File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out
2026-01-12T08:58:15  Notice  watchfrr  [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
2026-01-12T08:58:15  Notice  watchfrr  [QDG3Y-BY5TN] zebra state -> up : connect succeeded
2026-01-12T08:58:15  Notice  watchfrr  [QDG3Y-BY5TN] mgmtd state -> up : connect succeeded
2026-01-12T08:58:15  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (execute task : openvpn_refresh_crls(1))
2026-01-12T08:58:15  Notice  watchfrr  [T83RR-8SM5G] watchfrr 10.5.0 starting: vty@0
2026-01-12T08:58:14  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (execute task : core_trust_crl(1))
2026-01-12T08:58:14  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (1)
2026-01-12T08:58:14  Notice  kernel  <6>[144370] carp: 110@vlan02: BACKUP -> MASTER (preempting a slower master)
What would be a minimal configuration to reproduce this?
os-frr enabled? With or without CARP failover activated?
At least one WireGuard tunnel? Also with "Depend on CARP" activated?
Then the symptom is that the WireGuard tunnel takes 2 minutes to fail over?
WireGuard and OpenVPN Legacy both have "Depend on CARP" activated, and os-frr is enabled as well.
More facts (pairs: master:slave):
Tested on various devices with CARP, same behavior.

Pair 1, without frr activated: failover okay.
- IPsec site-to-site tunnel
- OpenVPN Legacy site-to-site client
- WireGuard site-to-site tunnel

Pair 1, with only frr activated: failover times out.
- IPsec site-to-site tunnel: no timeout
- OpenVPN Legacy site-to-site client: timeout
- WireGuard site-to-site tunnel: timeout

In OpenVPN Legacy it is very clear: in the connection status, all information about the tunnels is missing. Only after the timeout (File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out) does the information appear, and then you can also ping the remote side.
WireGuard status: after 2 minutes you can ping the remote side.
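One way to verify the recovery from the shell, assuming wireguard-tools is installed (the address below is only a placeholder for the remote tunnel peer): wg show prints the latest handshake per peer, which makes the roughly 2-minute gap visible.
wg show
ping -c 3 192.0.2.1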
** Pair 2 **

Pair 2, without frr activated: failover okay.
- IPsec site-to-site tunnel
- OpenVPN Instance server, site-to-site, TAP bridge L2 (tunnel moved from legacy to instance for testing / see comment below ******)
- WireGuard site-to-site tunnel

Pair 2, with only frr activated: failover times out.
- IPsec site-to-site tunnel: no timeout
- OpenVPN Instance server, site-to-site, TAP bridge L2: timeout (tunnel moved from legacy to instance for testing)
- WireGuard site-to-site tunnel: timeout

Same error in the logs: (File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out)
I have the problem with 16 pairs (master:slave); I have rolled all of them back to 24.7.12, except for 2 pairs that stay on 25.7.10 for further investigation.
*******
OpenVPN Instance TAP L2 bridge (without FRR)
After switching the OpenVPN tunnel (server) from legacy to an instance with TAP L2, interface and bridge, the failover only works partially. After switching to the slave, no connection is established, even after a longer waiting time. It's not possible to connect to the deactivated master either, but if you kill the session on the master, you can see that the client reconnects to the slave. Even when the master is activated again, this doesn't always work immediately.
With legacy this runs without any trouble.
*********
Can you be precise with this:
From 24.7.12 to 25.7.10 there are two major upgrades (24.7 -> 25.1 -> 25.7).
If that is really the case, it's very hard to find the exact version where it stopped working.
To bisect this, you can do incremental updates by going to:
- "System - Firmware - Settings"
- enable "advanced mode"
- Flavour "(custom)"
25.7/MINT/25.7.x/latest
Here slowly increment the versions.
25.1/MINT/25.1.1/latest
25.1/MINT/25.1.2/latest
...
You don't need every minor upgrade, just try to bisect where it happens, that would help a lot.
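Between steps you can confirm the version that is actually running on the shell, e.g.:
opnsense-version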
Okay, I understand.
I need to find a time slot where I can downgrade to 24.7.12 and then go step by step to the higher versions.
Unfortunately, some changes have already been made to the configuration in the meantime, as changes were also made to the remote site.
I'll get back to you with more information.
Short info.
Config:
- IPsec legacy site-to-site tunnel
- OpenVPN legacy site-to-site tunnel, TAP L2 bridge
- WireGuard site-to-site tunnel
- os-frr activated, BGP not activated (see the vtysh check below)
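A quick way to double-check that FRR is running while BGP stays unconfigured is to query the FRR shell directly (vtysh comes with the FRR package; these are only illustrative checks):
vtysh -c "show daemons"
vtysh -c "show running-config"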
Migrated from 24.7.12 to 25.1.1: failover behavior okay, no error message.
Migrated from 25.1.1 to 25.1.4: failover behavior okay, no error message.
Migrated from 25.1.4 to 25.1.12: failover behavior okay, no error message.
Next step: go to 25.7.1.
One more question: how many intermediate steps should I take starting from 25.7.x?
Just go all the way to the last available minor update. If you don't have an issue, continue; if you have an issue, roll back and go half the distance. That's how I bisect when there are issues.
Okay, an update:
After updating from 25.1.12 to 25.7.1 (I updated only the slave for an easier rollback), the previously described timeout error occurs, as can be clearly seen here.
2026-01-14T08:58:08  Notice  opnsense  /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "CARP Vlan_206 (10.10.21.4) (110@vlan02)" has resumed the state "MASTER" for vhid 110
2026-01-14T08:58:08  Error  configctl  error in configd communication Traceback (most recent call last): File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out
2026-01-14T08:56:09  Notice  watchfrr  [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
2026-01-14T08:56:09  Notice  watchfrr  [QDG3Y-BY5TN] zebra state -> up : connect succeeded
2026-01-14T08:56:09  Notice  watchfrr  [QDG3Y-BY5TN] mgmtd state -> up : connect succeeded
2026-01-14T08:56:08  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (execute task : openvpn_refresh_crls(1))
2026-01-14T08:56:08  Notice  watchfrr  [T83RR-8SM5G] watchfrr 10.4 starting: vty@0
2026-01-14T08:56:08  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (execute task : core_trust_crl(1))
2026-01-14T08:56:08  Notice  opnsense  /usr/local/sbin/pluginctl: plugins_configure crl (1)
2026-01-14T08:56:08  Notice  kernel  <6>[433] carp: 100@ax1: BACKUP -> MASTER (preempting a slower master)
That would be the exact point where the switch from frr 8 to frr 10 was made:
https://forum.opnsense.org/index.php?topic=48072.0
https://github.com/opnsense/plugins/blob/3af383e1e05b3a6831f7ed1f3d75ed0b17a77756/net/frr/pkg-descr#L45-L51
Unsure what could be the cause though; this has been running in production CARP setups for a while now, and I know of no other currently open issues.
It could be a rare issue specific to your configuration (i.e. having multiple VPN implementations activated and depending on CARP at the same time, combined with the dynamic routing plugin, even if BGP is not activated).
I need the BGP; I only mentioned it because it had no effect with or without BGP. I was trying to narrow down the error that way.
How do we proceed from here, and will there be a solution?
I need the exact configd call that timed out.
Can you search for that in the ssh shell via:
opnsense-log configd
after triggering that issue?
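The output can get long, so narrowing it down to the failing request helps, for example (the grep pattern is only an illustration):
opnsense-log configd | grep -iE 'timeout|error'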
It looks like on your affected device this configd call stalls:
request ifconfig
It's this action: https://github.com/opnsense/core/blob/55f34d8feb7a1b2b9af1e24ed46e6029fdaf3455/src/opnsense/service/conf/actions.d/actions_interface.conf#L95
Can you execute this manually?
configctl interface list ifconfig
If this hangs, also try a normal ifconfig:
ifconfig
Try with the frr plugin enabled and disabled, and see if it makes a difference.
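A minimal way to make a stall visible is to time the calls with the shell's time built-in (only a suggestion to quantify the delay):
time configctl interface list ifconfig
time ifconfig -a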
configctl interface list ifconfig worked.
No change in behavior.
We will continue looking into this when 26.1 is out, because if it's somehow fixed there, we don't need to chase it right now.
Thanks for the info, then we'll wait for version 26.x
Hello, I think we found something.
https://github.com/opnsense/plugins/pull/5160
Can you try the following patch on the affected firewalls? It will only apply to the latest FRR version though (which means you have to be on >25.7.10 when you test).
# opnsense-patch https://github.com/opnsense/plugins/commit/d27619990739424db4e0aaa266c2392eeb7abe57
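As far as I know, opnsense-patch applies patches reversibly, so running the exact same command a second time should revert the change if you need to roll it back.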
This patch will be in 26.1:
https://github.com/opnsense/plugins/commit/d2024adcdcef47df3915305ee1013d6a2f81d0ca
I have now tested the patch on version 25.7.10 with different Deciso models, and the error no longer occurs.
I will then test it on 25.7.11_2.
Thanks again for your support.
Wasn't it this one? https://github.com/opnsense/plugins/commit/2cc2215bb
If so we're hotfixing this for the last update of 25.7.11_x shortly after 26.1 is out this week.
Cheers,
Franco