OPNsense Forum

Archive => 19.1 Legacy Series => Topic started by: Andreas_ on May 03, 2019, 09:25:38 am

Title: CARP over LAGG problems
Post by: Andreas_ on May 03, 2019, 09:25:38 am
I usually upgrade my OPNsense HA pair by first updating the machine that is normally the backup, then disabling CARP on the master and updating it as well.
Now, when upgrading from 19.1.2 to 19.1.6 (which needs a reboot), I found that afterwards some VHIDs would go to master and some to backup (net.inet.carp.preempt=0; it should be 1, but 0 is helpful for debugging here). The VHIDs that became master are all on a LAGG interface (directly or via VLAN); the others, which remained backup, are on physical interfaces. Disabling and re-enabling CARP on the master machine resolved the situation. Apparently the LAGG interface didn't receive CARP packets from the master in time while booting, so the rebooted machine assumed it needed to become master itself.

After my HA setup had settled and was working normally, I started to upgrade the switches one by one. With one switch down, the LAGG interface still works, since only one of its two physical interfaces loses its connection, but CARP seems to increase demotion based on the physical interface, not the resulting LAGG interface. To keep CARP from failing over unnecessarily (which would affect e.g. OpenVPN connections), CARP on the backup needs to be disabled temporarily.

So there seem to be two issues here: CARP expecting traffic before LAGG is ready, and CARP demotion reacting to LAGG slave interfaces instead of the LAGG interface itself.
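
To make the first issue concrete, here is a rough sketch of the timing involved. It assumes the usual CARP rule that a backup promotes itself after roughly 3 * advbase + advskew/256 seconds without advertisements; the function names and the example numbers (advbase 1, advskew 0, a LAGG that needs 5 seconds before it passes traffic) are made up for illustration and were not measured on the machines in this thread.

```python
def master_down_timeout(advbase: int, advskew: int) -> float:
    """Seconds a CARP backup waits without advertisements before promoting itself."""
    return 3 * advbase + advskew / 256


def promotes_itself(link_ready_after: float, advbase: int = 1, advskew: int = 0) -> bool:
    """True if the link only passes traffic after the master-down timeout has expired.

    Simplification: assumes the timer starts at the same moment as the link bring-up.
    """
    return link_ready_after > master_down_timeout(advbase, advskew)


# Hypothetical numbers, not taken from the thread.
print(master_down_timeout(1, 0))   # 3.0
print(promotes_itself(5.0))        # True: LAGG needs 5 s, timeout is 3 s
print(promotes_itself(1.0))        # False: a plain interface that is up after 1 s
```

Under these assumptions a VHID on a slow-starting LAGG would go to master while a VHID on a plain physical interface would stay backup, which matches the mixed state described above.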


Title: Re: CARP over LAGG problems
Post by: mimugmail on May 03, 2019, 11:02:23 am
Now when upgrading from 19.1.2 to 19.1.6 (which needs reboot), I found that some VHIDs would go to master and some to backup (net.inet.carp.preempt=0, should be 1 but helpful for debugging here) afterwards.

Setting it to 0 will result in the situation you describe, so that behaviour is expected. It really should be 1.

After my HA setup was settled and working normally, I started to upgrade the switches one by one. With one switch down, the LAGG interface is still workable, since only one of both physical interfaces loses connection, but CARP seems to increase demotion based on the physical interface, not the resulting LAGG interface. In order to not have CARP failing over unnecessarily (which would affect e.g. OpenVPN connections), CARP on the backup needs to be disabled temporarily.

Isn't the backup unit also on the same switch? Then it should not fail over ...
Title: Re: CARP over LAGG problems
Post by: Andreas_ on May 03, 2019, 05:15:52 pm
Isn't the backup unit also on the same switch? Then it should not fail over ...

That's exactly the problem. Via the second switch both FWs have degraded but fully functional LAGG connectivity. CARP shouldn't react to LAGG degradation, but it does.
Title: Re: CARP over LAGG problems
Post by: mimugmail on May 03, 2019, 07:34:48 pm
Have you searched the FreeBSD Bugzilla for similar reports? Please check whether this is a known issue.
Title: Re: CARP over LAGG problems
Post by: olgeni on May 14, 2019, 11:54:02 pm
That's exactly the problem. Via the second switch both FWs have degraded but fully functional LAGG connectivity. CARP shouldn't react to LAGG degradation, but it does.

You can try to add the tunable "net.inet.carp.senderr_demotion_factor" and set it to 0 - that's what fixed my CARP over LAGG.
Title: Re: CARP over LAGG problems
Post by: Andreas_ on July 05, 2019, 04:03:23 pm
Current settings are:

net.inet.carp.senderr_demotion_factor = 0
net.inet.carp.ifdown_demotion_factor = 120
net.pfsync.carp_demotion_factor = 0
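
For readers wondering what these three values do, the following is a minimal model rather than the kernel code. It assumes what carp(4) and pfsync(4) describe: each factor is added to net.inet.carp.demotion while its condition is active, the counter is added to the advertised advskew, and the result is capped at 240. The base advskew values (0 on the master, 100 on the backup), the short dictionary keys and the "send error on the master only, preempt enabled" scenario are assumptions for illustration.

```python
CARP_MAXSKEW = 240  # upper bound for the advertised skew


def demotion(factors: dict, active: set) -> int:
    """Sum the factors whose condition is currently active."""
    return sum(value for name, value in factors.items() if name in active)


def advertised_skew(advskew: int, demotion_counter: int) -> int:
    """Configured advskew plus demotion, clamped to 0..CARP_MAXSKEW."""
    return max(0, min(CARP_MAXSKEW, advskew + demotion_counter))


# Documented defaults versus the values posted in this thread.
defaults = {"senderr": 240, "ifdown": 240, "pfsync": 240}
thread = {"senderr": 0, "ifdown": 120, "pfsync": 0}

MASTER_SKEW, BACKUP_SKEW = 0, 100  # hypothetical base skews

for label, factors in (("defaults", defaults), ("thread", thread)):
    # Scenario: the master sees CARP send errors while a LAGG member is down.
    skew = advertised_skew(MASTER_SKEW, demotion(factors, {"senderr"}))
    print(label, skew, "fails over" if skew > BACKUP_SKEW else "stays master")
# defaults 240 fails over
# thread 0 stays master
```

In this model a send error alone no longer pushes the master's advertised skew above the backup's, while a real interface-down event (factor 120) still would.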