CARP role doesn't switch properly after updating to 19.1.8

Started by bitmusician, May 23, 2019, 11:17:10 AM

Hi,
after updating our test cluster from version 19.1.6 to 19.1.8 and then our production cluster from 19.1.4 to 19.1.8, we noticed a problem with switching the CARP roles. Once both nodes in each cluster were updated, we wanted to check whether role switching still behaves as before (when the MASTER is put into maintenance mode it becomes BACKUP).

Node 1 (MASTER) went into maintenance mode but unfortunately stayed MASTER for the cluster. Deactivating CARP on this node and activating it again didn't help either. The only thing that gave us normal switching behaviour again when one of the nodes is put into maintenance mode was changing the skew of the advertising frequency of one VIP on the MASTER node from 0 to 1 and then back from 1 to 0.
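(For anyone who wants to try the same workaround from the shell, the following is only a rough sketch that assumes the affected VIP is vhid 1 on igb0; we actually changed the skew through the Virtual IPs page in the GUI, and OPNsense may overwrite manual ifconfig changes when it re-applies its configuration.)

# ifconfig igb0 | grep carp
# ifconfig igb0 vhid 1 advskew 1
# ifconfig igb0 vhid 1 advskew 0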

Maybe this workaround helps somebody with the same problem.

Greetz,
bitmusician

There was indeed a change in 19.1.8. Was it a one-time thing, or is it reproducible?

As I wrote, I first noticed the problem on our test cluster (which is not a copy of the production system) and then on our production cluster too. It should be reproducible on any cluster after updating to 19.1.8.

I tested this successfully in dev; maybe you have configured some tunables manually?
Check here the details:
https://github.com/opnsense/core/issues/3163

We didn't make any changes to the tunables, and we also did not have packet loss.
Since applying the workaround we haven't had the problem anymore.

The likely candidate is actually https://github.com/opnsense/core/commit/c5d6b6cacf, but that would indicate you are relying on policy routing for a CARP setup, which really shouldn't have it (best to use a dedicated CARP link).

# opnsense-patch c5d6b6cacf
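(If I remember correctly, running the same opnsense-patch command a second time reverts the patch again, so the change is easy to back out if it doesn't help.)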


Cheers,
Franco

I had exactly the same symptoms, including that disabling CARP and re-enabling it did not help. The pfsync bulk transfer was successful and the skew got back to 0, but it did not become MASTER again.
Instead of the workaround I just rebooted it and it became MASTER again.

Hi,

I can confirm the issue exists. We've been experiencing this behaviour on both of our production clusters since upgrading to 19.1.8. I tried setting our secondary to "persistent CARP maintenance mode", which usually makes the primary node MASTER again, but this also failed. I'll reboot the secondary after work to make it BACKUP again.

Cheers,
Wayne

Quote from: Wayne Train on May 27, 2019, 10:50:53 AM
Hi,

I can confirm the issue exists. We've been experiencing this behaviour on both of our production clusters since upgrading to 19.1.8. I tried setting our secondary to "persistent CARP maintenance mode", which usually makes the primary node MASTER again, but this also failed. I'll reboot the secondary after work to make it BACKUP again.

Cheers,
Wayne

How many CARP IPs do you have, and which type?
My test cluster has 2 VIPs, both static (LAN, WAN), and I can successfully switch back and forth.

My setup has 3 VIPs (WAN, LAN, DMZ) plus an alias IP on WAN, and it has problems with switching between the firewalls.

Please run a "sysctl -a | grep carp" on machines 1 and 2: before entering maintenance mode, after entering it, and then again when back.
On my side it looks good:

root@OPNsense1:~ # sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
root@OPNsense1:~ # sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 240
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
root@OPNsense1:~ # sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240






root@OPNsense2:~ # sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
root@OPNsense2:~ # sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
root@OPNsense2:~ # sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
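(The value to watch above is net.inet.carp.demotion: on OPNsense1 it goes from 0 to 240 while in maintenance mode and back to 0 afterwards. If you only want that one value, a shorter query should do, assuming the same sysctl exists on your side:)

# sysctl net.inet.carp.demotion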

Hi,
we use 11 CARP VIPs, one for each VLAN.
Cheers,
Wayne

Quote from: Wayne Train on May 27, 2019, 04:56:18 PM
Hi,
we use 11 CARP VIPs, one for each VLAN.
Cheers,
Wayne

The sysctl output like above from you too, please. I can't track this down without debugging ...

July 03, 2019, 09:38:05 PM #14 Last Edit: July 04, 2019, 10:29:59 AM by katamadone [CH]
@mimugmail, as I interpret it you're looking at the *primary*:
net.inet.carp.demotion: 240

so it should be the same on my side as on yours. Did you check if MASTER / BACKUP was correctly displayed in the web UI?
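(For cross-checking outside the web UI, ifconfig should also print the CARP state per vhid, e.g. "carp: MASTER vhid 1 advbase 1 advskew 0"; just a sketch, grepping for the carp lines:)

# ifconfig | grep carp: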