CARP not preempting despite "disable preempt" not checked

Started by MaeveFirstborn, November 22, 2024, 11:09:02 PM

We have two firewalls in a CARP failover relationship. Each one has two WANs and three LANs. While troubleshooting something earlier today, we realized that CARP failover wasn't behaving the way we thought it was supposed to. We want the behavior to be such that if any one of the interfaces fails, the backup takes over. More specifically, whichever firewall has the most functional interfaces should be master. I guess a better configuration in the future would be to weight specifically on the WANs, but for now we want to get preempting working in the first place.
Right now, when we kill one of the interfaces on the master, the second firewall's corresponding interface takes over as CARP master. However, ONLY that interface takes over, which is useless: if the WANs fail on firewall 1 but the LANs don't, then the downstream hosts are going to send traffic to the firewall that is CARP master on the LAN, which in this case is the firewall without WAN reachability.
Is it something with these advskews?
Obvious stuff:

  • "Disable preempt" is off
  • CARP itself is working, just not as a group

We face exactly the same problem.
If we unplug the LAN from our primary, it fails over to the secondary on LAN and all the assigned VLAN interfaces, but not on the WAN interface. For the WAN the primary stays master, so internet traffic for the clients is interrupted.
I had "Disable preempt" unchecked on the primary and checked on the secondary... this option disappeared from the GUI with the update to 25.1, so I had to remove it manually from the config on the secondary...

What OPNsense version are you running and how did you set up the interfaces? It's important that each interface has the same identifier on both machines, e.g. VLAN99 is opt1 on both, and so on.

See https://docs.opnsense.org/manual/how-tos/carp.html#setup-interfaces-basic-firewall-rules

"Make sure the interface assignments on both systems are identical! Via Interfaces ‣ Overview you can check if e.g. DMZ is opt1 on both machines. When the assignments differ you will have mixed Master and Backup IPs on both machines."
Deciso DEC740

We use OPNsense 25.1.3-amd64 installed on two Proxmox hypervisors. All the interface numberings and assignments are exactly the same on both machines. (The primary/secondary config is generated, so the interface numbering must be the same on both machines.)
I removed some of the CARP VIPs yesterday because the VLANs are not in use currently, but I kept the interfaces active on both machines... but this should not matter, right?

Quote from: tofuSCHNITZEL on March 19, 2025, 12:22:02 AM
We face exactly the same problem.
If we unplug the LAN from our primary, it fails over to the secondary on LAN and all the assigned VLAN interfaces, but not on the WAN interface. For the WAN the primary stays master, so internet traffic for the clients is interrupted.
I had "Disable preempt" unchecked on the primary and checked on the secondary... this option disappeared from the GUI with the update to 25.1, so I had to remove it manually from the config on the secondary...
The VHIDs have to be unique, one VHID per CARP interface. You're setting the same VHID for multiple interfaces.

Quote from: MaeveFirstborn on November 22, 2024, 11:09:02 PM
Is it something with these advskews?
The recommendation is to set advskew to 0 or 1 on the master and 100+ on the backups. Try setting it to 0 on all the master's CARP VIPs and to 100 on all the backup's.
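
To see why a low advskew makes a node the preferred master: per FreeBSD's carp(4), advskew is expressed in 1/256ths of a second and is added to advbase to give the advertisement interval, and the node that advertises more frequently wins the election. A minimal sketch of that arithmetic, assuming the default advbase of 1 (the function name is only illustrative):

```python
def advertisement_interval(advbase: int, advskew: int) -> float:
    """Seconds between CARP advertisements: advbase + advskew/256."""
    return advbase + advskew / 256


# Values from the recommendation above; advbase=1 is an assumed default.
master_interval = advertisement_interval(advbase=1, advskew=0)
backup_interval = advertisement_interval(advbase=1, advskew=100)

print(f"master advertises every {master_interval} s")
print(f"backup advertises every {backup_interval} s")
```

With these values the master advertises every 1.0 s and the backup would advertise every 1.390625 s, so the master is preferred as long as both nodes can hear each other.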

Quote from: patient0 on March 19, 2025, 10:23:26 AM
The VHIDs have to be unique, one VHID per CARP interface. You're setting the same VHID for multiple interfaces.

I have more than 256 VLANs, so I cannot use a different VHID for every VLAN. Also, technically they are "unique" per interface, since the VLANs don't "see" each other... I also tried a unique VHID each for WAN and LAN and then the same one for every VLAN, but this did not help with the "together failover" either.

The advskew is 0 for every VIP on the primary and 100 for every VIP on the secondary.

Quote
"Disable preempt" unchecked on the primary and checked on the secondary... this option disappeared from the GUI
I'm on 25.1.3 and it's still available; as before, you have to enable 'advanced mode' to see it.

Quote from: tofuSCHNITZEL on March 19, 2025, 10:40:50 AM
I have more than 256 VLANs, so I cannot use a different VHID for every VLAN. Also, technically they are "unique" per interface, since the VLANs don't "see" each other... I also tried a unique VHID each for WAN and LAN and then the same one for every VLAN, but this did not help with the "together failover" either.
Fair enough, it only has to be unique per interface/broadcast domain.
Just out of curiosity: you have more than 256 interfaces CARP-ed? And does each VLAN have its own interface in the VM (I assume not)? What type of switch/bridge do you use on Proxmox, Linux bridge or Open vSwitch?
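
For context on why uniqueness only matters within a broadcast domain: CARP derives the virtual MAC address of a VIP from its VHID (00:00:5e:00:01:XX, with XX being the VHID in hex), so two unrelated CARP groups with the same VHID would only collide if they share a layer 2 segment. A small sketch of the mapping (the helper name is hypothetical):

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC address CARP uses for a given VHID."""
    if not 1 <= vhid <= 255:
        # The VHID has to fit into the last octet of the MAC address.
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


print(carp_virtual_mac(1))
print(carp_virtual_mac(255))
```

Since the VHID fills a single octet, there are at most 255 distinct values, which is why reusing a VHID on VLANs that are isolated from each other is the only option with more VLANs than that.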

Quote from: patient0 on March 19, 2025, 12:22:31 PM
I'm on 25.1.3 and it's still available; as before, you have to enable 'advanced mode' to see it.
Yes, you are right! I forgot about the "advanced" switch at the top.

Quote from: patient0 on March 19, 2025, 12:22:31 PM
Fair enough, it only has to be unique per interface/broadcast domain.
Just out of curiosity: you have more than 256 interfaces CARP-ed? And does each VLAN have its own interface in the VM (I assume not)? What type of switch/bridge do you use on Proxmox, Linux bridge or Open vSwitch?
Currently active, no, but planned. And of course I need a CARP VIP in every VLAN, otherwise the HA setup does not make much sense.
No, the VLANs are added in OPNsense on the main "interface". The bridge in Proxmox that contains this interface is set to be VLAN aware (Linux bridge), but we had our fair share of headaches with multicast "bleeding" between VLANs etc.

I found the issue.
It was the virtualisation all along. Since the interface is in a bridge and this bridge is assigned to a virtual NIC, the NIC never went down when the cable on the hypervisor was physically disconnected. The master OPNsense therefore never saw an "interface down" event and never got demoted, so I always ended up with a "split brain" situation where both the primary and the secondary became master on the LAN interfaces. And since both could still see each other on the WAN, the WAN stayed master/backup.
I solved it by assigning the NICs on the 10G card directly to the OPNsense VMs via PCIe passthrough.
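
The role of the missing "interface down" event can be sketched roughly as follows. As described in FreeBSD's carp(4), a link-down on an interface carrying a VHID raises a global demotion counter, and that counter is added to the advskew advertised for all VHIDs, which is what lets the peer take over everywhere. The sketch only models the election on an interface where both nodes still hear each other (the WAN in this thread), assumes preempt is enabled, and assumes a demotion step and skew cap of 240; the function names are made up for illustration.

```python
CARP_MAXSKEW = 240      # assumed cap on the advertised skew
IFDOWN_DEMOTION = 240   # assumed demotion added when a CARP interface goes down


def advertised_skew(advskew: int, demotion: int) -> int:
    """Skew actually advertised: configured advskew plus demotion, clamped."""
    return max(0, min(CARP_MAXSKEW, advskew + demotion))


def wan_master(link_down_event_seen: bool) -> str:
    """Who ends up master on the WAN after the primary's LAN cable is pulled."""
    primary_demotion = IFDOWN_DEMOTION if link_down_event_seen else 0
    primary = advertised_skew(0, primary_demotion)   # advskew 0 on the primary
    secondary = advertised_skew(100, 0)              # advskew 100 on the secondary
    return "secondary" if secondary < primary else "primary"


# Passthrough NIC: the VM sees the link go down, so the primary is demoted.
print(wan_master(link_down_event_seen=True))
# Bridged virtual NIC: the link never goes down inside the VM, no demotion.
print(wan_master(link_down_event_seen=False))
```

In the bridged case the primary keeps advertising skew 0 on the WAN and stays master there, which matches the behaviour described above.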