CARP role doesn't switch properly after updating to 19.1.8

Started by bitmusician, May 23, 2019, 11:17:10 AM

July 04, 2019, 10:42:35 AM #15 Last Edit: July 04, 2019, 11:32:10 AM by katamadone [CH]
- Currently it looks like the secondary is always master, not backup as it was in 19.1.7
- Tried editing the Virtual IP setting and re-saving it
- HAVE TO CONFIRM THAT: both become master on this IP regardless of the advertising frequency settings

So you left mnt mode, both were master, and you had the sysctl from above? Strange. Then go to the CLI, run clog -f /var/log/system.log and post the new lines that appear when leaving mnt mode.

Quote from: katamadone [CH] on July 04, 2019, 10:42:35 AM
- Currently it looks like the secondary is always master, not backup as it was in 19.1.7
- Tried editing the Virtual IP setting and re-saving it
- HAVE TO CONFIRM THAT: both become master on this IP regardless of the advertising frequency settings

We are facing the same issue right now. In the GUI it looks like both nodes are Master after putting the first node into persistent CARP maintenance mode, but when we reboot this node, the other "Master" (actually the Backup node) doesn't answer requests to the VIPs.

Is there already a solution?


Quote from: mimugmail on July 06, 2019, 09:54:02 AM
So you left mnt mode, both were master, and you had the sysctl from above? Strange. Then go to the CLI, run clog -f /var/log/system.log and post the new lines that appear when leaving mnt mode.


node01 (normally the MASTER):
when switching into maintenance mode:


Jul 10 12:53:30 node01 kernel: carp: demoted by 240 to 240 (sysctl)

when switching out of maintenance mode:

Jul 10 12:54:42 node01 kernel: carp: demoted by -240 to 0 (sysctl)
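To follow the demotion counter through a log like node01's, here is a minimal Python sketch that extracts the "carp: demoted by X to Y (reason)" kernel messages. It assumes only the message format shown above; the helper name demotion_events is hypothetical.

```python
import re

# Matches carp(4) kernel messages of the form seen in this thread, e.g.
#   "carp: demoted by 240 to 240 (sysctl)"
#   "carp: demoted by -240 to 0 (pfsync bulk done)"
DEMOTION_RE = re.compile(r"carp: demoted by (-?\d+) to (-?\d+) \(([^)]*)\)")


def demotion_events(log_text):
    """Return (delta, new_total, reason) tuples in the order they appear."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3))
        for m in DEMOTION_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    node01_log = (
        "Jul 10 12:53:30 node01 kernel: carp: demoted by 240 to 240 (sysctl)\n"
        "Jul 10 12:54:42 node01 kernel: carp: demoted by -240 to 0 (sysctl)\n"
    )
    for delta, total, reason in demotion_events(node01_log):
        # A total above 0 normally means this node now advertises more slowly.
        print(f"{reason}: {delta:+d} -> total {total}")
```

Entering maintenance should push the total to 240 and leaving it should bring it back to 0, which is exactly what node01 logged here.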

-----------------------------------------------------

node02 (normally the BACKUP):
when switching into maintenance mode:


Jul 10 12:53:30 node02 kernel: carp: 31@igb0: BACKUP -> MASTER (preempting a slower master)
Jul 10 12:53:31 node02 opnsense: /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "xxx.xxx.xxx.xxx - VIP WAN (31@igb0)" has resumed the state "MASTER" for vhid 31
Jul 10 12:53:31 node02 opnsense: /usr/local/etc/rc.syshook.d/carp/20-openvpn: Starting OpenVPN server instance on xxx.xxx.xxx.xxx - VIP WAN because of transition to CARP master.
Jul 10 12:53:31 node02 kernel: ovpns1: link state changed to DOWN
Jul 10 12:53:35 node02 kernel: ovpns1: link state changed to UP
Jul 10 12:53:35 node02 opnsense: /usr/local/etc/rc.syshook.d/carp/20-openvpn: OpenVPN server 1 instance started on PID 34541.
Jul 10 12:53:36 node02 opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ovpns1'
Jul 10 12:53:36 node02 opnsense: /usr/local/etc/rc.newwanip: Interface '' is disabled or empty, nothing to do.

when switching out of maintenance mode:


Jul 10 12:54:41 node02 kernel: carp: 31@igb0: MASTER -> BACKUP (more frequent advertisement received)
Jul 10 12:54:41 node02 kernel: ifa_maintain_loopback_route: deletion failed for interface igb0: 3
Jul 10 12:54:42 node02 opnsense: /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "xxx.xxx.xxx.xxx - VIP WAN (31@igb0)" has resumed the state "BACKUP" for vhid 31

I had to revert, couldn't leave that in production. Sorry.
I'll have to start over soon and try again.

Quote from: katamadone [CH] on July 11, 2019, 03:48:15 PM
I had to revert, couldn't leave that in production. Sorry.
I'll have to start over soon and try again.

Did you have the problem already in 19.1.8 or only after updating to 19.1.10?

Quote from: bitmusician on July 10, 2019, 03:46:15 PM
Quote from: mimugmail on July 06, 2019, 09:54:02 AM
So you left mnt mode, both were master, and you had the sysctl from above? Strange. Then go to the CLI, run clog -f /var/log/system.log and post the new lines that appear when leaving mnt mode.


node01 (normally the MASTER):
when switching into maintenance mode:


Jul 10 12:53:30 node01 kernel: carp: demoted by 240 to 240 (sysctl)

when switching out of maintenance mode:

Jul 10 12:54:42 node01 kernel: carp: demoted by -240 to 0 (sysctl)

-----------------------------------------------------

node02 (normally the BACKUP):
when switching into maintenance mode:


Jul 10 12:53:30 node02 kernel: carp: 31@igb0: BACKUP -> MASTER (preempting a slower master)
Jul 10 12:53:31 node02 opnsense: /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "xxx.xxx.xxx.xxx - VIP WAN (31@igb0)" has resumed the state "MASTER" for vhid 31
Jul 10 12:53:31 node02 opnsense: /usr/local/etc/rc.syshook.d/carp/20-openvpn: Starting OpenVPN server instance on xxx.xxx.xxx.xxx - VIP WAN because of transition to CARP master.
Jul 10 12:53:31 node02 kernel: ovpns1: link state changed to DOWN
Jul 10 12:53:35 node02 kernel: ovpns1: link state changed to UP
Jul 10 12:53:35 node02 opnsense: /usr/local/etc/rc.syshook.d/carp/20-openvpn: OpenVPN server 1 instance started on PID 34541.
Jul 10 12:53:36 node02 opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ovpns1'
Jul 10 12:53:36 node02 opnsense: /usr/local/etc/rc.newwanip: Interface '' is disabled or empty, nothing to do.

when switching out of maintenance mode:


Jul 10 12:54:41 node02 kernel: carp: 31@igb0: MASTER -> BACKUP (more frequent advertisement received)
Jul 10 12:54:41 node02 kernel: ifa_maintain_loopback_route: deletion failed for interface igb0: 3
Jul 10 12:54:42 node02 opnsense: /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "xxx.xxx.xxx.xxx - VIP WAN (31@igb0)" has resumed the state "BACKUP" for vhid 31

I don't get it... from reading the logs, the second machine should be backup after switching off mnt mode???
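That reading can be checked mechanically: the sketch below keeps only the last CARP transition per vhid, so each node's log shows which role it ended up in. It is a minimal Python example assuming the "vhid@interface: OLD -> NEW (reason)" format from the logs in this thread; final_states is a hypothetical helper name.

```python
import re

# carp(4) state-change messages, e.g.
#   "carp: 31@igb0: MASTER -> BACKUP (more frequent advertisement received)"
# Unanchored matching (finditer) also tolerates dmesg's "<6>" priority prefix.
TRANSITION_RE = re.compile(r"carp: (\d+)@(\S+): (\w+) -> (\w+) \(([^)]*)\)")


def final_states(log_text):
    """Map (vhid, interface) to the last (state, reason) logged for it."""
    states = {}
    for m in TRANSITION_RE.finditer(log_text):
        vhid, iface, _old, new, reason = m.groups()
        states[(int(vhid), iface)] = (new, reason)
    return states


if __name__ == "__main__":
    node02_log = (
        "Jul 10 12:53:30 node02 kernel: carp: 31@igb0: BACKUP -> MASTER "
        "(preempting a slower master)\n"
        "Jul 10 12:54:41 node02 kernel: carp: 31@igb0: MASTER -> BACKUP "
        "(more frequent advertisement received)\n"
    )
    for (vhid, iface), (state, reason) in sorted(final_states(node02_log).items()):
        print(f"vhid {vhid} on {iface}: {state} ({reason})")
```

For node02's lines this reports vhid 31 on igb0 as BACKUP, matching the expectation above. Demotion messages have no "vhid@interface" part and are ignored.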

Quote from: mimugmail on July 12, 2019, 07:22:23 AM
I don't get it... from reading the logs, the second machine should be backup after switching off mnt mode???

Yes, when switching it off it's Backup again. But when I turn on maintenance mode on the first node in the GUI, both are shown as Master and nobody answers requests to the VIP.

Quote from: bitmusician on July 12, 2019, 08:13:41 AM
Quote from: mimugmail on July 12, 2019, 07:22:23 AM
I don't get it... from reading the logs, the second machine should be backup after switching off mnt mode???

Yes, when switching it off it's Backup again. But when I turn on maintenance mode on the first node in the GUI, both are shown as Master and nobody answers requests to the VIP.

And when you revert to 19.1.7 it's working again?

Quote from: bitmusician on July 12, 2019, 07:10:27 AM
Quote from: katamadone [CH] on July 11, 2019, 03:48:15 PM
I had to revert, couldn't leave that in production. Sorry.
I'll have to start over soon and try again.

Did you have the problem already in 19.1.8 or only after updating to 19.1.10?

I already had the problem going from 19.1.7 to 19.1.8.

I'm inclined to try it again. I'm not happy staying on OPNsense 19.1.7.

@mimugmail, do you have any tips on what I should check?

#1
******************************************************
  NOMAINTENANCE   PRIMARY
******************************************************
sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
******************************************************
  NOMAINTENANCE   SECONDARY AFTER BOOT
******************************************************
sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
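With the primary and secondary dumps side by side, a short script can confirm that they match and show whether net.inet.carp.demotion is non-zero. A minimal Python sketch; parse_sysctl and diff_sysctl are hypothetical helpers, and the input is simply the pasted output of sysctl -a | grep carp.

```python
def parse_sysctl(text):
    """Turn 'name: value' lines (e.g. `sysctl -a | grep carp`) into a dict.

    Lines that are not net.* sysctls, such as the command itself or
    "<6>carp: ..." kernel messages, are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and name.startswith("net."):
            values[name] = value.strip()
    return values


def diff_sysctl(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


if __name__ == "__main__":
    primary = parse_sysctl(
        "net.inet.carp.demotion: 0\n"
        "net.inet.carp.preempt: 1\n"
        "net.pfsync.carp_demotion_factor: 240\n"
    )
    # Hypothetical dump of the same node after "demoted by 240 to 240 (sysctl)".
    maintenance = {**primary, "net.inet.carp.demotion": "240"}
    print(diff_sysctl(primary, maintenance))
    print("demoted" if int(primary["net.inet.carp.demotion"]) > 0 else "not demoted")
```

For the two dumps above the diff is empty, which is expected: both nodes report net.inet.carp.demotion: 0 while neither is in maintenance.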
#2
******************************************************
  MAINTENANCE   PRIMARY
******************************************************
<6>carp: 1@vmx2_vlan605: MASTER -> INIT (hardware interface up)
<6>carp: 1@vmx2_vlan605: INIT -> BACKUP (initialization complete)
<6>carp: 2@vmx2_vlan621: MASTER -> INIT (hardware interface up)
<6>carp: 2@vmx2_vlan621: INIT -> BACKUP (initialization complete)
<6>carp: 3@vmx2_vlan622: MASTER -> INIT (hardware interface up)
<6>carp: 3@vmx2_vlan622: INIT -> BACKUP (initialization complete)
<6>carp: 4@vmx2_vlan623: MASTER -> INIT (hardware interface up)
<6>carp: 4@vmx2_vlan623: INIT -> BACKUP (initialization complete)
<6>carp: 5@vmx2_vlan624: MASTER -> INIT (hardware interface up)
<6>carp: 5@vmx2_vlan624: INIT -> BACKUP (initialization complete)
<6>carp: 6@vmx2_vlan625: MASTER -> INIT (hardware interface up)
<6>carp: 6@vmx2_vlan625: INIT -> BACKUP (initialization complete)
<6>carp: 7@vmx2_vlan626: MASTER -> INIT (hardware interface up)
<6>carp: 7@vmx2_vlan626: INIT -> BACKUP (initialization complete)
<6>carp: 8@vmx2_vlan627: MASTER -> INIT (hardware interface up)
<6>carp: 8@vmx2_vlan627: INIT -> BACKUP (initialization complete)
<6>carp: 9@vmx2_vlan628: MASTER -> INIT (hardware interface up)
<6>carp: 9@vmx2_vlan628: INIT -> BACKUP (initialization complete)
<6>carp: 11@vmx2_vlan630: MASTER -> INIT (hardware interface up)
<6>carp: 11@vmx2_vlan630: INIT -> BACKUP (initialization complete)
<6>carp: 13@vmx2_vlan606: MASTER -> INIT (hardware interface up)
<6>carp: 13@vmx2_vlan606: INIT -> BACKUP (initialization complete)
<6>carp: 14@vmx2_vlan611: MASTER -> INIT (hardware interface up)
<6>carp: 14@vmx2_vlan611: INIT -> BACKUP (initialization complete)
<6>carp: 15@vmx2_vlan602: MASTER -> INIT (hardware interface up)
<6>carp: 15@vmx2_vlan602: INIT -> BACKUP (initialization complete)
<6>carp: 16@vmx2_vlan107: MASTER -> INIT (hardware interface up)
<6>carp: 16@vmx2_vlan107: INIT -> BACKUP (initialization complete)
<6>carp: 17@vmx2_vlan682: MASTER -> INIT (hardware interface up)
<6>carp: 17@vmx2_vlan682: INIT -> BACKUP (initialization complete)
<6>carp: 20@vmx2_vlan607: MASTER -> INIT (hardware interface up)
<6>carp: 20@vmx2_vlan607: INIT -> BACKUP (initialization complete)
<6>carp: 21@vmx0: MASTER -> INIT (hardware interface up)
<6>carp: 21@vmx0: INIT -> BACKUP (initialization complete)
<6>carp: 23@vmx1: MASTER -> INIT (hardware interface up)
<6>carp: 23@vmx1: INIT -> BACKUP (initialization complete)
<6>carp: 25@vmx2_vlan631: MASTER -> INIT (hardware interface up)
<6>carp: 25@vmx2_vlan631: INIT -> BACKUP (initialization complete)
<6>carp: 26@vmx2_vlan632: MASTER -> INIT (hardware interface up)
<6>carp: 26@vmx2_vlan632: INIT -> BACKUP (initialization complete)
<6>carp: 27@vmx2_vlan700: MASTER -> INIT (hardware interface up)
<6>carp: 27@vmx2_vlan700: INIT -> BACKUP (initialization complete)
<6>carp: 28@vmx2_vlan701: MASTER -> INIT (hardware interface up)
<6>carp: 28@vmx2_vlan701: INIT -> BACKUP (initialization complete)
<6>carp: 29@vmx2_vlan702: MASTER -> INIT (hardware interface up)
<6>carp: 29@vmx2_vlan702: INIT -> BACKUP (initialization complete)
<6>carp: 30@vmx2_vlan703: MASTER -> INIT (hardware interface up)
<6>carp: 30@vmx2_vlan703: INIT -> BACKUP (initialization complete)
<6>carp: 31@vmx2_vlan704: MASTER -> INIT (hardware interface up)
<6>carp: 31@vmx2_vlan704: INIT -> BACKUP (initialization complete)
<6>carp: 32@vmx2_vlan705: MASTER -> INIT (hardware interface up)
<6>carp: 32@vmx2_vlan705: INIT -> BACKUP (initialization complete)
<6>carp: 33@vmx2_vlan703: MASTER -> INIT (hardware interface up)
<6>carp: 33@vmx2_vlan703: INIT -> BACKUP (initialization complete)
<6>carp: 34@vmx2_vlan704: MASTER -> INIT (hardware interface up)
<6>carp: 34@vmx2_vlan704: INIT -> BACKUP (initialization complete)
<6>carp: 35@vmx2_vlan705: MASTER -> INIT (hardware interface up)
<6>carp: 35@vmx2_vlan705: INIT -> BACKUP (initialization complete)
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
******************************************************
  NOMAINTENANCE   Secondary
******************************************************
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240

ENTERING & LEAVING Maintenance works
BUT I'm very confused by the demotion on the master: shouldn't it be 240?? (Last time it was.)
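Why the demotion value matters: in FreeBSD's carp(4), net.inet.carp.demotion is added to each vhid's advskew (capped at 240), and a node advertises every advbase + advskew/256 seconds; with preemption enabled, the node that advertises more often becomes master. The sketch below models only that comparison. The advbase/advskew values (1/0 on node01, 1/100 on node02) are assumed for illustration and not taken from this cluster; the lower clamp at 0 is a safety assumption, and ties (which carp breaks by IP address) are ignored.

```python
CARP_MAXSKEW = 240  # highest advskew carp(4) will advertise


def effective_skew(advskew, demotion):
    """advskew after adding net.inet.carp.demotion, clamped to 0..240."""
    return max(0, min(CARP_MAXSKEW, advskew + demotion))


def adv_interval(advbase, advskew, demotion=0):
    """Seconds between advertisements: advbase + effective skew / 256."""
    return advbase + effective_skew(advskew, demotion) / 256


def expected_master(nodes):
    """Node with the shortest interval; assumes net.inet.carp.preempt=1."""
    return min(nodes, key=lambda name: adv_interval(*nodes[name]))


if __name__ == "__main__":
    # (advbase, advskew, demotion) per node; hypothetical example values.
    normal = {"node01": (1, 0, 0), "node02": (1, 100, 0)}
    maintenance = {"node01": (1, 0, 240), "node02": (1, 100, 0)}
    print(expected_master(normal))       # node01: 1.0 s vs 1.390625 s
    print(expected_master(maintenance))  # node02: 1.9375 s vs 1.390625 s
```

So a node that is really in persistent maintenance should read 240 here. A reading of 0 on a node that is supposed to be in maintenance means demotion is not what moved the role.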

***BOOTED***
******************************************************
  MAINTENANCE   PRIMARY  (after boot)
******************************************************
<6>carp: 30@vmx2_vlan703: INIT -> BACKUP (initialization complete)
<6>carp: 31@vmx2_vlan704: INIT -> BACKUP (initialization complete)
<6>carp: 32@vmx2_vlan705: INIT -> BACKUP (initialization complete)
<6>carp: 33@vmx2_vlan703: INIT -> BACKUP (initialization complete)
<6>carp: 34@vmx2_vlan704: INIT -> BACKUP (initialization complete)
<6>carp: 35@vmx2_vlan705: INIT -> BACKUP (initialization complete)
<6>carp: demoted by 240 to 240 (pfsync bulk start)
<6>carp: demoted by -240 to 0 (pfsync bulk done)
<118>>>> Invoking start script 'carp'
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240

------> LEAVE persistent maintenance mode
Primary stays backup on all VLANs

#3

******************************************************
  NOMAINTENANCE   PRIMARY
******************************************************
<118>>>> Invoking start script 'carp'
<6>carp: demoted by 0 to 0 (sysctl)
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240

******************************************************
  NOMAINTENANCE   Secondary
******************************************************
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240



After going back to the VMware snapshot with 19.1.7, the firewall switches back to MASTER properly.

Things to mention:
- 29 CARP entries on separate VLANs
- Both firewalls run on vSphere (ESXi) 6.5
- There's a separate interface for pfsync
- CARP traffic is allowed via a floating rule (first match)





some other logs:
https://pastebin.com/rSCZGWy2

I'm confused by the (master timed out) messages, but everything seems to be working as expected at the moment.

Are you sure you pushed the button "Enter persistent maintenance mode"? That would generate a different message, like slower advertisements.
To me it looks like you pushed "Disable CARP" (master timed out).
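Following that reading, the reason in parentheses on each transition hints at what the peer did. A minimal Python sketch that tallies the reasons in a pasted log; the HINTS mapping paraphrases the explanation above and is an interpretation, not an official OPNsense or carp(4) table, and the sample lines are illustrative, built from the message format used in this thread.

```python
import re
from collections import Counter

# Reason text of carp(4) state changes, e.g. "31@igb0: BACKUP -> MASTER (master timed out)".
REASON_RE = re.compile(r"carp: \d+@\S+: \w+ -> \w+ \(([^)]*)\)")

# Interpretation from this thread, not an authoritative list.
HINTS = {
    "master timed out": "peer stopped advertising (e.g. CARP disabled there)",
    "preempting a slower master": "peer advertises more slowly (e.g. demoted)",
    "more frequent advertisement received": "peer advertises faster again",
}


def explain(log_text):
    """Return {reason: (count, hint)} for every transition reason found."""
    counts = Counter(m.group(1) for m in REASON_RE.finditer(log_text))
    return {reason: (n, HINTS.get(reason, "no hint")) for reason, n in counts.items()}


if __name__ == "__main__":
    sample = (  # illustrative lines, not copied from the pastebin
        "carp: 1@vmx2_vlan605: BACKUP -> MASTER (master timed out)\n"
        "carp: 2@vmx2_vlan621: BACKUP -> MASTER (master timed out)\n"
    )
    for reason, (n, hint) in explain(sample).items():
        print(f"{n}x {reason}: {hint}")
```

Many "master timed out" entries and no "preempting a slower master" would fit the "Disable CARP" explanation rather than maintenance mode.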