Topic: CARP Bug in 17.1 resulting in split brains or backup always "master" ??? (Read 16278 times)
Wayne Train
Full Member
Posts: 194
Karma: 12
CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
on:
June 27, 2017, 09:59:46 am »
Hi,
I'm experiencing a very strange issue that results in various split brains.
Most of the time, only WAN is switched over to the backup node.
When I try to resolve the splitbrain, I manually set the BACKUP-node to CARP MAINTENANCE MODE
and the MASTER holds all interfaces again. The strange thing is, that when I leave Maintenance Mode
on BACKUP, the BACKUP-node takes over the MASTER-role again.
Furthermore, after rebooting or after a failover, the BACKUP-Node remains
in the master-role, while the original MASTER is demoted to the backup-role.
I'm running a LACP-LAGG that consists of igb0 and igb1, that holds a couple of vlans.
My Switch is also configured to use LACP for the trunk.
Each VLAN is configured like this:
MASTER-Node Virtual-IP
10.x.x.10 10.x.x.1/24 vhid 12 , freq. 1 / 0
10.x.y.10 10.x.y.1/24 vhid 24 , freq. 1 / 0
BACKUP-Node Virtual-IP
10.x.x.20 10.x.x.1/24 vhid 12 , freq. 1 / 100
10.x.y.20 10.x.y.1/24 vhid 14 , freq. 1 / 100
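A quick way to sanity-check a listing like the one above is to compare the VHID of each virtual IP across both nodes, since CARP peers only recognise each other when they share the same VHID. The sketch below uses the values exactly as posted (which may simply contain a typo); the dictionary layout and the function name are assumptions made for illustration.

```python
# Virtual IP -> VHID, copied from the listing as posted (addresses are anonymised).
master = {"10.x.x.1/24": 12, "10.x.y.1/24": 24}
backup = {"10.x.x.1/24": 12, "10.x.y.1/24": 14}


def vhid_mismatches(a, b):
    """Return {vip: (vhid_a, vhid_b)} for every VIP whose VHID differs or is missing."""
    return {
        vip: (a.get(vip), b.get(vip))
        for vip in sorted(set(a) | set(b))
        if a.get(vip) != b.get(vip)
    }


print(vhid_mismatches(master, backup))
```

Any entry reported here is worth double-checking in the Virtual IP settings of both nodes before suspecting the CARP implementation itself.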
When I'm capturing carp-packets I see the following on the LAN-Side:
Capture output of the MASTER-Node:
09:09:53.869797 IP 10.x.x.20 > 224.0.0.18: VRRPv2, Advertisement, vrid 14, prio 100, authtype none, intvl 1s, length 36
09:09:55.282945 IP 10.x.x.20 > 224.0.0.18: VRRPv2, Advertisement, vrid 14, prio 100, authtype none, intvl 1s, length 36
09:09:56.696995 IP 10.x.x.20 > 224.0.0.18: VRRPv2, Advertisement, vrid 14, prio 100, authtype none, intvl 1s, length 36
Capture output of the BACKUP-Node:
09:08:30.688149 IP 10.x.x.20 > 224.0.0.18: VRRPv2, Advertisement, vrid 14, prio 100, authtype none, intvl 1s, length 36
09:08:32.116865 IP 10.x.x.20 > 224.0.0.18: VRRPv2, Advertisement, vrid 14, prio 100, authtype none, intvl 1s, length 36
09:08:33.508241 IP 10.x.x.20 > 224.0.0.18: VRRPv2, Advertisement, vrid 14, prio 100, authtype none, intvl 1s, length 36
On the WAN-Side it looks like this:
Capture output of the MASTER-Node:
09:11:38.102897 IP WAN_BACKUP_NODE_IP > 224.0.0.18: VRRPv2, Advertisement, vrid 12, prio 100, authtype none, intvl 1s, length 36
09:11:39.504055 IP WAN_BACKUP_NODE_IP > 224.0.0.18: VRRPv2, Advertisement, vrid 12, prio 100, authtype none, intvl 1s, length 36
09:11:40.929161 IP WAN_BACKUP_NODE_IP > 224.0.0.18: VRRPv2, Advertisement, vrid 12, prio 100, authtype none, intvl 1s, length 36
Capture output of the BACKUP-Node:
09:13:43.619491 IP WAN_BACKUP_NODE_IP > 224.0.0.18: VRRPv2, Advertisement, vrid 12, prio 100, authtype none, intvl 1s, length 36
09:13:45.039772 IP WAN_BACKUP_NODE_IP > 224.0.0.18: VRRPv2, Advertisement, vrid 12, prio 100, authtype none, intvl 1s, length 36
09:13:46.431278 IP WAN_BACKUP_NODE_IP > 224.0.0.18: VRRPv2, Advertisement, vrid 12, prio 100, authtype none, intvl 1s, length 36
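To compare captures like these more systematically, the lines can be grouped by sender and the gap between advertisements measured. This is a minimal sketch: it assumes the tcpdump text format shown above, where CARP is decoded as VRRPv2 (so "vrid" should correspond to the VHID and "prio" to the advskew), and the function names are made up for illustration.

```python
import re

# Matches the advertisement lines as printed in the captures above.
LINE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > 224\.0\.0\.18: "
    r"VRRPv2, Advertisement, vrid (?P<vrid>\d+), prio (?P<prio>\d+),"
)


def parse(lines):
    """Yield (seconds_since_midnight, src, vrid, prio) for each advertisement line."""
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            h, mi, s = m["ts"].split(":")
            yield int(h) * 3600 + int(mi) * 60 + float(s), m["src"], int(m["vrid"]), int(m["prio"])


def summarize(lines):
    """Map (src, vrid, prio) to the mean gap in seconds between its advertisements."""
    times = {}
    for t, src, vrid, prio in parse(lines):
        times.setdefault((src, vrid, prio), []).append(t)
    result = {}
    for key, ts in times.items():
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        result[key] = sum(gaps) / len(gaps) if gaps else None
    return result


capture = """\
09:09:53.869797 IP 10.x.x.20 > 224.0.0.18: VRRPv2, Advertisement, vrid 14, prio 100, authtype none, intvl 1s, length 36
09:09:55.282945 IP 10.x.x.20 > 224.0.0.18: VRRPv2, Advertisement, vrid 14, prio 100, authtype none, intvl 1s, length 36
09:09:56.696995 IP 10.x.x.20 > 224.0.0.18: VRRPv2, Advertisement, vrid 14, prio 100, authtype none, intvl 1s, length 36
""".splitlines()

for key, gap in summarize(capture).items():
    print(key, f"{gap:.3f}s")
```

In the excerpts above, every advertisement on both nodes comes from the backup node's address, at a gap of roughly 1.4 seconds. If the master were also advertising, a second sender would show up in the summary.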
Every Interface & VLAN has a rule to allow any traffic between the CARP-Nodes:
Action Proto Source Port Destination Port Gateway
Pass IPv4 * CARP_NODES_VLAN_X * CARP_NODES_VLAN_X * *
My "High Availability Settings" are configured like this:
MASTER (172.x.y.y = Sync-Interface-IP)
Synchronize States YES
Synchronize Interface SYNC-Interface
Synchronize Peer IP 172.x.y.z
Synchronize Config to IP 172.x.y.z
Remote System Username user_name
Remote System Password password
Users and Groups YES
... YES
DNS Resolver YES
BACKUP (172.x.y.z = Sync-Interface-IP)
Synchronize States YES
Synchronize Interface SYNC-Interface
Synchronize Peer IP 172.x.y.y
I left all other settings unchecked, since the help text says that one should only sync
from the MASTER to the BACKUP node and not bidirectionally. So I assume this is right.
Or am I wrong?
In my logs I can only find the following entries:
Jun 23 19:03:21 kernel: carp: 12@lagg0_vlan40: MASTER -> BACKUP (more frequent advertisement received)
Jun 23 19:03:21 kernel: carp: 17@lagg0_vlan100: MASTER -> BACKUP (more frequent advertisement received)
Jun 23 19:03:21 kernel: carp: 19@lagg0_vlan20: MASTER -> BACKUP (more frequent advertisement received)
Jun 23 19:03:21 kernel: carp: 16@lagg0_vlan70: MASTER -> BACKUP (more frequent advertisement received)
Jun 23 19:03:21 kernel: carp: 15@lagg0_vlan60: MASTER -> BACKUP (more frequent advertisement received)
Jun 23 19:03:20 kernel: carp: 20@lagg0_vlan10: MASTER -> BACKUP (more frequent advertisement received)
To me it looks like the BACKUP node is advertising more frequently than the original MASTER and therefore becomes the master.
I also checked the sysctl settings on the shell to see whether they contain any useful information regarding CARP. As you can see, the MASTER
got demoted:
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 3120
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
While on the BACKUP-node it looks like this:
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240
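The two sysctl listings can be turned into a rough estimate of how often each node advertises. The sketch below assumes, based on my reading of FreeBSD's CARP code, that the interval is advbase + min(advskew + demotion, 240) / 256 seconds; treat that formula and the constant as assumptions rather than a confirmed description of 17.1. The advbase/advskew values are the "freq." values from the Virtual IP listing.

```python
CARP_MAXSKEW = 240  # assumed upper bound for the effective skew


def effective_interval(advbase, advskew, demotion=0):
    """Estimated seconds between advertisements for one CARP VHID."""
    skew = min(advskew + demotion, CARP_MAXSKEW)
    return advbase + skew / 256


# MASTER: freq. 1 / 0 with net.inet.carp.demotion = 3120
master = effective_interval(advbase=1, advskew=0, demotion=3120)
# BACKUP: freq. 1 / 100 with net.inet.carp.demotion = 0
backup = effective_interval(advbase=1, advskew=100, demotion=0)

print(f"master: {master:.4f}s, backup: {backup:.4f}s")
print("backup advertises more frequently" if backup < master else "master advertises more frequently")
```

Under these assumptions the demoted MASTER would advertise less often than the BACKUP, which would match the "more frequent advertisement received" log entries. A demotion of 3120 equals 13 × 240, so it would be consistent with 13 demotion events at a factor of 240 each, for example interfaces that are down or send errors; which of these applies here is not visible from the output.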
Another strange thing: according to "ifconfig", all my VLANs show "groups: vlan",
while no group is shown for my WAN interface "igb5". Could this be the reason for the split brains?
It would at least explain why the VLANs and WAN fail over separately. In a correctly working
HA environment, I would expect the master to fail over completely to the backup if any of its interfaces
goes down...
I'm experiencing this issue on 17.1.1, 17.1.4 and 17.1.8, and I have really run out of ideas on how to resolve it.
Is it possible that this is a bug in FreeBSD's CARP or in the OPNsense release?
Is anyone else experiencing similar issues?
Best regards,
Wayne
Logged
mimugmail
Hero Member
Posts: 6766
Karma: 494
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #1 on:
June 27, 2017, 10:20:01 am »
Have you tried this setup without LAGG to isolate the problem?
I'd first setup the whole thing without VLANs and without LAGG. If this works as expected I'd add VLANs. If this works as expected I'd add LAGG.
Then you'll see where exactly the error is.
Logged
WWW:
www.routerperformance.net
Support plans:
https://www.max-it.de/en/it-services/opnsense/
Commercial Plugins (German):
https://opnsense.max-it.de/
Wayne Train
Full Member
Posts: 194
Karma: 12
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #2 on:
June 27, 2017, 02:26:54 pm »
I already did this, and I tried it again with a completely blank setting a few minutes ago.
The result is:
With only one physical interface on the LAN side and one on the WAN side, everything works well and I get no split brains.
With one VLAN (on neither a LACP LAGG nor any other LAGG), one physical NIC on the LAN side and one on the WAN side, it results in split brains again. The BACKUP node takes over the VLAN if I manually fail over by disconnecting the cable from the port in use, but it fails over only for that interface. LAN and WAN remain on the original MASTER.
Furthermore, when I plug the cable back in, the BACKUP node doesn't release the IP back to the MASTER.
I'm on Release 17.1.4 at the moment.
Best Regards
Wayne
Logged
Wayne Train
Full Member
Posts: 194
Karma: 12
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #3 on:
June 27, 2017, 02:27:42 pm »
At the moment it looks like some strange VLAN issues are affecting CARP's behaviour.
Logged
Wayne Train
Full Member
Posts: 194
Karma: 12
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #4 on:
June 27, 2017, 04:17:49 pm »
Ok,
I just upgraded to release 17.1.8, but the problem remains. Any ideas ?
Cheers
Wayne
Logged
mimugmail
Hero Member
Posts: 6766
Karma: 494
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #5 on:
June 27, 2017, 04:32:14 pm »
Just to isolate further, can you check LAGG without VLANs?
Logged
Wayne Train
Full Member
Posts: 194
Karma: 12
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #6 on:
June 28, 2017, 02:17:24 pm »
It's the same behaviour.
Do you also run CARP with VLANs on the LAN side yourself?
Regards,
Wayne
Logged
mimugmail
Hero Member
Posts: 6766
Karma: 494
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #7 on:
June 28, 2017, 03:08:52 pm »
No, but I'll invest some time here to reproduce it once the error is clear.
So LAGG without VLANs works fine? No splitbrains?
Logged
Wayne Train
Full Member
Posts: 194
Karma: 12
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #8 on:
June 30, 2017, 08:32:27 am »
Hi,
No, it didn't. Furthermore, there seems to be another bug: after trying with the LAGG, I wanted to delete it, and the whole system crashed. I had this on both nodes before I did a clean reinstall. OPNsense detected a bug and I filed it with a short description. It was related to some errors and uncaught exceptions in the lagg_edit.php file, but I'm not a programmer...
I really hope the next minor release is coming soon, since 17.1.8 isn't really what I expected from OPNsense. 16.x was really fine; I had no issues. Until 17.1.4 everything worked fine, and then it started getting really weird...
Thank you.
Logged
mimugmail
Hero Member
Posts: 6766
Karma: 494
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #9 on:
June 30, 2017, 09:50:17 am »
I'm working from home today; I'll try to reproduce this on Monday with some test machines.
So, to summarize:
- CARP with single interfaces works
- CARP with single interfaces as VLANs results in split-brain
- CARP with LAGG without VLANs results in split-brain
- CARP with LAGG with VLANs results in split-brain
Logged
pingutux
Newbie
Posts: 1
Karma: 0
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #10 on:
July 07, 2017, 01:57:15 pm »
Hello,
I can confirm this.
-> CARP with single interfaces as VLANs results in split-brain
Started with 17.1.8.
br
Logged
mimugmail
Hero Member
Posts: 6766
Karma: 494
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #11 on:
July 11, 2017, 10:23:04 am »
WAN VLAN (igb0)
LAN ETH (igb1)
CARP on VLAN
Works, no splitbrains.
I'll try VLAN only with just one physical IF in the next test
EDIT: There was a short mac flap of course:
*Mar 1 01:58:12.181: %SW_MATM-4-MACFLAP_NOTIF: Host 0000.5e00.0101 in vlan 60 is flapping between port Gi2/0/13 and port Gi2/0/14
*Mar 1 01:58:23.431: %SW_MATM-4-MACFLAP_NOTIF: Host 0000.5e00.0101 in vlan 60 is flapping between port Gi2/0/13 and port Gi2/0/14
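The flapping host address in the switch log can be mapped back to a CARP VHID, because CARP (like VRRP for IPv4) uses a virtual MAC of the form 00:00:5e:00:01:XX, where XX is the VHID in hex. A small sketch follows; the helper names are made up for illustration.

```python
def carp_mac(vhid):
    """Virtual MAC for an IPv4 CARP/VRRP group: 00:00:5e:00:01:<vhid in hex>."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be in 1..255")
    return f"00:00:5e:00:01:{vhid:02x}"


def cisco_to_vhid(mac):
    """Extract the VHID from a Cisco-style dotted MAC such as 0000.5e00.0101."""
    digits = mac.replace(".", "").replace(":", "").lower()
    if len(digits) != 12 or not digits.startswith("00005e0001"):
        raise ValueError("not a CARP/VRRP IPv4 virtual MAC")
    return int(digits[10:], 16)


print(cisco_to_vhid("0000.5e00.0101"))
print(carp_mac(12))
```

The address in the log above therefore belongs to VHID 1. Since the MAC depends only on the VHID, two CARP groups that reuse a VHID within the same layer 2 domain would share one MAC address, which is worth ruling out when a switch reports flapping.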
«
Last Edit: July 11, 2017, 10:41:59 am by mimugmail
»
Logged
mimugmail
Hero Member
Posts: 6766
Karma: 494
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #12 on:
July 11, 2017, 10:56:46 am »
VLAN60 / WAN / igb0 / CARP IP 192.168.10.1
VLAN99 / WAN / igb0 / CARP IP 192.168.1.1
Pulled the cable of igb0 on unit 1, and unit 2 took over smoothly. After plugging it in again, I had 2 MAC flaps and a loss of 5 pings.
No splitbrain.
17.1.8
How is your switch configured?
Logged
Wayne Train
Full Member
Posts: 194
Karma: 12
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #13 on:
July 11, 2017, 03:05:44 pm »
Hi,
My trunking LAGG to the switch is configured as LACP, on both the firewall side and the switch side.
Flow control is enabled; otherwise LACP won't work as intended. By the way, I haven't experienced these issues in 16.7.
Therefore I suspect that it's related to a bug in 17.1.x.
This is my setup:
Switch Firewall
47 igb0
( )======(VLANs 10-100)========( )==(OPNSENSE)=====(WAN)
48 igb1
V-IPs for VLAN 10-100 V-IP for WAN
I wonder if it's an issue that only occurs if you have multiple VLANs on one LAGG.
Have you also tried this ?
I experienced the issue on multiple systems. All of them 17.1.x.
Best regards
Wayne
Logged
mimugmail
Hero Member
Posts: 6766
Karma: 494
Re: CARP Bug in 17.1 resulting in split brains or backup always "master" ???
«
Reply #14 on:
July 11, 2017, 04:41:32 pm »
Hi,
I only tested VLAN, not LAGG. I can do this tomorrow.
I don't know why flow control should influence LACP. This would mean that you can't run this setup without a switch that supports flow control?
You said you also experienced split brains with just VLANs and no LAGG?
Are you trunking Vlan1 (like in Catalyst)?
Logged