Topic: When should a "failover" automatically occur?
j_s
When should a "failover" automatically occur?
on: July 20, 2022, 01:34:46 pm
Hello. I've got 3 HA systems in production use, but one lingering problem remains that I run into from time to time: HA and failover.
Let me give an example:
#1 has all interfaces in a MASTER state for CARP and #2 has all interfaces in a BACKUP state. #1 is my primary and #2 is my secondary.
All is working fine, but someone accidentally unplugs a network cable randomly on #1 (I am running lagg everywhere, but sometimes failover occurs faster than the lagg can respond). #2 (which had all carp interfaces in a BACKUP state) has changed all interfaces to MASTER. So far so good. Everything works and workloads are typically unaware of what just happened.
Later that day I reconnect the cable that was unplugged.
#2 remains in MASTER state for all CARP interfaces and #1 stays in BACKUP state for all interfaces (except the one that was offline, that changes to BACKUP from INIT).
I did look at the CARP traffic with tcpdump, and everything seems normal. The CARP packets sent by #2 have the proper vrids, and the priority is 100. #1 isn't sending anything and is happy with all of its CARP states in BACKUP.
So here are my questions:
1. Shouldn't #1 have taken back over as MASTER now that all of the networking is good? (I expected it to, but it didn't)
2. If it's not supposed to, what's the recommended procedure to fail back over to #1 being in MASTER state? I get the feeling that "rebooting #2" isn't the most ideal situation, although I have done it in the past to solve problems like this.
3. Since I expected that #1 would simply have advertised itself as the higher priority and taken back over, is there a setting I could have wrong?
I am having this problem on 22.1.10, but I've had this happen to me on versions going back to the 21.1 series. I have no reason to believe there's a problem with this version; more likely it is a problem with either my configuration or my understanding of the proper behavior of OPNsense HA.
Thanks for reading and hopefully clarifying my misunderstandings of HA or configuration issues.
Patrick M. Hausen
Re: When should a "failover" automatically occur?
Reply #1 on: July 20, 2022, 01:47:14 pm
What is System > High Availability > Settings > Disable Preempt set to?
j_s
Re: When should a "failover" automatically occur?
Reply #2 on: July 20, 2022, 02:35:24 pm
Quote from: pmhausen on July 20, 2022, 01:47:14 pm
What is System > High Availability > Settings > Disable Preempt set to?
It is disabled on both nodes. Sorry, I should have mentioned that in my original post.
However, I did go poking around, and after going through every page of the WebGUI, I saw this on #1 under Interfaces -> Virtual IPs -> Status:
CARP has detected a problem and this unit has been demoted to BACKUP status.
Check link status on all interfaces with configured CARP VIPs.
I double checked everything and all is good (each can ping the other and the BACKUP is receiving carp packets and can ping the VIP). So I'm going to presume that it saw a problem when the cable was unplugged, but hasn't figured out that everything is fine now. I verified from #1 and #2 that I can ping the other and that #1 can ping all of the VIPs. I also verified that every interface with a carp (which is all of them except sync interface) is receiving carp packets.
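For checking this from a shell rather than the WebGUI, here is a minimal sketch that parses the output of `sysctl net.inet.carp` on the firewall. The sysctl names come from the FreeBSD carp(4) manual page; the sample output and the helper names (`parse_carp_sysctls`, `describe`) are illustrative assumptions, not something taken from either node in this thread.

```python
import subprocess

PREFIX = "net.inet.carp."

# Hypothetical sample of `sysctl net.inet.carp` output, for illustration only.
SAMPLE_OUTPUT = """\
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 240
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
"""


def parse_carp_sysctls(text):
    """Turn 'name: value' lines into a dict keyed by the short sysctl name."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not name.startswith(PREFIX):
            continue  # skip anything that is not a CARP sysctl line
        if value.lstrip("-").isdigit():
            values[name[len(PREFIX):]] = int(value)
    return values


def read_carp_sysctls():
    """Run sysctl on the local box (only meaningful on FreeBSD/OPNsense)."""
    result = subprocess.run(
        ["sysctl", "net.inet.carp"],
        capture_output=True, text=True, check=True,
    )
    return parse_carp_sysctls(result.stdout)


def describe(values):
    """One-line summary of the two values most relevant to failback."""
    preempt = "on" if values.get("preempt") else "off"
    return f"preempt={preempt}, demotion={values.get('demotion', 0)}"


if __name__ == "__main__":
    print(describe(parse_carp_sysctls(SAMPLE_OUTPUT)))
```

On a real node you would call `read_carp_sysctls()` instead of parsing the sample text, then compare the result between #1 and #2.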
I did notice while writing this post that on #1 under Interfaces -> Virtual IPs -> Status it says "Current CARP demotion level = 0" while on #2 it says that it is 240. I'm not sure whether this is a hint as to the problem, as I don't know what the "demotion level" really means. About to go check Google and the docs.
Edit: I did just find that an interface I created (but never actually used for any client machines) did not have the firewall rule allowing CARP traffic. I've since added the rule to both #1 and #2.
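As a rough illustration of what the demotion level does, here is a small model of the election as I understand it from the FreeBSD carp(4) documentation: the demotion counter is added to the advskew a node advertises (capped at 240), and with preempt enabled a BACKUP node takes over only if its own advertisement interval is shorter than the current MASTER's. The function names and the example values (advbase 1, advskew 0 and 100) are assumptions for illustration, not the configuration of the nodes in this thread.

```python
CARP_MAXSKEW = 240  # assumed cap on the advertised skew


def advertised_skew(advskew, demotion):
    """Skew actually sent in advertisements: configured skew plus demotion."""
    return min(advskew + demotion, CARP_MAXSKEW)


def advert_interval(advbase, advskew, demotion=0):
    """Seconds between advertisements: advbase + skew/256."""
    return advbase + advertised_skew(advskew, demotion) / 256


def backup_should_preempt(own_interval, master_interval, preempt=True):
    """A BACKUP takes over only if preempt is on and it advertises faster."""
    return preempt and own_interval < master_interval


if __name__ == "__main__":
    secondary = advert_interval(advbase=1, advskew=100)            # current MASTER
    primary_healthy = advert_interval(advbase=1, advskew=0)        # demotion 0
    primary_demoted = advert_interval(advbase=1, advskew=0, demotion=240)

    # A healthy primary advertises faster and would reclaim MASTER.
    print(backup_should_preempt(primary_healthy, secondary))
    # A primary still carrying a demotion of 240 would stay in BACKUP.
    print(backup_should_preempt(primary_demoted, secondary))
```

In this model a node that still carries a non-zero demotion looks like a worse candidate than its configured advskew suggests, which is one way a primary can remain in BACKUP even with preempt enabled.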
Last Edit: July 20, 2022, 02:52:54 pm by j_s
j_s
Re: When should a "failover" automatically occur?
Reply #3 on: July 21, 2022, 12:01:31 am
Just wanted to provide an update. For reasons outside my control, the #2 box had to be power cycled. #1 picked up and took over all workloads. When #2 came up, it sat in a BACKUP state.
I guess this issue will be left unsolved, since I can't investigate it any further now that it is gone.
Thanks to everyone that helped.
nzkiwi68
Re: When should a "failover" automatically occur?
Reply #4 on: August 12, 2022, 05:43:43 am
You can also slow down CARP fail-over.
The Advertising Frequency base value defaults to 1. This means a CARP advertisement is sent once per second. You could raise this to 2 or 3 and send CARP messages once every 2 or 3 seconds, which will slow down fail-over but be more stable in an imperfect network.
If CARP appears to fail over too quickly on the smallest network hiccup, then increasing the base value can help.
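To put rough numbers on that trade-off, here is a small sketch. It assumes the FreeBSD CARP behaviour where a BACKUP declares the MASTER down after about three advertisement intervals plus its own skew (3 * advbase + advskew/256 seconds); the function name and the advskew of 100 are illustrative assumptions.

```python
def master_down_timeout(advbase, advskew):
    """Approximate seconds of silence before a BACKUP takes over."""
    return 3 * advbase + advskew / 256


if __name__ == "__main__":
    # Compare the default base of 1 with the slower settings suggested above.
    for base in (1, 2, 3):
        timeout = master_down_timeout(advbase=base, advskew=100)
        print(f"base={base}: backup waits about {timeout:.2f} s")
```

Under these assumptions each extra second of base adds about three seconds before the backup reacts, which is the price paid for riding out short network hiccups.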