Problems when enabling "Synchronize States"

Started by petersen, March 18, 2024, 03:12:29 PM

Hello,

We would like to use OPNsense with High Availability, but keep running into the following problem during setup.

We are using two identical hardware systems with OPNsense version 24.1.3_1.

The following sources were used as instructions:
- https://docs.opnsense.org/manual/how-tos/carp.html
- https://www.thomas-krenn.com/en/wiki/OPNsense_HA_Cluster_configuration (it's a German website)
- https://www.youtube.com/watch?v=I5n3QXOlxmw

Up to the step "Setup pfSync and HA sync (xmlrpc)" everything works without any problems.

The firewalls communicate with each other.
I can send a ping to 1.1.1.1 and get a response.
I can switch off one firewall and the other firewall takes over immediately.
Everything works as it should.

However, as soon as I check the "Synchronize States" checkbox under "System > High Availability > Settings", it no longer works.
Under "System > High Availability > Status" I get the message "The backup firewall is not accessible or not configured" after waiting a while.
The ping to 1.1.1.1 is lost if the master firewall is not available.

As soon as I remove the tick from the "Synchronize States" checkbox, it works again without any problems.
Firewall 2 takes over if Firewall 1 is not available and vice versa.


I have configured the corresponding interfaces on both firewalls.
I have created an "Allow all" rule on the sync interface on both firewalls, as well as a rule for the CARP protocol on the WAN and LAN interfaces.
I have created the corresponding VIPs on both firewalls.
I have created NAT on both firewalls.


Which settings am I overlooking?

Thank you for any help! If any further information is needed, I will try to provide it.

Use tcpdump to trace the packets on the HA link between both systems. That should give you a hint about what exactly is failing.

Anything special about the HA link? Is it a dedicated interface? Is it just a patch cable or is a switch involved? Are you using the default multicast address for pfsync?
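
A minimal sketch of such a capture, assuming the dedicated sync interface is called igb2 (a hypothetical name, replace it with your own). pfsync travels as IP protocol 249, so the filter below keeps only that traffic. The helper only prints the command so it can be reviewed before being run as root:

```shell
#!/bin/sh
# Hypothetical name of the dedicated sync interface; override via SYNC_IF.
SYNC_IF="${SYNC_IF:-igb2}"

# Build the tcpdump command for pfsync traffic (IP protocol 249) on the
# given interface. It is printed, not executed, so it can be checked first.
pfsync_capture_cmd() {
    echo "tcpdump -n -i $1 ip proto 249"
}

pfsync_capture_cmd "$SYNC_IF"
```

Running the printed command on both firewalls while ticking "Synchronize States" should show whether pfsync packets leave one node and arrive on the other.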

Hi Patrick,

thank you for your answer.

The HA link is on a dedicated interface with a cable directly between the two firewalls. I am using the address of the other firewall for pfsync.

Unfortunately I'm still learning how to use and interpret tcpdump, but maybe you can help?
I have put the packet capture from the master firewall in the attachments from the moment when I enable "Synchronize States".

A screenshot of the sync state section on both firewalls would be great. Also the log from the moment you apply the setting.

Hi mimugmail,

I have attached 3 files: two screenshots of the sync state section and one of the log when applying.

Or do you want a different log? If so, please specify which log you want to see.

Thank you for your help!

If you set the log level to "Informational" instead of "Notice", is there nothing more on master and backup?
Also, the output of "dmesg -a" via CLI from both systems would be good.
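
To cut the dmesg output down to the relevant lines, a small filter like the following might help. This is only a sketch; it assumes the interesting kernel messages mention "carp" or "pfsync" by name, and the function name is made up for illustration:

```shell
#!/bin/sh
# Keep only lines from stdin that mention CARP or pfsync (case-insensitive).
# filter_ha_messages is a hypothetical helper name.
filter_ha_messages() {
    grep -iE 'carp|pfsync'
}

# Usage on each firewall: dmesg -a | filter_ha_messages
```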

There is nothing more. There is nothing in the informational log.

The result of the "dmesg -a" is in the attached screenshot.


Hi mimugmail,

I have removed the lagg and now use a single direct cable connection between the two firewalls, but I still have the same problem  :(

Might be a long shot, but under Firewall >> Rules >> [Interface], do you have a rule to allow "IPv4 CARP" traffic from the "[Interface] net" to any port, any destination, any gateway, on any schedule?

If so, what happens if you create that same rule but under Firewall >> Rules >> Floating and check it for all interfaces other than your WAN interface (because you really don't want CARP traffic from the internet)?

When I ran into this, it turned out I wasn't thinking from the right perspective about how the firewalls communicate their synchronization status. After I created the floating rule, it suddenly worked like a charm. I then deleted it, created individual rules for each interface (the copy/clone button is a godsend), and disabled them one by one until I figured out what was going on.

Hope this helps!

Hi anomaly0617,

I have double- and triple-checked the firewall rules. I allow all IPv4 CARP traffic on all interfaces. On the pfSync interface I have a rule that allows all traffic.

I have these rules on both firewalls but I still have the same problem :(

Hi,

It looks like I've solved the problem.

Yesterday I tried to set up High Availability on another hardware machine to rule out a hardware problem. After booting OPNsense from a live stick, I was able to set up High Availability and it worked.

The difference: The live stick is running version 23.1.
The other system was running 24.1.3_1.

So I reinstalled OPNsense and set up High Availability, but the same problem occurred again, even though it had worked on the system booted from the live stick. So the problem must be with the hard disk?
So I completely wiped the hard disk with an external tool and reinstalled OPNsense. Again the same problem...
But this time I had a working config from the live stick. So I imported it, restarted the system and it works?

Why? What am I missing?


At least I can now continue to test the functions and operation of High Availability.

Thanks to those who tried to help and the input!

Sorry to bring this up again,
we have the same issue here. With state sync enabled on master and slave, we get a "split-brain" after a few days. With state sync disabled, the system runs smooth as butter.

We are using a unicast VIP, but this issue already existed before 24.10_7 with multicast.
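
For anyone wanting to confirm such a split-brain from the shell, comparing the CARP state on both nodes is one option. The sketch below assumes the usual FreeBSD ifconfig output, where each virtual IP has a line starting with "carp:" followed by the state and vhid; the helper name is hypothetical:

```shell
#!/bin/sh
# Print the CARP state and vhid for every "carp:" line found on stdin.
# carp_states is a hypothetical helper name.
carp_states() {
    awk '$1 == "carp:" { print $2, $3, $4 }'
}

# Usage on each firewall: ifconfig | carp_states
```

If both nodes report MASTER for the same vhid at the same time, they are in a split-brain for that VIP.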