Issues with High Availability Setups After Update to 25.1

Started by R1mSG, February 12, 2025, 08:34:12 AM

Hi,


Are you aware of any problems with high availability setups after updating to 25.1 (from 24.7.12_2)?

We have two setups that no longer work after the master firewall was updated to 25.1. The backup firewalls are still on 24.7.12_2.

The issue is that logging into either backup firewall is no longer possible after updating the master to 25.1. The backup WebGUI login page is still accessible, and ping works. It also appears that the backup firewall itself remains correctly configured.

Both password and SSH authentication (keys) are rejected.
System → High Availability → Status displays:
"The backup firewall is not accessible (check user credentials)."

However, this only affects the backup firewall; the master runs without issues in both setups.
So far, we have updated four HA setups to 25.1. Two encountered this problem, while two did not.


Regards,
R1mSG

I have been running two HA pairs of OPNsense for many years. They are currently still on 24.7.x, and I plan to upgrade to 25.1.1 in the next few days.

My usual procedure for updating is this (basic steps):
1) Update / upgrade the backup firewall.
2) On the master firewall, go to Interfaces → Virtual IP → Status, click "Enter Persistent CARP Maintenance Mode" and then "Temporarily Disable CARP". This makes the backup the (temporary) master.
3) For major upgrades, let the backup (now the temporary master) run for a while (at least half an hour) and test that everything works as it should. If it does not, go directly to step 5 and upgrade the master later; if needed, reinstall the backup with the previous version (a config backup taken just before the upgrade may be helpful).
4) Update / upgrade the master (temporary backup) firewall.
5) On the master (temporary backup) firewall, go to Interfaces → Virtual IP → Status and click "Leave Persistent CARP mode". If for some reason it does not return to master status, click "Temporarily Disable CARP" on the backup to give it a push.
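
The steps above can be sketched as a small checklist generator. This is only an illustration of the ordering, not an OPNsense tool: the function name upgrade_plan and the step wording are made up for this example, and nothing here talks to a firewall.

```python
def upgrade_plan(soak_test_passed: bool) -> list[str]:
    """Return the ordered steps for one HA pair, following the list above.

    Hypothetical helper: it only encodes the order of the manual steps.
    """
    steps = [
        "1) upgrade backup",
        "2) master: enter persistent CARP maintenance mode, temporarily disable CARP",
        "3) soak-test backup as temporary master (at least 30 minutes)",
    ]
    # Step 4 is skipped when the soak test fails; the master stays on the
    # old version and takes over again in step 5.
    if soak_test_passed:
        steps.append("4) upgrade master (temporary backup)")
    steps.append("5) master: leave persistent CARP maintenance mode")
    return steps


if __name__ == "__main__":
    for step in upgrade_plan(soak_test_passed=True):
        print(step)
```

The point of the ordering is that the master is never upgraded until the backup has carried traffic on the new version for a while.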

We have many HA setups running here.
Updates are rolled out automatically as long as the previously updated firewalls show no errors.

We've never had this problem before.

I've now also taken a closer look at one of them; the config looks the same so far.
Logging in was only possible after resetting the root password in single-user mode. Even then, login only worked via the serial console; the GUI login still did not work.

As for why SSH login was no longer possible: the .ssh folder was empty. The update must have deleted it for whatever reason.

Maybe it was just bad luck, let's see ...

The access manager (user management) was rewritten:
https://forum.opnsense.org/index.php?topic=45460.0

If you upgrade the master first and replicate to the backup (e.g., via cron job), you shoot yourself in the foot: the new user manager configuration is synchronized to the backup firewall, which is still on the old version.
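
A minimal sketch of a pre-sync check for that situation, assuming version strings shaped like the ones in this thread (24.7.12_2, 25.1, 25.1.1). The helper names release_series and master_is_ahead are hypothetical, and the rule "compare only the first two version components" is an assumption for illustration, not something OPNsense provides.

```python
def release_series(version: str) -> tuple[int, int]:
    """Extract the release series, e.g. '24.7.12_2' -> (24, 7)."""
    base = version.split("_", 1)[0]   # drop a package revision such as '_2'
    parts = base.split(".")
    return int(parts[0]), int(parts[1])


def master_is_ahead(master_version: str, backup_version: str) -> bool:
    """True when the master runs a newer release series than the backup.

    That is the state described in this thread (master on 25.1, backup on
    24.7.12_2), in which a config sync pushes the new user manager
    configuration to an older backup.
    """
    return release_series(master_version) > release_series(backup_version)


if __name__ == "__main__":
    if master_is_ahead("25.1", "24.7.12_2"):
        print("master is on a newer series: upgrade the backup before syncing")
```

A check like this could run before a scheduled sync job; it says nothing about whether syncing is safe in other version combinations.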

Please follow the upgrade instructions in our documentation:

https://docs.opnsense.org/manual/how-tos/carp.html#example-updating-a-carp-ha-cluster
Hardware:
DEC740