HA The backup firewall is not accessible (check user credentials)

Started by Fauconjeff, April 17, 2025, 07:20:29 PM

Hi, I have a problem with HA sync between my 25.1.5_5 firewalls. When I go to System -> High Availability -> Status, the page hangs for 30 seconds, then says "The backup firewall is not accessible (check user credentials)".

I know the credentials are good.
Listen interfaces is set to all.
There is no port in the Synchronize Config box, just the IP.
Both replication interfaces are connected on a dedicated port with a crossover cable (no switch).
I have a firewall rule that allows * because the link is directly connected.
When I open the Firewall Live View in a second browser tab, I can't see any traffic when I click on Status (in the HA section, where the error appears).

But if I connect to the primary FW through SSH and run the command /usr/local/etc/rc.filter_synchronize, there is no error, the sync works, and I can see replication traffic in the Firewall Live View.

Looks like something is broken on the web interface.

I have another pair of OPNsense firewalls with the same version and the same HA config, and it's working fine.

I also have 2 HA setups, and in both this works just fine as it always did.

I just re-checked, and I see that the status is only available on the primary system. On the backup systems, the message "The backup firewall is not accessible (check user credentials)." is displayed. This makes sense, because on the backup system everything below "Configuration Synchronization Settings (XMLRPC Sync)" in the Settings is empty.

I have the same issue on 2 business edition firewalls. They were upgraded to 25.4 last week and since then the primary firewall displays "backup firewall unavailable (check user creds)". HA sync worked prior to the upgrade - I did one the same day, before upgrading. I have manually reset all passwords on both firewalls and it's made no difference, and as per the OP the rc.filter_synchronize script from the command line works fine.

tcpdump on either firewall (tcpdump -i igb0 port 443) shows no packets when clicking System -> High Availability -> Status, however plenty of PFSYNC traffic is visible when running tcpdump without the port specification. I can curl https://ha-peer-ip-address/ with no issues from the CLI.
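
For anyone who wants to repeat the reachability check without curl, here is a minimal Python sketch of the same idea: it only tests whether a TCP connection to the peer's HTTPS port can be opened. The function name `tcp_port_open` and the placeholder host are assumptions for illustration, not anything from OPNsense itself.

```python
import socket


def tcp_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    # "ha-peer-ip-address" is a placeholder; substitute the peer's sync IP.
    print(tcp_port_open("ha-peer-ip-address"))
```

A True result only shows that the port is reachable from wherever the script runs. It says nothing about whether the web GUI's own request ever leaves the box, which is what the empty tcpdump suggests.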

I may have to clarify my wording, as I suspect that is where the issue may be.

In the HA setups I run, one system is clearly the primary (and most of the time the CARP master), and the other one the secondary (the CARP backup).
Only during system updates (when a reboot is needed) do I switch the CARP master to the secondary system, so that the primary system can be updated. After the update I switch back.

Depending on your setup the CARP master may currently be on the secondary system (from the HA point of view), so it would make sense that you are not able to see the status.

Hope this helps.

I am still having this issue, and I have delved a bit deeper. I can see that when the /ui/core/hasync_status page is loaded, it makes an API call to /api/core/hasync_status/version. On my firewalls (2 separate pairs in different networks) this API endpoint responds:

{
  "status": "error",
  "message": "parse error. not well formed"
}
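
To look at this reply outside the browser, the endpoint can be queried directly. The sketch below is only that, a sketch: it assumes the endpoint accepts OPNsense API key/secret authentication via HTTP Basic auth (key as user name, secret as password) and that the GUI uses a self-signed certificate, so verification is switched off. The function names and the host, key and secret values are hypothetical placeholders.

```python
import base64
import json
import ssl
import urllib.request


def summarize_hasync_reply(body: str) -> str:
    """Turn the raw response body into a one-line summary."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        return f"not JSON: {exc.msg}"
    if isinstance(data, dict) and data.get("status") == "error":
        return f"API error: {data.get('message', '(no message)')}"
    return "no error status reported"


def fetch_hasync_version(host: str, key: str, secret: str) -> str:
    """Fetch /api/core/hasync_status/version and return the raw body."""
    url = f"https://{host}/api/core/hasync_status/version"
    token = base64.b64encode(f"{key}:{secret}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    # Assumption: self-signed GUI certificate, so verification is disabled.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholders: replace with the primary firewall's address and an API key pair.
    raw = fetch_hasync_version("primary-fw-address", "API_KEY", "API_SECRET")
    print(summarize_hasync_reply(raw))
```

With the response shown above, `summarize_hasync_reply` reports the API's own message ("parse error. not well formed") rather than the credentials hint shown in the GUI.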

I have seen references to this error in old bugs around HA sync, for example:
https://forum.opnsense.org/index.php?topic=20557.0
https://github.com/opnsense/core/issues/4533

A different bug but similar symptoms. I think it means that the IXR library cannot parse the config file, as in, it is not well formed XML. However, unticking all of the different services to synchronise in the HA Settings page does not make a difference for me. Either way, the error message displayed in the web GUI is clearly unrelated to the error returned by the API, and is quite misleading!

How can I work out what bit of the config/HA setup IXR is saying is not well formed?
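
One generic way to narrow this down is to run the suspect XML through a strict parser and see where it stops. The sketch below uses Python's standard library and is not tied to IXR. Which XML IXR is actually complaining about (the config file, or the XML-RPC reply it receives) is not established here, so the function takes any XML text. The name `find_xml_error` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def find_xml_error(text):
    """Return None if text (str or bytes) is well-formed XML,
    otherwise a (line, column, message) tuple for the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        line, column = exc.position
        return line, column, str(exc)
    return None


if __name__ == "__main__":
    # /conf/config.xml is where OPNsense keeps its configuration;
    # any other captured XML (e.g. a saved XML-RPC reply) works the same way.
    with open("/conf/config.xml", "rb") as fh:
        print(find_xml_error(fh.read()) or "well formed")
```

If the file itself parses cleanly, the next candidate would be the reply coming back from the peer, for instance an HTML error page where XML was expected.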

I have never had to dig into an HA issue so far. My two setups were done a long time ago, and I don't remember which potential issues I may have run into and fixed back then.

So just some more things to (re-)check:
- Did you use a /24 netmask for the PFSYNC interface?
- Is this /24 subnet unused everywhere else in your internal network (e.g. there are no routes to it elsewhere)?
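
The second check can be done mechanically if you have a list of the subnets in use. A minimal sketch with Python's `ipaddress` module follows. The function name `overlapping_networks` is hypothetical, and the addresses are documentation ranges used purely as placeholders, not anyone's real sync network.

```python
import ipaddress


def overlapping_networks(sync_net: str, other_nets: list[str]) -> list[str]:
    """Return the entries of other_nets that overlap the sync subnet."""
    sync = ipaddress.ip_network(sync_net, strict=False)
    return [
        net
        for net in other_nets
        if sync.overlaps(ipaddress.ip_network(net, strict=False))
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your PFSYNC subnet and your routed networks.
    print(overlapping_networks("192.0.2.0/24", ["192.0.2.128/25", "198.51.100.0/24"]))
```

An empty list means none of the listed networks collides with the sync subnet. Anything returned is worth checking against routing tables and interface assignments.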

And below my settings for both systems (primary and secondary).

In System / High Availability / Settings on the primary I have the following settings (activate "advanced mode"):
General Settings:
- Disable preempt -> not set
- Disconnect dialup interfaces -> not set
- Synchronize all states via -> PFSYNC
- Sync compatibility -> OPNsense 24.7 or above
- Synchronize Peer IP -> 192.168.x.z (the IP assigned to the PFSYNC on the secondary)
Configuration Synchronization Settings (XMLRPC Sync):
- Synchronize Config -> 192.168.x.z (the IP assigned to the PFSYNC on the secondary)
- Verify peer -> not set
- Remote System Username -> root
- Remote System Password -> the password of the root user (should be the same on both systems)
Services to synchronize (XMLRPC Sync):
- Services -> select the services you want to be synced (be careful and think what should not be synced)

In System / High Availability / Settings on the secondary I have the following settings (activate "advanced mode"):
- Disable preempt -> activated
- Disconnect dialup interfaces -> not set
- Synchronize all states via -> PFSYNC
- Sync compatibility -> OPNsense 24.7 or above
- Synchronize Peer IP -> 192.168.x.z (the IP assigned to the PFSYNC on the primary)
Configuration Synchronization Settings (XMLRPC Sync):
- Synchronize Config -> not set
- Verify peer -> not set
- Remote System Username -> not set
- Remote System Password -> not set
Services to synchronize (XMLRPC Sync):
- Services -> Nothing selected

Something else worth checking is the "Sync compatibility -> OPNsense 24.7 or above" setting.
I also remember that sync only works properly when the root user is used.

You already mentioned that you have a pass any-to-any IPv4+IPv6 firewall rule on the PFSYNC interface. I have also enabled "Quick" on this rule, so that no other rules (e.g. floating rules) can kick in and block something.

There is another thread with the same issue, where the user bamypamy tracked it down to the OPNsense systems not having direct internet access and having to use a proxy server; see https://forum.opnsense.org/index.php?msg=238761

Quote from: Fabian Wenk on June 03, 2025, 02:16:34 PM
There is another thread with the same issue, where the user bamypamy did track it down to the issue that when the OPNsense systems do not have direct internet and have to use a proxy server, see https://forum.opnsense.org/index.php?msg=238761

After reading his post, I tested the solution and it worked! I'm also using a proxy, because that pair of firewalls only filters traffic between the corp network and a lab. I did not mention it in my post because it did not seem related to the problem I had.

EDIT: I opened a bug:
https://github.com/opnsense/core/issues/8765