High availability sync appears to have stopped working but CARP still fine

b1t_r0t:
I am seeing this same issue after upgrading from 22.1 to 22.7.6. It actually looks like everything else is still working, including failover; it's just something with the sync.

This seems to be related to this issue here:
https://forum.opnsense.org/index.php?topic=29521.0

I was able to reproduce the issue by rolling back to snapshots I had; it happens every time I upgrade to 22.7.6.

b1t_r0t:
I rolled back again (to 22.1.10) and then upgraded again, and everything was still broken, but the error changed from the parsing error mentioned in the other post to "host down".

I disabled and re-enabled the interfaces on both the OPNsense and VMware sides, and everything is now working on 22.7.6.

--- Quote from: sesquipedality on October 21, 2022, 04:28:22 pm ---Thanks for the suggestion. Yes, there is. This is a previously working config that appears to have stopped working at some point. I did have to reinstall the primary server at one point and did so using the USB stick config transfer method. No passwords have changed. The problem is that the diagnostic message I'm getting is so non-specific as to leave me lost as to how to even investigate what's not working.

--- End quote ---

Log in to the console and run this:
# /usr/local/etc/rc.filter_synchronize

What's the output?

sesquipedality:
Sorry for the delayed reply - got busy with other stuff and this got put on the back burner.

The output is:

--- Code: ---root@<host>:~ # /usr/local/etc/rc.filter_synchronize
send >>>
Host: 192.168.66.4
User-Agent: XML_RPC
Content-Type: text/xml
Content-Length: 117
Authorization: Basic cm9vdDpQaWJqSXBzSUxwVEFmNHlZOTZ4Uw==
<?xml version="1.0"?>
<methodCall>
<methodName>opnsense.firmware_version</methodName>
<params>
</params></methodCall>received >>>
error >>>
fetch error. remote host down?root@fenchurch:~ # send >>>
Missing name for redirect.
<methodName>opnsense.firmware_version</methodName>
<params>
</params></methodCall>received >>>
error >>>
fetch error. remote host down?
--- End code ---

This did enable me to discover that I wasn't able to traceroute/ping the backup interface from the main interface. I went through all my firewall rules to try to work out what was wrong, and the only difference I could find was that, for entirely inexplicable reasons, some automatic outbound NAT rules were being generated for the backbone on the primary router (perhaps because the primary router is configured to route via the backbone if the primary internet connection goes down). Anyway, these were still generated after outbound NAT was manually disabled for the interface, and I checked that the problem still existed with the outbound rules disabled.
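
For anyone else hitting the "remote host down?" error, the reachability check described above can be sketched as a small shell function. This is only a rough sketch under some assumptions: `check_peer` is a hypothetical helper name, the address is the one from the `Host:` line in my output, and port 443 assumes the sync talks to the peer's web GUI on the default HTTPS port. Adjust both to match your own setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of OPNsense): checks whether the sync peer
# answers ICMP and accepts TCP connections on the given port.
# Assumption: the XML-RPC sync targets the peer's web GUI port (443 here).

check_peer() {
    peer="$1"
    port="$2"
    status=0

    # Step 1: basic reachability over the sync/backbone interface.
    if ping -c 3 "$peer" >/dev/null 2>&1; then
        echo "icmp: ok"
    else
        echo "icmp: FAILED"
        status=1
    fi

    # Step 2: can we open a TCP connection to the port (3 second timeout)?
    if nc -z -w 3 "$peer" "$port" >/dev/null 2>&1; then
        echo "tcp/$port: ok"
    else
        echo "tcp/$port: FAILED"
        status=1
    fi

    return "$status"
}

# Run the check only when an address is passed on the command line,
# e.g.: sh check_peer.sh 192.168.66.4 443
if [ "$#" -gt 0 ]; then
    check_peer "$1" "${2:-443}"
fi
```

Both checks run even if the first one fails, since ICMP can be blocked by a rule while TCP still gets through (or the other way round). If either line reports FAILED, the problem is likely routing or firewall rules on the sync interface rather than the sync itself.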

In any event, having been through all that and disabled and re-enabled the gateways, I am now at a point where ping, ssh and http over the backbone are working again, and so sync is back up and running. Subsequent runs are not producing an error, and my sync menu is now back. I do not know, and probably never will know, why traceroute over the backbone worked on the secondary but not on the primary router. Thanks for your help with this. I do wish routers were a little less "black box" sometimes.

b1t_r0t:
Glad it worked out. *High Five* :)
