[SOLVED] Config sync OK but not applied on HA slave node

Started by farsoft, September 07, 2016, 03:10:50 PM

Hi everybody,

I've got a small new problem with my OPNsense A10 HA cluster (OPNsense 16.7.3-amd64).

I've set up XMLRPC Sync between the two nodes. If I make a config change on the master node (e.g. adding a new route or a new firewall rule), it's visible on the slave node but it's not applied automatically.

For instance, if I add a new route and apply the change on the master node, I can see the new route immediately in the GUI of the slave node. However, if I look at the routes status, it isn't there. If I want to see it, I have to force an update and apply the changes on the slave node.

I've found an old topic about a similar problem in an older OPNsense version, but it was supposed to be fixed.

https://forum.opnsense.org/index.php?topic=1309.msg3738

Do you have any idea, please?

Thanks in advance.

Regards,

farsoft

Hi farsoft,

Can you check the syslog output on the backup when you apply firewall rules?

clog -f /var/log/system.log


It should trigger an apply on the backup when the primary hits apply, but the synchronisation mechanism isn't (and never was) flawless.
We added the status page (status_habackup.php) so the backup can be reconfigured easily, because it has always been very easy to get both machines out of sync (running services etc.).

Best regards,

Ad

Hi Ad,

Thank you for your answer.

No, there is nothing in the system log of the backup when I apply on the master.

However, I ran tcpdump on the backup and I can see the XMLRPC queries arrive when I apply on the master, but the changes are still not applied on the backup.

I've switched the GUIs from https to http to be able to see the content of the queries and then I've done a "Force config sync" on the master.

Here is an extract from the POST query received on the backup:

POST /xmlrpc.php HTTP/1.0
Connection: close
Host: 172.31.167.86
User-Agent: XML_RPC
Content-Type: text/xml
Content-Length: 64748
Authorization: Basic cm9vdDpFeDQ5QGhvbWU=

<?xml version="1.0"?>
<methodCall>
<methodName>opnsense.restore_config_section</methodName>
<params>
<param><value><struct>
  <member><name>filter</name><value><struct>
  <member><name>rule</name><value><array><data>
  <value><struct>
  <member><name>type</name><value><string>block</string></value></member>
  <member><name>interface</name><value><string>wan</string></value></member>
  <member><name>ipprotocol</name><value><string>inet46</string></value></member>
  <member><name>statetype</name><value><string>keep state</string></value></member>
  <member><name>descr</name><value><string>Block tout</string></value></member>
  <member><name>log</name><value><string>1</string></value></member>
  <member><name>source</name><value><struct>
  <member><name>any</name><value><string>1</string></value></member>
</struct></value></member>
  <member><name>destination</name><value><struct>
  <member><name>any</name><value><string>1</string></value></member>
</struct></value></member>
  <member><name>updated</name><value><struct>
  <member><name>username</name><value><string>root@172.31.163.9</string></value></member>
  <member><name>time</name><value><string>1472546467.8115</string></value></member>
  <member><name>description</name><value><string>/firewall_rules_edit.php made changes</string></value></member>
</struct></value></member>
...


And the answer from the backup:

HTTP/1.0 200 OK
Expires: Fri, 09 Sep 2016 22:35:29 GMT
Cache-Control: max-age=180000
Connection: close
Content-Length: 161
Content-type: text/xml;charset=UTF-8
Date: Wed, 07 Sep 2016 22:35:30 +0200
Server: lighttpd/1.4.41

<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value>
      <boolean>1</boolean>
      </value>
    </param>
  </params>
</methodResponse>
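
As a side note, a reply like this can be decoded with Python's standard xmlrpc.client module instead of reading the XML by eye. This is only a minimal sketch, assuming the body was saved as text from the capture; the names RESPONSE_BODY and decode_reply are made up for the illustration and are not part of OPNsense.

```python
import xmlrpc.client

# Body of the backup's reply, copied from the capture above.
RESPONSE_BODY = """<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value>
      <boolean>1</boolean>
      </value>
    </param>
  </params>
</methodResponse>
"""


def decode_reply(body: str):
    """Return the single value carried by an XML-RPC methodResponse."""
    # loads() returns (params, method_name); method_name is None for a response.
    params, _method_name = xmlrpc.client.loads(body)
    return params[0]


result = decode_reply(RESPONSE_BODY)
print(result)
```

The decoded value only says that opnsense.restore_config_section returned true. As the rest of this post shows, it does not by itself mean that the backup reloaded anything.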



One more piece of information that may help: restarting the services on the backup from the status_habackup.php page on the master works fine.

Hi farsoft,

Can you run this on the master and inspect the syslog output from the backup and the master?

configctl filter sync restart


Normally this command should run when you apply the firewall rules on the master.

Regards,

Ad

Hi Ad,

Your command seems to work.

On the master:

root@LYFWINT1:~ # configctl filter sync restart
OK


On the backup:

Sep  8 11:07:32 LYFWINT2 configd.py: [622cb9cd-6ae7-45ac-9eb9-82dedefb64d2] Reloading filter
Sep  8 11:07:32 LYFWINT2 opnsense: /usr/local/etc/rc.filter_configure_sync: Could not find IPv6 gateway for interface(wan).
Sep  8 11:07:33 LYFWINT2 opnsense: /xmlrpc.php: ROUTING: setting IPv4 default route to 10.0.0.138
Sep  8 11:07:33 LYFWINT2 opnsense: /xmlrpc.php: Removing static route for monitor 8.8.8.8



Regards,

farsoft

Hi farsoft,

OK, that's strange. Are you sure your sync doesn't happen on apply?

Tracing the code, the following should happen:

https://github.com/opnsense/core/blob/master/src/www/firewall_rules.php#L52


That should show the message "settings have been applied..." on screen.

Then https://github.com/opnsense/core/blob/master/src/etc/inc/filter.inc#L321 is executed, which checks for the existence of synchronizetoip and, if it is set, executes the provided configd command (filter sync restart).

At a first glance, I don't see a reason why this won't work.

Best regards,

Ad

Hi Ad,

Yes, I'm sure sync doesn't happen on apply.

I think the problem is related to the check for the existence of synchronizetoip.

It seems that the check fails, and so configd_run('filter sync restart') is never called.

I've commented out the check as shown below and tried again.

if (!file_exists("/var/run/booting")) {
    configd_run('filter reload');
    // if (isset($config['hasync']['synchronizetoip']) && trim($config['hasync']['synchronizetoip']) != "") {
        configd_run('filter sync restart');
    // }
}


It worked.

Log on backup:

Sep  8 13:07:23 LYFWINT2 configd.py: [2245481d-db51-4985-8f22-0bb1bb0842c2] Reloading filter
Sep  8 13:07:24 LYFWINT2 opnsense: /usr/local/etc/rc.filter_configure_sync: Could not find IPv6 gateway for interface(wan).
Sep  8 13:07:24 LYFWINT2 opnsense: /xmlrpc.php: ROUTING: setting IPv4 default route to 10.0.0.138
Sep  8 13:07:24 LYFWINT2 opnsense: /xmlrpc.php: Removing static route for monitor 8.8.8.8



However, synchronizetoip is set up on the master (see attached screenshot).

Best regards,

farsoft


OK, that's weird, but at least we have something to investigate now.
Let me inspect this part of the code again; I must be overlooking something here.

Hi farsoft,

I can't reproduce this on my end. We can simplify the check to a plain !empty, but it makes no sense that this doesn't work on your end.

Pushed this https://github.com/opnsense/core/commit/0e158f0b0cf842f7e194982ba08cd4965a8c4062 although functionally it's doing the same (only with less code).

Can you inspect the hasync section of your config.xml? (Maybe post it online without the password.)

Regards,

Ad

Hi Ad,

As expected, same behaviour with your new code.

Here is the hasync section of my /conf/config.xml file on master node:


<hasync>
    <pfsyncpeerip>172.31.167.86</pfsyncpeerip>
    <pfsyncinterface>lan</pfsyncinterface>
    <synchronizetoip>172.31.167.86</synchronizetoip>
    <username>root</username>
    <password>*******</password>
    <pfsyncenabled>on</pfsyncenabled>
    <synchronizestaticroutes>on</synchronizestaticroutes>
    <synchronizeusers>on</synchronizeusers>
    <synchronizeauthservers>on</synchronizeauthservers>
    <synchronizecerts>on</synchronizecerts>
    <synchronizerules>on</synchronizerules>
    <synchronizeschedules>on</synchronizeschedules>
    <synchronizealiases>on</synchronizealiases>
    <synchronizenat>on</synchronizenat>
    <synchronizeipsec>on</synchronizeipsec>
    <synchronizeopenvpn>on</synchronizeopenvpn>
    <synchronizedhcpd>on</synchronizedhcpd>
    <synchronizewol>on</synchronizewol>
    <synchronizelb>on</synchronizelb>
    <synchronizevirtualip>on</synchronizevirtualip>
    <synchronizednsforwarder>on</synchronizednsforwarder>
    <synchronizednsresolver>on</synchronizednsresolver>
    <synchronizeshaper>on</synchronizeshaper>
    <synchronizecaptiveportal>on</synchronizecaptiveportal>
</hasync>
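
For anyone who wants to verify the same thing on their own box without opening the file by hand, here is a rough Python sketch that pulls synchronizetoip out of a hasync section. HASYNC_XML is a trimmed copy of the section above and sync_target is a made-up helper name, so treat it as an illustration rather than an OPNsense tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the section posted above (password left out).
HASYNC_XML = """
<hasync>
    <pfsyncpeerip>172.31.167.86</pfsyncpeerip>
    <pfsyncinterface>lan</pfsyncinterface>
    <synchronizetoip>172.31.167.86</synchronizetoip>
    <username>root</username>
    <synchronizerules>on</synchronizerules>
</hasync>
"""


def sync_target(hasync_xml: str) -> str:
    """Return the trimmed synchronizetoip value, or '' if absent or empty."""
    section = ET.fromstring(hasync_xml.strip())
    # findtext() returns None when the element is missing.
    return (section.findtext("synchronizetoip") or "").strip()


target = sync_target(HASYNC_XML)
print("sync target:", target or "(not set)")
```

A non-empty result matches what the GUI shows, which is why the value in the file was not the cause here. To run it against the real file you would read /conf/config.xml and look up the hasync element first; I am assuming it sits directly under the root element.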

Hi farsoft,

Stupid me... I only looked at the diff and completely missed the beginning of the function.

This should fix it (missing global).
https://github.com/opnsense/core/commit/65653b7c97d619f2d45fcc92a9bedb8f3a0651bd
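
Why a missing global makes the check fail even though synchronizetoip is set: in PHP, a variable used inside a function is local unless it is declared with the global keyword, so without that declaration the function sees an undefined $config and isset()/!empty() return false. The snippet below is only a Python model of that behaviour, on the assumption that the missing declaration was for $config; CONFIG and sync_enabled are illustrative names, not OPNsense code.

```python
# Python model of the PHP check; all names here are illustrative.
CONFIG = {"hasync": {"synchronizetoip": "172.31.167.86"}}


def sync_enabled(config: dict) -> bool:
    """Mirror isset($config['hasync']['synchronizetoip']) && trim(...) != ''."""
    value = config.get("hasync", {}).get("synchronizetoip")
    return value is not None and value.strip() != ""


# Without the global declaration the PHP function works on an undefined
# local $config, which behaves like an empty dict in this model.
without_global = sync_enabled({})

# With the declaration the function reads the real configuration.
with_global = sync_enabled(CONFIG)

print(without_global, with_global)
```

This also fits what farsoft saw: commenting out the check made the sync run, because the condition itself was the only thing evaluating to false.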

Kind regards,

Ad

It works!!!  :D :D :D

Thank you very much Ad!

Regards,

farsoft