OPNsense Forum

English Forums => High availability => Topic started by: Rebecca Dieter on November 15, 2024, 12:55:15 PM

Title: High Availability Sync causes restart of all services on each synchronisation
Post by: Rebecca Dieter on November 15, 2024, 12:55:15 PM
Hey there,

we are running multiple OPNsense instances which are configured as master/backup setups. On one of those clusters we noticed that the Suricata logs are constantly growing to multiple GB in size.

The logs of the last 3 days look like this:

-rwx------  1 root wheel   10G Nov 13 23:56 suricata_20241113.log
-rwx------  1 root wheel   17G Nov 14 23:59 suricata_20241114.log
-rw-------  1 root wheel  7.8G Nov 15 11:41 suricata_20241115.log



We noticed that the Suricata service restarts every 5 minutes and each time writes around 20 MB of log lines, which show that rules are duplicated.
The duplicated rule errors are also present on the other OPNsense slaves in the other clusters, so as far as we understand they are caused by the High Availability setup.

After we noticed that Suricata gets restarted every 5 minutes, we saw that other services are restarted as well:
(Taken from the backend log via the GUI of the slave)

2024-11-15T11:45:17 1 Notice configd.py 68284 [92783ee2-9f69-4775-9ebb-ed5ffbefb9fa] Restarting web GUI
2024-11-15T11:45:12 1 Notice configd.py 68284 [ee6b9555-8185-4900-b803-57c6a5d5d85a] restart suricata daemon
2024-11-15T11:45:12 1 Notice configd.py 68284 [082fe44e-64dd-4983-a118-cd7411d19c1f] restart netflow
2024-11-15T11:45:10 1 Notice configd.py 68284 [ddc13cd2-9068-404f-a470-8b856630caf2] Restarting OpenSSH
2024-11-15T11:45:09 1 Notice configd.py 68284 [fa0b5130-53e2-4ed9-b8c4-8a15e2b86543] restart kea daemon
2024-11-15T11:45:07 1 Notice configd.py 68284 [2b673549-68f4-49ae-9c46-2168b8617b03] restarting haproxy
2024-11-15T11:45:06 1 Notice configd.py 68284 [c9f5e649-0e8e-44d8-b8fa-ea0ef4e6875b] restart netflow data aggregator
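
Whether these restarts belong to a single sync run can be checked with a small script. The following is a minimal sketch in Python, assuming the log lines are copied out of the GUI as plain text in the format shown above. The helper names `parse` and `group_bursts` and the 60-second gap are illustrative choices, not part of OPNsense.

```python
from datetime import datetime, timedelta

# Sample lines copied from the backend log above.
LOG = """\
2024-11-15T11:45:17 1 Notice configd.py 68284 [92783ee2-9f69-4775-9ebb-ed5ffbefb9fa] Restarting web GUI
2024-11-15T11:45:12 1 Notice configd.py 68284 [ee6b9555-8185-4900-b803-57c6a5d5d85a] restart suricata daemon
2024-11-15T11:45:12 1 Notice configd.py 68284 [082fe44e-64dd-4983-a118-cd7411d19c1f] restart netflow
2024-11-15T11:45:10 1 Notice configd.py 68284 [ddc13cd2-9068-404f-a470-8b856630caf2] Restarting OpenSSH
2024-11-15T11:45:09 1 Notice configd.py 68284 [fa0b5130-53e2-4ed9-b8c4-8a15e2b86543] restart kea daemon
2024-11-15T11:45:07 1 Notice configd.py 68284 [2b673549-68f4-49ae-9c46-2168b8617b03] restarting haproxy
2024-11-15T11:45:06 1 Notice configd.py 68284 [c9f5e649-0e8e-44d8-b8fa-ea0ef4e6875b] restart netflow data aggregator
"""


def parse(line):
    """Return (timestamp, message) for one configd log line."""
    ts, rest = line.split(" ", 1)
    # The message is everything after the bracketed request id.
    message = rest.split("] ", 1)[1]
    return datetime.fromisoformat(ts), message


def group_bursts(lines, gap=timedelta(seconds=60)):
    """Group events that follow the previous event within `gap`."""
    events = sorted(parse(line) for line in lines if line.strip())
    bursts = []
    for ts, message in events:
        if bursts and ts - bursts[-1][-1][0] <= gap:
            bursts[-1].append((ts, message))
        else:
            bursts.append([(ts, message)])
    return bursts


bursts = group_bursts(LOG.splitlines())
for burst in bursts:
    start, end = burst[0][0], burst[-1][0]
    print(f"{start} -> {end}: {len(burst)} restarts")
```

Run against a longer export, bursts that start roughly 5 minutes apart would point at the sync cron job, while isolated restarts would show up as bursts of length one.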


When we disable the cron job (which runs every 5 minutes) that syncs the configuration from the master to the slave, we do not see any more restarts.
The cron job is configured via the GUI and looks like this:
(https://forum.opnsense.org/index.php?action=dlattach;topic=44026.0;attach=39645)

The High Availability configuration is this:
(https://forum.opnsense.org/index.php?action=dlattach;topic=44026.0;attach=39648;image)
Does anyone have a suggestion for what we can check, or an idea of what might be going wrong?
We do not find any critical errors or anything like that.
Title: Re: High Availability Sync causes restart of all services on each synchronisation
Post by: Monviech (Cedrik) on November 15, 2024, 02:36:23 PM
The cron job indeed restarts all services each time it is invoked.

https://github.com/opnsense/core/blob/5f533d45731db1611d6579a7638012968fb01865/src/opnsense/service/conf/actions.d/actions_system.conf#L106

Maybe 5 minutes is not the best choice here. I would synchronize daily, since that also prevents human errors from being synchronized instantly.
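
To put the two schedules side by side, here is a rough back-of-the-envelope sketch in Python. It assumes the "around 20 MB" per Suricata restart reported in the first post and one restart per sync run. The function name `daily_log_mb` is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60
MB_PER_RESTART = 20  # assumption: "around 20 MB" per restart, from the first post


def daily_log_mb(interval_minutes):
    """Return (sync runs per day, estimated MB of restart output per day)."""
    runs = MINUTES_PER_DAY // interval_minutes
    return runs, runs * MB_PER_RESTART


for label, interval in (("every 5 minutes", 5), ("daily", MINUTES_PER_DAY)):
    runs, mb = daily_log_mb(interval)
    print(f"{label}: {runs} syncs/day, about {mb} MB of restart output")
```

This is only an estimate. The log files reported above are larger than the 5-minute figure, so the real per-restart output is probably higher than 20 MB, but the ratio between the two schedules stays the same.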
Title: Re: High Availability Sync causes restart of all services on each synchronisation
Post by: Rebecca Dieter on November 15, 2024, 02:40:58 PM
OK, but then I am confused why the other 3 clusters we have with the same settings are not restarting the services all the time.

Here are the log lines from another cluster; you can see that the services are not always restarted on a sync:

2024-11-13T09:57:50 Notice configd.py [0ef4219a-2a0a-46f8-87dd-78b7c1e2185e] restart suricata daemon
2024-11-07T04:34:08 Notice configd.py [2bc0e805-6485-4a4e-ba4a-5840081b56e0] restarting cron
2024-11-07T04:34:08 Notice configd.py [1d2c3400-33df-42bc-9f06-cf26e1e5880b] Restarting syslog
2024-10-23T11:20:03 Notice configd.py [ec8dc486-aed4-4d67-a1ed-9aca4c32aa27] restart kea daemon
2024-10-23T11:19:58 Notice configd.py [7ce4d26d-e901-4b7e-9750-dcd294ec55ac] restart kea daemon
Title: Re: High Availability Sync causes restart of all services on each synchronisation
Post by: Monviech (Cedrik) on November 15, 2024, 03:00:04 PM
I'm not sure here, but this does not change the fact that the configuration is non-optimal:

https://docs.opnsense.org/manual/hacarp.html#automatic-replication