Title: High Availability Sync causes restart of all services on each synchronisation
Post by: Rebecca Dieter on November 15, 2024, 12:55:15 PM

Hey there,

We are running multiple OPNsense instances configured as master-backup setups. On one of those clusters we noticed that the Suricata logs keep growing to multiple GB in size.

The logs of the last 3 days look like this:

-rwx------  1 root wheel   10G Nov 13 23:56 suricata_20241113.log
-rwx------  1 root wheel   17G Nov 14 23:59 suricata_20241114.log
-rw-------  1 root wheel  7.8G Nov 15 11:41 suricata_20241115.log

We noticed that the Suricata service restarts every 5 minutes and each time writes around 20 MB of log lines, which show that rules are duplicated.
The duplicated-rule errors are also present on the OPNsense slaves in the other clusters, so as far as we understand they are caused by the High Availability setup.
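
As a rough sanity check on those numbers, here is a minimal sketch of the expected daily log volume. It uses only the 5-minute interval and the roughly 20 MB per restart mentioned above; the variable names are just for illustration.

```python
# Rough estimate only: both inputs are the approximate figures from the post.
SYNC_INTERVAL_MIN = 5   # the HA sync cron job runs every 5 minutes
MB_PER_RESTART = 20     # approximate log output per Suricata restart

restarts_per_day = 24 * 60 // SYNC_INTERVAL_MIN
gb_per_day = restarts_per_day * MB_PER_RESTART / 1000

print(f"{restarts_per_day} restarts/day, about {gb_per_day:.2f} GB/day")
```

That gives 288 restarts and about 5.76 GB per day. The files listed above are larger than that (10G and 17G), so either the output per restart is often more than 20 MB or something else is being logged as well.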

After noticing that Suricata gets restarted every 5 minutes, we saw that other services are restarted as well:
(Taken from the backend log via the GUI of the slave)

2024-11-15T11:45:17	1	Notice	configd.py	68284	[92783ee2-9f69-4775-9ebb-ed5ffbefb9fa] Restarting web GUI	
2024-11-15T11:45:12	1	Notice	configd.py	68284	[ee6b9555-8185-4900-b803-57c6a5d5d85a] restart suricata daemon	
2024-11-15T11:45:12	1	Notice	configd.py	68284	[082fe44e-64dd-4983-a118-cd7411d19c1f] restart netflow	
2024-11-15T11:45:10	1	Notice	configd.py	68284	[ddc13cd2-9068-404f-a470-8b856630caf2] Restarting OpenSSH	
2024-11-15T11:45:09	1	Notice	configd.py	68284	[fa0b5130-53e2-4ed9-b8c4-8a15e2b86543] restart kea daemon	
2024-11-15T11:45:07	1	Notice	configd.py	68284	[2b673549-68f4-49ae-9c46-2168b8617b03] restarting haproxy	
2024-11-15T11:45:06	1	Notice	configd.py	68284	[c9f5e649-0e8e-44d8-b8fa-ea0ef4e6875b] restart netflow data aggregator
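
To check how often these restart bursts happen on each cluster, the backend log can be grouped by timestamp. The following is a minimal sketch, assuming the log has been exported as tab-separated lines like the ones shown above (an ISO timestamp first, then a `[uuid] message` field). The function names `parse_restarts` and `group_bursts` are made up for this example and are not part of OPNsense.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape: "<ISO timestamp><tab>...<tab>[<uuid>] <message>"
LINE_RE = re.compile(r"^(\S+)\s.*\[[0-9a-f-]{36}\]\s+(.*\S)\s*$")


def parse_restarts(lines):
    """Return sorted (timestamp, message) pairs for lines mentioning a restart."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and "restart" in m.group(2).lower():
            events.append((datetime.fromisoformat(m.group(1)), m.group(2)))
    return sorted(events)


def group_bursts(events, window=timedelta(seconds=60)):
    """Group events that follow the previous event within `window` into one burst."""
    bursts = []
    for ts, msg in events:
        if bursts and ts - bursts[-1][-1][0] <= window:
            bursts[-1].append((ts, msg))
        else:
            bursts.append([(ts, msg)])
    return bursts
```

Running `group_bursts(parse_restarts(open("backend.log")))` (the file name is a placeholder) on each slave would show whether bursts line up with the 5-minute sync interval or only occur occasionally.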

When we disable the cron job (which runs every 5 minutes) that syncs the configuration from the master to the slave, we no longer see any restarts.
The cron job is configured via the GUI and looks like this:
(https://forum.opnsense.org/index.php?action=dlattach;topic=44026.0;attach=39645)

The High Availability configuration looks like this:
(https://forum.opnsense.org/index.php?action=dlattach;topic=44026.0;attach=39648;image)
Does any of you have a suggestion for what we could check, or an idea of what might be going wrong?
We did not find any critical errors or anything similar.

Title: Re: High Availability Sync causes restart of all services on each synchronisation
Post by: Monviech (Cedrik) on November 15, 2024, 02:36:23 PM

The cron job indeed restarts all services each time it is invoked.

https://github.com/opnsense/core/blob/5f533d45731db1611d6579a7638012968fb01865/src/opnsense/service/conf/actions.d/actions_system.conf#L106

Maybe 5 minutes is not the best choice here. I would synchronize daily, since that also prevents human errors from being synchronized instantly.

Title: Re: High Availability Sync causes restart of all services on each synchronisation
Post by: Rebecca Dieter on November 15, 2024, 02:40:58 PM

OK, but then I am confused why the other 3 clusters we have with the same settings are not restarting the services all the time.

Here are the log lines from another cluster; you can see that the services are not always restarted on a sync:

2024-11-13T09:57:50	Notice	configd.py	[0ef4219a-2a0a-46f8-87dd-78b7c1e2185e] restart suricata daemon	
2024-11-07T04:34:08	Notice	configd.py	[2bc0e805-6485-4a4e-ba4a-5840081b56e0] restarting cron	
2024-11-07T04:34:08	Notice	configd.py	[1d2c3400-33df-42bc-9f06-cf26e1e5880b] Restarting syslog	
2024-10-23T11:20:03	Notice	configd.py	[ec8dc486-aed4-4d67-a1ed-9aca4c32aa27] restart kea daemon	
2024-10-23T11:19:58	Notice	configd.py	[7ce4d26d-e901-4b7e-9750-dcd294ec55ac] restart kea daemon

Title: Re: High Availability Sync causes restart of all services on each synchronisation
Post by: Monviech (Cedrik) on November 15, 2024, 03:00:04 PM

I'm not sure here, but this does not change the fact that the configuration is non-optimal:

https://docs.opnsense.org/manual/hacarp.html#automatic-replication
