Hey there,
we are running multiple OPNsense instances configured as master/backup setups. On one of these clusters we noticed that the Suricata logs are constantly growing to multiple GB in size.
The logs of the last 3 days look like this:
-rwx------ 1 root wheel 10G Nov 13 23:56 suricata_20241113.log
-rwx------ 1 root wheel 17G Nov 14 23:59 suricata_20241114.log
-rw------- 1 root wheel 7.8G Nov 15 11:41 suricata_20241115.log
We noticed that the Suricata service restarts every 5 minutes and each time writes around 20 MB of log lines showing that rules are duplicated.
The duplicated-rule errors are also present on the OPNsense slaves in our other clusters, so as far as we understand it, this is caused by the High Availability setup.
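For anyone who wants to reproduce the numbers, something like this should give a rough count of the duplicate-rule messages (the pattern is only a guess, adjust it to the exact wording in the Suricata log):

# rough count of duplicate-rule messages in one day's log
grep -ci 'duplicate' /var/log/suricata/suricata_20241115.log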
After noticing that Suricata gets restarted every 5 minutes, we saw that other services are restarted as well:
(Taken from the backend log via the GUI of the slave)
2024-11-15T11:45:17 1 Notice configd.py 68284 [92783ee2-9f69-4775-9ebb-ed5ffbefb9fa] Restarting web GUI
2024-11-15T11:45:12 1 Notice configd.py 68284 [ee6b9555-8185-4900-b803-57c6a5d5d85a] restart suricata daemon
2024-11-15T11:45:12 1 Notice configd.py 68284 [082fe44e-64dd-4983-a118-cd7411d19c1f] restart netflow
2024-11-15T11:45:10 1 Notice configd.py 68284 [ddc13cd2-9068-404f-a470-8b856630caf2] Restarting OpenSSH
2024-11-15T11:45:09 1 Notice configd.py 68284 [fa0b5130-53e2-4ed9-b8c4-8a15e2b86543] restart kea daemon
2024-11-15T11:45:07 1 Notice configd.py 68284 [2b673549-68f4-49ae-9c46-2168b8617b03] restarting haproxy
2024-11-15T11:45:06 1 Notice configd.py 68284 [c9f5e649-0e8e-44d8-b8fa-ea0ef4e6875b] restart netflow data aggregator
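The same restarts can also be watched from a shell, e.g. with opnsense-log (the log name/path may differ depending on the OPNsense version):

# follow service restarts in the configd backend log
# if opnsense-log does not know "configd", tail the files under /var/log/configd/ instead
opnsense-log configd | grep -i restart | tail -n 20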
When we disable the cron job (which runs every 5 minutes) that syncs the configuration from the master to the slave, we no longer see any restarts.
The cron job is configured via the GUI and looks like this:
(https://forum.opnsense.org/index.php?action=dlattach;topic=44026.0;attach=39645)
The High Availability configuration looks like this:
(https://forum.opnsense.org/index.php?action=dlattach;topic=44026.0;attach=39648;image)
Does anyone have a suggestion for what we could check, or an idea of what might be going wrong?
We cannot find any critical errors or anything similar.
The cron job indeed restarts all services each time it is invoked.
https://github.com/opnsense/core/blob/5f533d45731db1611d6579a7638012968fb01865/src/opnsense/service/conf/actions.d/actions_system.conf#L106
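The linked file is also installed on the firewall itself, so you can check what the cron's configd action actually runs on your version:

# installed location of the action definitions referenced above
less /usr/local/opnsense/service/conf/actions.d/actions_system.conf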
Maybe 5 minutes is not the best choice here. I would synchronize daily, since that also prevents human errors from being synchronized instantly.
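In cron terms that would be something like this (the time of day is just an example; in the GUI it means Minutes=0, Hours=3 and * for the remaining fields):

# daily HA sync at 03:00 instead of every 5 minutes (example schedule)
0 3 * * *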
OK, but then I am confused why the other 3 clusters we have with the same settings are not restarting the services all the time.
Here are the log lines from another cluster; you can see that the services are not restarted on every sync:
2024-11-13T09:57:50 Notice configd.py [0ef4219a-2a0a-46f8-87dd-78b7c1e2185e] restart suricata daemon
2024-11-07T04:34:08 Notice configd.py [2bc0e805-6485-4a4e-ba4a-5840081b56e0] restarting cron
2024-11-07T04:34:08 Notice configd.py [1d2c3400-33df-42bc-9f06-cf26e1e5880b] Restarting syslog
2024-10-23T11:20:03 Notice configd.py [ec8dc486-aed4-4d67-a1ed-9aca4c32aa27] restart kea daemon
2024-10-23T11:19:58 Notice configd.py [7ce4d26d-e901-4b7e-9750-dcd294ec55ac] restart kea daemon
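To compare the clusters we can simply count the restart lines per day in the backend log, roughly like this (log name/path may vary per version):

# number of service restarts per day according to the configd backend log
opnsense-log configd | grep -i restart | grep -o '20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]' | sort | uniq -c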
I'm not sure here, but this does not change the fact that the configuration is suboptimal:
https://docs.opnsense.org/manual/hacarp.html#automatic-replication