flowd_aggregate (Insight Aggregator) stopping constantly. Is this normal?

Started by nigelw, January 03, 2021, 03:29:11 AM

I finally got around to setting up Monit monitoring for the flowd_aggregate service. I've wanted to do this because I often find when I go to the Dashboard that the service has stopped. I'm now getting spammed by monit because it's constantly picking up the service down and restarting it.

I've noticed a pattern, though. The 'down' detections seem to happen on the hour - almost every hour. This has led me to question whether the flowd_aggregate is even supposed to stay running? Does anyone have any information on that? Am I monitoring something as a 'failure' when it's 'by design'??

Also, I've had issues with flowd_aggregate in the past with logs created because of crashes. There are no logs this time (other than monit, and the vacuum messages after restart).

It should stay active, so the daemon probably crashed partway through. If this happens very frequently, it would be good to collect some information about what's going on there.

The system log may already contain some info, but if you can keep a console open until it crashes (which in your case is likely within an hour or so), it might be worth capturing the output of the following commands.


# service flowd_aggregate stop
# /usr/local/opnsense/scripts/netflow/flowd_aggregate.py --console


(The first command stops the daemon; the second starts it again in the console without detaching.)
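
If keeping the console open for an hour is impractical, the output can also be written to a file. The following is only a sketch, assuming a POSIX sh session (run `sh` first if your login shell is csh); the helper name `run_and_log` and the log path are made up for illustration, and `PYTHONUNBUFFERED=1` is there so Python does not hold output back in a buffer when it is piped.

```shell
# Hypothetical helper: run a command, show its output, and append it to a log file.
run_and_log() {
    log_file="$1"
    shift
    # Record when the run started, so a crash can be matched against the clock.
    echo "started: $(date)" >> "$log_file"
    # Merge stderr into stdout and append everything to the log while still displaying it.
    "$@" 2>&1 | tee -a "$log_file"
    echo "ended: $(date)" >> "$log_file"
}

# On the firewall, after stopping the service (the log path is an assumption):
#   run_and_log /tmp/flowd_aggregate_console.log \
#       env PYTHONUNBUFFERED=1 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py --console
```

The "ended" timestamp shows at a glance whether the process went away on the hour.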

Best regards,

Ad



Did you find a solution for this issue? I am having the same problem.

Apologies for the necroposting, but I observed the same issue, and decided to find the cause. The problem seems to be that root crontab calls for netflow data backup every hour on the dot:
#minute hour    mday    month   wday    command
0       */1     *       *       *       (/usr/local/etc/rc.syshook.d/backup/20-netflow) > /dev/null


The 20-netflow script does a stop/start cycle if flowd_aggregate is running, effectively restarting the process every hour at xx:00.
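
For anyone who wants to check their own box, here is a rough sketch of how the schedule can be pulled out of the crontab. The function name `cron_schedule_for` is mine, not part of OPNsense, and it assumes a POSIX sh session with the entry in the usual five-field format shown above.

```shell
# Hypothetical helper: print the minute and hour fields of crontab entries
# that mention a given pattern. Crontab text is read from stdin.
cron_schedule_for() {
    # Drop comment lines, keep entries matching the pattern, print fields 1 and 2.
    grep -v '^[[:space:]]*#' | grep -- "$1" | awk '{ print $1, $2 }'
}

# On the firewall, as root:
#   crontab -l -u root | cron_schedule_for 20-netflow
# To see where the backup script touches the service:
#   grep -n flowd_aggregate /usr/local/etc/rc.syshook.d/backup/20-netflow
```

A result with minute `0` and hour `*/1` means the backup, and with it the restart, fires at the top of every hour.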

It might not be a big deal in the grand scheme of things, but it seems inconsistent with other parts of the code. flowd_aggregate tries to vacuum its sqlite DBs on start, and then every 8 hours while it's running. But restarting it every hour means the vacuum runs every hour (about 24 times a day instead of 3), hitting disk IO more often than anticipated (vacuum is IO heavy). That's how I noticed this behavior in the first place :)

Quote from: sevimo on January 16, 2025, 01:51:33 AM
Apologies for the necroposting, but I observed the same issue, and decided to find the cause. The problem seems to be that root crontab calls for netflow data backup every hour on the dot:
[...]

I'd read about problems with the periodic RRD and NetFlow backup settings, so I set Periodic RRD and NetFlow Backup to Power off under System: Settings: Miscellaneous. I can't say whether this setting helps anything, but it doesn't seem to hurt. (I don't care about DHCP leases and don't use a captive portal.) I'm not sure what benefit a periodic backup would provide, other than guarding against potential file corruption from a system crash.