OPNsense Forum

Archive => 19.1 Legacy Series => Topic started by: Andreas_ on May 31, 2019, 10:14:12 am

Title: flowd.log not rotated
Post by: Andreas_ on May 31, 2019, 10:14:12 am
I've been running netflow logging flawlessly for a while now. For about two weeks (I guess since 19.1.5), rotation of flowd.log has been stalling, resulting in a filled disk and, consequently, some services stopping for lack of disk space. This includes flowd and flowd_aggregate.py. The firewall is monitored, so I could verify that flowd_aggregate is killed only once the disk is full, not before.

I upgraded to 19.1.8 and see some strange behaviour of flowd_aggregate:
Usually, flowd.log reaches between 11MB and 13MB before rotating, which takes about 5 minutes. But I can also observe this:

-rw-------  1 root  wheel   3.2M May 31 10:01 flowd.log
-rw-------  1 root  wheel    12M May 31 10:00 flowd.log.000001
-rw-------  1 root  wheel    13M May 31 09:56 flowd.log.000002
-rw-------  1 root  wheel    56M May 31 09:51 flowd.log.000003
-rw-------  1 root  wheel    13M May 31 09:29 flowd.log.000004
-rw-------  1 root  wheel    23M May 31 09:24 flowd.log.000005
-rw-------  1 root  wheel    12M May 31 09:15 flowd.log.000006
-rw-------  1 root  wheel    13M May 31 09:11 flowd.log.000007
-rw-------  1 root  wheel    11M May 31 09:06 flowd.log.000008
-rw-------  1 root  wheel    12M May 31 09:01 flowd.log.000009
-rw-------  1 root  wheel    12M May 31 08:56 flowd.log.000010

So obviously flowd_aggregate sometimes stalls for several minutes and then continues to work. I can't see any anomalies in CPU load or usage during this period. There is nothing in the system log; the last flowd-related message is
 May 31 06:47:26 fw05a flowd_aggregate.py: vacuum done
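To make the stalls in the listing easier to spot, here is a minimal sketch that computes how long each file was being written before it was rotated. It assumes a rotated file's modification time is roughly the moment it was rotated; the helper names and the 7.5-minute threshold (1.5 times the usual 5 minutes) are my own illustrative choices, not anything taken from flowd_aggregate.py.

```python
from datetime import datetime

# Size, date, time and name copied from the directory listing (newest first).
LISTING = """\
3.2M May 31 10:01 flowd.log
12M May 31 10:00 flowd.log.000001
13M May 31 09:56 flowd.log.000002
56M May 31 09:51 flowd.log.000003
13M May 31 09:29 flowd.log.000004
23M May 31 09:24 flowd.log.000005
12M May 31 09:15 flowd.log.000006
13M May 31 09:11 flowd.log.000007
11M May 31 09:06 flowd.log.000008
12M May 31 09:01 flowd.log.000009
12M May 31 08:56 flowd.log.000010
"""


def parse_size_mb(text):
    """Convert a human-readable size such as '3.2M' or '512K' to megabytes."""
    units = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}
    return float(text[:-1]) * units[text[-1]]


def rotation_gaps(listing, year=2019):
    """Return (name, size_mb, minutes_since_previous_file), oldest first."""
    rows = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        size, month, day, clock, name = line.split()
        stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M")
        rows.append((name, parse_size_mb(size), stamp))
    rows.sort(key=lambda row: row[2])  # oldest file first
    gaps = []
    for prev, cur in zip(rows, rows[1:]):
        minutes = (cur[2] - prev[2]).total_seconds() / 60
        gaps.append((cur[0], cur[1], minutes))
    return gaps


def slow_rotations(gaps, threshold_minutes=7.5):
    """Keep only the files that took longer than the threshold to rotate."""
    return [gap for gap in gaps if gap[2] > threshold_minutes]


if __name__ == "__main__":
    for name, size_mb, minutes in slow_rotations(rotation_gaps(LISTING)):
        print(f"{name}: {size_mb:.0f} MB written over {minutes:.0f} min")
```

With the listing above, this flags flowd.log.000005 (9 minutes) and flowd.log.000003 (22 minutes); every other file was rotated after 4 to 5 minutes.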

While this is not the total stop of flowd.log rotation I've been suffering from in the last weeks, it still seems suspicious to me. What's going wrong here?

Added: stalling is happening right now, and I can see the flowd_aggregate process consuming a lot of CPU.

Title: Re: flowd.log not rotated: aggregation too slow
Post by: Andreas_ on May 31, 2019, 01:43:13 pm
I started with fresh logging, i.e. deleted /var/log/flowd* and /var/netflow/*, restarted flowd, and am now running flowd_aggregate.py in the console with some timing added.
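For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of timing the phases of a polling loop. It is not the actual flowd_aggregate.py code: `aggregate_once`, `run_cycles` and the `timed` helper are hypothetical stand-ins, and the real script would need the timing wrapped around its own aggregation step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Append (label, elapsed_seconds) to results when the block finishes."""
    start = time.monotonic()
    try:
        yield
    finally:
        results.append((label, time.monotonic() - start))


def aggregate_once():
    """Hypothetical placeholder for one aggregation pass over flowd.log."""
    time.sleep(0.01)


def run_cycles(cycles, nap_seconds):
    """Run a few work/nap cycles and return the measured phase durations."""
    timings = []
    for _ in range(cycles):
        with timed("aggregate", timings):
            aggregate_once()
        with timed("nap", timings):
            time.sleep(nap_seconds)
    return timings


if __name__ == "__main__":
    for label, seconds in run_cycles(cycles=2, nap_seconds=0.05):
        print(f"{label}: {seconds:.3f}s")
```

`time.monotonic()` is used instead of `time.time()` so that clock adjustments (e.g. by NTP) cannot distort the measured durations.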

I can see that a typical aggregation run takes 70-80 seconds, then rotation is checked, followed by a 15-second nap.
This means that the flowd_aggregate process is currently at about 85% of its capacity; there are certainly phases where firewall traffic is heavier than it is now.
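The capacity figure follows from the share of each cycle spent working rather than napping. A quick sketch of the arithmetic, assuming the rotation check itself takes negligible time (`duty_cycle` is just an illustrative helper name):

```python
def duty_cycle(run_seconds, nap_seconds):
    """Fraction of one work/nap cycle that is spent on the aggregation run."""
    return run_seconds / (run_seconds + nap_seconds)


# Run times measured above, with the fixed 15-second nap.
for run in (70, 80):
    print(f"{run}s run + 15s nap: {duty_cycle(run, 15):.1%} busy")
```

This gives roughly 82% to 84% for the measured run times, in line with the estimate above.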

flowd.log fills at a rate of about 2.5-5MB/min; /var/netflow/* holds about 5MB. That doesn't seem too heavy to me.
The machine has an 8-core C3758 CPU, load average about 1.3, with CPU usage at a few percent (flowd_aggregate adds 12%), and a SATA-DOM SSD.