[solved] NetFlow disk usage

Started by Silverstar, August 29, 2016, 11:02:35 AM

August 29, 2016, 11:02:35 AM Last Edit: September 20, 2016, 11:44:24 AM by Silverstar
Hi folks,

It seems that NetFlow uses unlimited disk space... or am I missing a way to stop it from filling up my disk completely?

Best,
Silverstar

Sorry for bumping this up, but the disk usage is growing and growing... is there really only the option to reset the NetFlow data manually, with no rotation or space limit?

Thx,
Silverstar

Hi there,

The solution is to do it manually for now, correct. It's under Reporting: Settings: "Reset Netflow Data".

I think there is rotation, but no retention policy:

% ls -lah /var/log/flowd.log*
-rw-------  1 root  wheel   5.1M Sep  1 09:38 /var/log/flowd.log
-rw-------  1 root  wheel    11M Aug 31 21:51 /var/log/flowd.log.000001

It would help to know your system's parameters, available size, your current log size and your expected capacity limitations to see how to address this issue concretely in a feature request. :)
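
For anyone who wants to collect those numbers, here is a minimal Python sketch that lists the flowd log files and their sizes. The function name flowd_log_usage is made up for illustration, and the default directory /var/log is simply taken from the listing above.

```python
import glob
import os


def flowd_log_usage(log_dir="/var/log"):
    """Return ([(path, size_in_bytes), ...], total_bytes) for flowd.log*."""
    paths = sorted(glob.glob(os.path.join(log_dir, "flowd.log*")))
    sizes = [(path, os.path.getsize(path)) for path in paths]
    total = sum(size for _, size in sizes)
    return sizes, total


if __name__ == "__main__":
    files, total = flowd_log_usage()
    for path, size in files:
        print("%10.1f MiB  %s" % (size / (1024.0 * 1024.0), path))
    print("%10.1f MiB  total" % (total / (1024.0 * 1024.0)))
```

Run as root, since the log files in the listing are only readable by root.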


Cheers,
Franco

Hi Franco,

today I deleted a 14G flowd.log...
The numbered files are all 11M, but the main file grows without limit...

root@fw:~ # du -hs /var/log/*
544K    /var/log/dhcpd.log
4.0K    /var/log/dmesg.today
4.0K    /var/log/dmesg.yesterday
544K    /var/log/filter.log
14G    /var/log/flowd.log
11M    /var/log/flowd.log.000001
11M    /var/log/flowd.log.000002
11M    /var/log/flowd.log.000003
[...]


My system is running on a 42G SSD.

It would be nice if we could configure rotation in the config section (GUI) instead of resetting all the data from time to time.

Best,
Silverstar

Hi Silverstar,

It looks like flowd_aggregate isn't running on your end. The flowd.log file is used as a staging area for Insight, but if the aggregation process isn't running, the file doesn't get rotated either (normally it rotates at roughly 10 MB).

With the following command you can check the status of the service:
service flowd_aggregate status
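
If you want to run that check from a script (cron, for instance), a rough Python sketch could look like the one below. It rests on an assumption not confirmed in this thread: that the status command exits with a non-zero code when the daemon is not running. The function name aggregator_running is hypothetical.

```python
import subprocess


def aggregator_running(status_cmd=("service", "flowd_aggregate", "status")):
    """Return True if the status command exits with code 0.

    Assumption: the rc script exits non-zero when the daemon is not running.
    """
    try:
        return subprocess.call(list(status_cmd)) == 0
    except OSError:
        # The status command itself could not be started.
        return False


if __name__ == "__main__":
    if aggregator_running():
        print("flowd_aggregate is running")
    else:
        print("flowd_aggregate is NOT running; flowd.log will not be rotated")
```

The command is passed in as a parameter so the check can be tried out with any other command first.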


The latest version of flowd_aggregate has an automatic repair option to recover after a crash, which should prevent this in the future.
The best thing to do now is probably to remove the flowd.log* files and restart flowd and the aggregator.

service flowd_aggregate stop
service flowd restart
service flowd_aggregate start



Best regards,

Ad

Just giving a +1. flowd_aggregate crashes on my instance after ~3 hours. I just checked and found myself with a 3.6GB flowd.log file. :/ It has filled my disk a few times, so frustrating!

Hi Ad,

I've done as you suggested, deleted the flowd.log* files from /var/log/, stopped flowd_aggregate, restarted flowd and started flowd_aggregate again.
But after a few seconds to a minute, flowd_aggregate is no longer running according to service flowd_aggregate status :(

It seems something is broken here!

Best,
Silverstar

Hi Silverstar,

Can you try to run flowd_aggregate manually?

service flowd_aggregate stop
/usr/local/opnsense/scripts/netflow/flowd_aggregate.py console


Then wait for it to exit (it should take the same amount of time), and post the output here, including any messages in /var/log/system.log:

clog /var/log/system.log


The latest version of our software should try to run an automatic repair on the sqlite files it's using, so maybe you're experiencing something completely different here.

Just to be sure, you are using OPNsense 16.7.3?

Best regards,

Ad


The database repair code is not in 16.7.3; it will be available in 16.7.4 with the patch below. You can, however, install the code by running the following command in the console:

# opnsense-patch 2bcdb42

https://github.com/opnsense/core/commit/2bcdb42

Maybe this will help.


Cheers,
Franco

Hi Ad,
hi Franco,

I did as you suggested.
The script took about 15 min. of runtime.
The script itself didn't output anything on command line.

Here are the lines from clog /var/log/system.log around the runtime of the script.

Sep 19 18:47:13 fw flowd_aggregate.py: flowd aggregate died with message Traceback (most recent call last):   File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 145, in run     aggregate_flowd(do_vacuum)   File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 85, in aggregate_flowd     stream_agg_object.cleanup(do_vacuum)   File "/usr/local/opnsense/scripts/netflow/lib/aggregate.py", line 277, in cleanup     self._update_cur.execute('delete from timeserie where mtime < :expire', {'expire': expire_timestamp}) DatabaseError: database disk image is malformed
Sep 19 19:00:47 fw sshd[27366]: Accepted keyboard-interactive/pam for root from 10.0.220.6 port 58747 ssh2
Sep 19 19:01:29 fw opnsense: /index.php: Successful login for user 'root' from: 10.0.220.6
Sep 19 19:01:30 fw configd.py: [77c5f0d8-d529-41ff-984f-7557f8675e5a] IPsec list ip address pools
Sep 19 19:01:30 fw configd.py: [e6dd1c66-804c-4ae5-88d3-a3cef8923bad] IPsec list status
Sep 19 19:01:48 fw configd.py: [6a6cffb1-080c-4548-b0f3-dff2d05f9e29] IPsec list ip address pools
Sep 19 19:01:48 fw configd.py: [e2736d1d-46e4-4a1d-8241-545efce8fe43] IPsec list status
Sep 19 19:02:00 fw configd.py: [459569e5-d465-4d9c-bbdf-fa8368b01b9a] IPsec list ip address pools
Sep 19 19:02:00 fw configd.py: [4b5aa42b-8674-4223-8dc2-c2b2296d1a14] IPsec list status
Sep 19 19:02:16 fw configd.py: [85324443-fcf2-462f-a97f-d9b17745bfd6] IPsec list ip address pools
Sep 19 19:02:16 fw configd.py: [bfaaefbb-8b08-4852-987f-c0f365880cd4] IPsec list status
Sep 19 19:02:43 fw configd.py: [63b45754-fd76-4591-97b6-5f5186ac8274] show system activity
Sep 19 19:03:30 fw configd.py: [c0f49f71-c15f-472f-a9b3-298b0f728057] show system activity
Sep 19 19:03:46 fw configd.py: [7c52b133-a3c3-414f-8689-10341fbd9e0c] IPsec list ip address pools
Sep 19 19:03:46 fw configd.py: [164c2aef-31ba-4027-9ac3-dd620ed8428e] IPsec list status
Sep 19 19:04:17 fw configd.py: [8893657c-2c7e-439b-ad75-6ea64422b0ba] show system activity
Sep 19 19:04:26 fw configd.py: [0e6a1c96-63fa-4cc2-b05c-6c21074e5880] show system activity
Sep 19 19:04:38 fw configd.py: [b8cff5d9-4feb-4a83-8740-f8ba7a59e2a2] show system activity
Sep 19 19:04:40 fw configd.py: [e13586a8-0759-4056-8ae7-c3ffb3aaa968] show system activity
Sep 19 19:04:54 fw configd.py: [a5496d29-1f1f-4cca-a45f-51c66577f11a] IPsec list ip address pools
Sep 19 19:04:54 fw configd.py: [49150607-f6be-4e57-aa27-b8c2012fb27e] IPsec list status
Sep 19 19:04:59 fw kernel: A,4282023658,2141745024,65535,,
Sep 19 19:05:03 fw configd.py: [07711743-c7e3-4e83-a6a9-5c20ba109e17] show system activity
Sep 19 19:05:11 fw configd.py: [c9ddda98-20de-42c7-bcf9-dcb281f0d66b] show system activity
Sep 19 19:05:20 fw configd.py: [c02e0b17-bf6d-4df7-b95d-bd3b95641c70] show system activity
Sep 19 19:06:32 fw configd.py: [0bf487aa-95d3-41f5-a9f1-242644e758be] show system activity
Sep 19 19:08:06 fw configd.py: [ad232804-d3f1-4f09-8765-38e109e23ea1] show system activity
Sep 19 19:08:32 fw configd.py: [5d5657d4-9698-424b-b2b7-43174b248eb3] show system activity
Sep 19 19:09:55 fw configd.py: [638a3cd1-291c-4d47-89bf-e2a0d001065c] show system activity
Sep 19 19:10:38 fw configd.py: [d8f4c6aa-0429-4c6b-ba20-e48800ab4dbb] show system activity
Sep 19 19:10:49 fw configd.py: [b56d4d76-5b78-4eb8-b4a7-b6663e488d3a] show system activity
Sep 19 19:10:55 fw configd.py: [586829ad-69a3-469f-89ac-9b0dbe7de498] show system activity
Sep 19 19:11:14 fw configd.py: [1e0a5b66-b48c-4eab-88f4-b4c118d5d7e8] show system activity
Sep 19 19:11:20 fw configd.py: [4c3b2b8f-63d8-4a8f-b40d-c53e3d5f3ca4] show system activity
Sep 19 19:11:53 fw configd.py: [e826f334-cec2-42e0-8b3b-b2359ccaa5d7] show system activity
Sep 19 19:12:14 fw configd.py: [d835988a-5a82-492d-a79c-996d5ade3f59] show system activity
Sep 19 19:12:20 fw configd.py: [4c8b40f2-c5c9-4f0c-b38a-174b616bbfd2] show system activity
Sep 19 19:12:24 fw configd.py: [bfa1c6e4-cc41-43dd-a3cf-077a880fe732] show system activity
Sep 19 19:13:02 fw configd.py: [e97ad08c-d85a-4cdf-98d6-e7eb2096842c] show system activity
Sep 19 19:13:31 fw configd.py: [6ce5d6c5-5b04-4086-8aa9-f36e25e41048] show system activity
Sep 19 19:14:12 fw configd.py: [c81fa60a-579f-4d62-bec3-d1cbf48147b3] show system activity
Sep 19 19:15:04 fw configd.py: [4c34ddf3-53d9-42eb-96d6-14a80f5b137a] show system activity
Sep 19 19:15:13 fw configd.py: [b1b1ee07-476a-4803-aaed-94dc5c761b76] show system activity
Sep 19 19:15:16 fw configd.py: [1a4c6ea8-2291-44b3-9b9c-7ed821ba7724] show system activity
Sep 19 19:15:20 fw configd.py: [dfa0ff61-8f80-429f-a869-07f5dbd650d1] show system activity
Sep 19 19:15:23 fw configd.py: [d47433f9-9030-43e4-be91-1a162db6212e] show system activity
Sep 19 19:15:27 fw configd.py: [a20e7c6f-ce5b-4965-8a25-27210c616320] show system activity
Sep 19 19:15:42 fw configd.py: [e85dee2d-98af-4fc3-9ed8-c6c81b74df33] show system activity
Sep 19 19:16:14 fw configd.py: [30be9ba0-9d89-442c-9491-70387982dbed] show system activity
Sep 19 19:16:23 fw configd.py: [5a666136-8741-412f-bb6d-176e231d09d0] show system activity
Sep 19 19:16:47 fw kernel: A,1209682972,961485202,65535,,
Sep 19 19:16:48 fw configd.py: [6a943954-07c6-407c-b346-bb7aaaef0afd] show system activity
Sep 19 19:16:55 fw configd.py: [73688291-5017-47e5-b3b1-7730d7c9af08] show system activity
Sep 19 19:16:58 fw configd.py: [8071150b-09ad-4b65-8266-bbca2d424b73] show system activity
Sep 19 19:17:23 fw configd.py: [20a979cc-3b3b-4570-8ad7-c524c2072bbd] show system activity
Sep 19 19:18:16 fw flowd_aggregate.py: flowd aggregate died with message Traceback (most recent call last):   File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 145, in run     aggregate_flowd(do_vacuum)   File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 85, in aggregate_flowd     stream_agg_object.cleanup(do_vacuum)   File "/usr/local/opnsense/scripts/netflow/lib/aggregate.py", line 277, in cleanup     self._update_cur.execute('delete from timeserie where mtime < :expire', {'expire': expire_timestamp}) DatabaseError: database disk image is malformed
Sep 19 19:18:27 fw configd.py: [0a32a78b-720c-4528-89a8-5cb7f9c9ef91] show system activity
Sep 19 19:18:42 fw configd.py: [d34ade08-63d2-4329-b463-591ddb55624a] IPsec list ip address pools
Sep 19 19:18:42 fw configd.py: [c692d1bb-95e2-43f0-88ed-3c97ed164b6b] IPsec list status
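
The "database disk image is malformed" message above is SQLite reporting a corrupted database file. As a hedged sketch (not the repair code from the patch), the Python below runs SQLite's built-in PRAGMA integrity_check against a database file passed on the command line. The function name check_sqlite is made up, and the location of the NetFlow database files is not stated in this thread, so the path is left as a parameter.

```python
import os
import sqlite3


def check_sqlite(path):
    """Return the rows reported by PRAGMA integrity_check for one database.

    A healthy database yields ["ok"]; anything else describes a problem.
    """
    if not os.path.isfile(path):
        # sqlite3.connect() would silently create a new, empty database.
        return ["error: no such file"]
    try:
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # Raised when the file is too damaged (or not a database) to be read.
        return ["error: %s" % exc]
    return [row[0] for row in rows]


if __name__ == "__main__":
    import sys
    for db_path in sys.argv[1:]:
        print("%s: %s" % (db_path, check_sqlite(db_path)))
```

It is probably wise to stop flowd_aggregate before checking, so the files are not being written to during the check.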


The system version is 16.7.2

Will take the update to 16.7.3 and the patch from Franco as the next steps and keep you guys posted...

Thanks,
Silverstar

Update & patch done.
service flowd_aggregate keeps running for now.
Produces log files at 11MB each.

I'll keep you posted if rotation kicks in...

Best,
Silverstar

Seems to be solved :)
Thank you guys!

Rotation works and keeps 10 files.
Working file never exceeds 10 MB.
root@fw:/var/log # ls -alh flowd.log*
-rw-------  1 root  wheel   5.1M Sep 20 11:39 flowd.log
-rw-------  1 root  wheel    11M Sep 20 11:36 flowd.log.000001
-rw-------  1 root  wheel    11M Sep 20 11:30 flowd.log.000002
-rw-------  1 root  wheel    11M Sep 20 11:24 flowd.log.000003
-rw-------  1 root  wheel    12M Sep 20 11:18 flowd.log.000004
-rw-------  1 root  wheel    12M Sep 20 11:12 flowd.log.000005
-rw-------  1 root  wheel    11M Sep 20 11:06 flowd.log.000006
-rw-------  1 root  wheel    11M Sep 20 11:01 flowd.log.000007
-rw-------  1 root  wheel    11M Sep 20 10:55 flowd.log.000008
-rw-------  1 root  wheel    11M Sep 20 10:49 flowd.log.000009
-rw-------  1 root  wheel    11M Sep 20 10:43 flowd.log.000010
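
For readers wondering what this kind of rotation looks like in general, here is a small Python sketch of size-based rotation with numbered suffixes, where .000001 is always the newest rotated file, as in the listing above. This is only an illustration of the technique and not OPNsense's actual code; the function names, the 10 MiB threshold and keep=10 are assumptions modelled on the numbers reported in this thread. A real daemon would also need to be told to reopen its log file after the rename.

```python
import os


def rotate(path, keep=10):
    """Shift path -> path.000001 -> path.000002 ..., keeping `keep` old files."""
    def numbered(i):
        return "%s.%06d" % (path, i)

    oldest = numbered(keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    # Walk from the highest number down so no file is overwritten.
    for i in range(keep - 1, 0, -1):
        if os.path.exists(numbered(i)):
            os.rename(numbered(i), numbered(i + 1))
    if os.path.exists(path):
        os.rename(path, numbered(1))


def rotate_if_large(path, max_bytes=10 * 1024 * 1024, keep=10):
    """Rotate when the working file has reached max_bytes; return True if rotated."""
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        rotate(path, keep)
        return True
    return False
```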


Best,
Silverstar

It looks like the crashes are back. I implemented the patch referenced here but I have another 16GB+ flowd.log file. :( No logs in the system log either.

This issue is still happening.

OPNsense 16.7.7-amd64
FreeBSD 10.3-RELEASE-p11
OpenSSL 1.0.2j 26 Sep 2016

root@vpn:/var/log # ps wwwaux | grep -i flow
root   73815  37.5  0.5 122160 32864  -  Ds   Mon05PM  1796:05.42 /usr/local/bin/python2.7 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py
_flowd 85758   0.2  0.0  12360  2412  -  Ds   Mon05PM    13:34.98 flowd: net (flowd)
root   85525   0.0  0.0  12360  1560  -  Is   Mon05PM     0:00.00 flowd: monitor (flowd)
root   82449   0.0  0.0  18752  2196  0  S+    5:06PM     0:00.00 grep -i flow
root@vpn:/var/log # ls -alh flow*
-rw-------  1 root  wheel    33G Nov  3 17:06 flowd.log
root@vpn:/var/log #
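
Since the aggregator process can apparently be alive while flowd.log still grows, a simple size check may catch the problem earlier than a process check. Below is a rough Python sketch; the function name staging_file_stalled, the 10 MiB rotation size and the factor of 10 are all assumptions chosen for illustration.

```python
import os


def staging_file_stalled(path="/var/log/flowd.log",
                         rotate_bytes=10 * 1024 * 1024, factor=10):
    """Return True if `path` is more than `factor` times the rotation size."""
    try:
        size = os.path.getsize(path)
    except OSError:
        return False  # no readable file: nothing is piling up
    return size > rotate_bytes * factor


if __name__ == "__main__":
    if staging_file_stalled():
        print("flowd.log is far beyond the rotation size; aggregation looks stalled")
    else:
        print("flowd.log size looks normal")
```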

I think I've run into the same problem here. I updated last week to the current version, and now it seems that OPNsense crashes from time to time.
I have a 1.3G flowd.log; up until Sep. 29 the logs were 11 MB each. flowd_aggregate seems to constantly use up one entire core of the server.
I've disabled NetFlow for now and everything seems back to normal.