[SOLVED] Netflow Insight No Data Available

Started by adcwb, August 25, 2016, 06:09:54 PM

Previous topic - Next topic
From the GUI this would only be possible with Reporting: Settings: "Reset Netflow Data".

This would, however, make a log entry in the system log. Search for "flush netflow".

The flowd data is decoupled from the database: flowd records the netflow records, the database never deletes them, only processes them.

And if your uptime is greater than the earliest log file we are not looking at a file system corruption either.

Your install is ok too, you're not using a memory file system for /var which would delete all flowd records on boot.

I can't recall that I pushed the "Reset Netflow Data" button.

There is no "flush netflow" in system.log. There is not even any "flush" in system.log...

There are however netflow entries as long back as 25/8 in system.log. (I don't know if it's relevant)


Aug 25 22:36:40 OPNsense configd.py: [13707223-0fe2-4f3d-8efb-2ddb7f7b13a1] request netflow data aggregator timeseries for FlowInterfaceTotals
Aug 25 22:36:41 OPNsense configd.py: [f20f15a8-f87e-4cce-9b24-d0fc76ea3bac] request netflow data aggregator top usage for FlowDstPortTotals
Aug 25 22:36:41 OPNsense configd.py: [6fc0132c-0251-4824-95c6-fe4a7a8590ea] request netflow data aggregator top usage for FlowSourceAddrTotals
Aug 25 22:36:41 OPNsense configd.py: [2f73198c-0c5e-442d-bdc9-4f37b9ceb6c1] request netflow data aggregator top usage for FlowInterfaceTotals
Aug 25 22:36:42 OPNsense configd.py: [dc3bab3e-56d6-46bd-b5aa-5c5f3460be54] request netflow data aggregator top usage for FlowInterfaceTotals
Aug 25 22:36:42 OPNsense configd.py: [14cbc454-8242-460d-aa01-7685e70afd36] request netflow data aggregator metadata
Aug 25 22:37:02 OPNsense configd.py: [1df39fb7-ebbb-414e-9e55-e55101168b30] request netflow data aggregator top usage for FlowSourceAddrDetails
Aug 26 12:25:48 OPNsense configd.py: [da581fa7-52a4-4a88-84d1-819b653fdd85] retrieve flow cache statistics


Looks like we are not able to find the cause of this unless you have any more clevver way of checking things.

It's not like the end of the day if I don't have the data, it just bugs me that I don't know why they disappeared.

   /Jonas...




Quote from: Zapp on September 02, 2016, 01:09:14 PM

root@OPNsense:/var/log # ls -lah /var/log/flowd*
-rw-------  1 root  wheel   2.1M Sep  2 12:50 /var/log/flowd.log
-rw-------  1 root  wheel    11M Sep  2 12:21 /var/log/flowd.log.000001
-rw-------  1 root  wheel    11M Sep  2 10:25 /var/log/flowd.log.000002
-rw-------  1 root  wheel    11M Sep  2 08:12 /var/log/flowd.log.000003
-rw-------  1 root  wheel    11M Sep  2 05:58 /var/log/flowd.log.000004
-rw-------  1 root  wheel    11M Sep  2 03:29 /var/log/flowd.log.000005
-rw-------  1 root  wheel    11M Sep  2 00:58 /var/log/flowd.log.000006
-rw-------  1 root  wheel    11M Sep  1 22:46 /var/log/flowd.log.000007
-rw-------  1 root  wheel    11M Sep  1 20:17 /var/log/flowd.log.000008
-rw-------  1 root  wheel   107M Sep  1 17:56 /var/log/flowd.log.000009
-rw-------  1 root  wheel    11M Aug 31 20:39 /var/log/flowd.log.000010
root@OPNsense:/var/log #


Something strange is going on here! You say that the flowd.log files are not touched or deleted. Compare the above from earlier today with this from now.


root@OPNsense:/var/log # ls -lah /var/log/flowd*
-rw-------  1 root  wheel   8.4M Sep  2 16:03 /var/log/flowd.log
-rw-------  1 root  wheel    11M Sep  2 14:25 /var/log/flowd.log.000001
-rw-------  1 root  wheel    11M Sep  2 12:21 /var/log/flowd.log.000002
-rw-------  1 root  wheel    11M Sep  2 10:25 /var/log/flowd.log.000003
-rw-------  1 root  wheel    11M Sep  2 08:12 /var/log/flowd.log.000004
-rw-------  1 root  wheel    11M Sep  2 05:58 /var/log/flowd.log.000005
-rw-------  1 root  wheel    11M Sep  2 03:29 /var/log/flowd.log.000006
-rw-------  1 root  wheel    11M Sep  2 00:58 /var/log/flowd.log.000007
-rw-------  1 root  wheel    11M Sep  1 22:46 /var/log/flowd.log.000008
-rw-------  1 root  wheel    11M Sep  1 20:17 /var/log/flowd.log.000009
-rw-------  1 root  wheel   107M Sep  1 17:56 /var/log/flowd.log.000010
root@OPNsense:/var/log #


Who rotates my flowd logs?

   /Jonas...


Hmm...

Looks like /usr/local/opnsense/scripts/netflow/flowd_aggregate.py has something to do with this...


...
MAX_FILE_SIZE_MB=10
MAX_LOGS=10
...
def check_rotate():
    """ Checks if flowd log needs to be rotated, if so perform rotate.
        We keep [MAX_LOGS] number of logs containing approx. [MAX_FILE_SIZE_MB] data, the flowd data probably contains
        more detailed data then the stored aggregates.
    :return: None
...


That's why my old data went away when I ditched the SQL DB. Only 10 flowd.log files are kept to not flood the filesystem. The rest of the aggregated data is in the SQL DB, and nowhere else.

I think that solves the mystery.

Comments?

   /Jonas...

Hmmmmm, it looks like I lied. Good question indeed, the log files are rotated by flowd_aggregate so that makes our code the prime suspect.

https://github.com/opnsense/core/blob/master/src/opnsense/scripts/netflow/flowd_aggregate.py#L90

And specifically:

https://github.com/opnsense/core/blob/master/src/opnsense/scripts/netflow/flowd_aggregate.py#L47

That makes a current cap of 100 MB of flowd files, not accounting for glitches like your 107 MB record file.

Mystery solved. Now we need to discuss how to handle this more gracefully in the future...


Ok, so case closed and some leftover work for you guys to decide on how to proceed.

Thanks again for a super awesome piece of software and the same for your time and effort put into supporting it here in the forums.

Now I have to try to find the next anomaly that I can dig deeper into...

   /Jonas...

Hi Jonas,

I've added a question ticket to GitHub to keep track of it: https://github.com/opnsense/core/issues/1170


Thanks for the help,
Franco

same issue on 17.7

Versions   OPNsense 17.7-amd64
FreeBSD 11.0-RELEASE-p11
OpenSSL 1.0.2l 25 May 2017

Aug 24 07:02:57   flowd_aggregate.py: flowd aggregate died with message Traceback (most recent call last): File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 148, in run aggregate_flowd(do_vacuum) File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 79, in aggregate_flowd stream_agg_object.add(flow_record_cpy) File "/usr/local/opnsense/scripts/netflow/lib/aggregates/interface.py", line 70, in add super(FlowInterfaceTotals, self).add(flow) File "/usr/local/opnsense/scripts/netflow/lib/aggregate.py", line 258, in add self._update_cur.execute(self._update_stmt, flow) DatabaseError: database disk image is malformed

should I open a new post?

and also:

Aug 24 09:42:54   configd.py: [215ffaee-e444-4af3-828a-ebe97916bf06] Script action failed with Command '/usr/local/opnsense/scripts/netflow/get_top_usage.py /provider "FlowInterfaceTotals" /start_time "1503550800" /end_time "1503560572" /key_fields "direction" /value_field "packets" /filter "if=igb0" /max_hits "25"' returned non-zero exit status 1 at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/processhandler.py", line 477, in execute stdout=output_stream, stderr=error_stream) File "/usr/local/lib/python2.7/subprocess.py", line 541, in check_call raise CalledProcessError(retcode, cmd) CalledProcessError: Command '/usr/local/opnsense/scripts/netflow/get_top_usage.py /provider "FlowInterfaceTotals" /start_time "1503550800" /end_time "1503560572" /key_fields "direction" /value_field "packets" /filter "if=igb0" /max_hits "25"' returned non-zero exit status 1

No, just go to Reporting: Settings and reset the Netflow/Insight data from there.


Cheers,
Franco