After Update to 19.7.2 Service "flowd_aggregate Insight Aggregator" is stopped

Started by thowe, August 05, 2019, 07:15:34 PM


Did what you asked and here is the output from the console:

root@OPNsense:~ # /usr/local/opnsense/scripts/netflow/flowd_aggregate.py --console
Traceback (most recent call last):
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 224, in <module>
    Main()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 136, in __init__
    self.run()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 160, in run
    aggregate_flowd(self.config, do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 68, in aggregate_flowd
    for flow_record in parse_flow(prev_recv, config.flowd_source):
  File "/usr/local/opnsense/scripts/netflow/lib/parse.py", line 74, in parse_flow
    for flow_record in FlowParser(filename, recv_stamp):
  File "/usr/local/opnsense/scripts/netflow/lib/flowparser.py", line 141, in __iter__
    record['recv_sec'] = record['recv_time'][0]
KeyError: 'recv_time'

Since it looked somewhat similar, I continued with your instructions, installed the patch, and then restarted the aggregator service. It did not stay up long.

ok, that doesn't look good. maybe the flowd.log file is corrupted and not handled properly on our end.

can you try https://github.com/opnsense/core/commit/d8ef93932b1696edd795ec38be57a2ec3e0187ea?



opnsense-patch d8ef9393
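
To give an idea of what "handling it properly" would mean here (a simplified, self-contained sketch of the general idea, not the literal contents of that commit): skip flow records that are missing expected fields instead of letting a KeyError kill the aggregator.

# simplified sketch of the idea (not the actual d8ef9393 change): skip flow
# records missing fields the aggregator needs, instead of raising KeyError
def iter_complete_records(records):
    for record in records:
        if 'recv_time' not in record:
            # most likely a truncated or corrupted entry in flowd.log
            continue
        record['recv_sec'] = record['recv_time'][0]
        yield record

# tiny demo with made-up records; the second one lacks 'recv_time' and is dropped
sample = [{'recv_time': (1565000000, 0)}, {'proto': 6}]
print(list(iter_complete_records(sample)))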

Thanks AdSchellevis. I ran the same command and got a similar result to spetrillo's. I've applied the patch and ran a Netflow repair from the web GUI. Will keep an eye on it and see whether the service continues to stop or not.


@AdSchellevis, Unfortunately no luck. The service stopped again. Is there anything else I can try or provide in terms of logs? I'm not against performing a fresh install to fix the problem, but if this is a bug, I wouldn't mind helping to fix it.

@spetrillo, did you have any luck with the patch? 

@unipacket: the simplest thing is to keep it running in a console until it crashes and dump the traceback here. Usually the log also contains some info, but a full trace is easier to debug.
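
If you don't want to babysit the console, something like this keeps a copy of the output so the full traceback isn't lost when it finally crashes (just a sketch; the /tmp log path is an arbitrary choice):

# run the aggregator in the foreground and also save its output to a file
# (the /tmp path below is just an example, put the copy wherever you like)
import subprocess, sys

cmd = ['/usr/local/opnsense/scripts/netflow/flowd_aggregate.py', '--console']
with open('/tmp/flowd_console.log', 'ab') as logfile:
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for line in proc.stdout:
        sys.stdout.buffer.write(line)  # echo to the console
        sys.stdout.buffer.flush()
        logfile.write(line)            # keep a copy for posting here
    proc.wait()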

Quote from: unipacket on August 22, 2019, 05:24:12 PM
@AdSchellevis, Unfortunately no luck. The service stopped again. Is there anything else I can try or provide in terms of logs? I'm not against performing a fresh install to fix the problem, but if this is a bug, I wouldn't mind helping to fix it.

@spetrillo, did you have any luck with the patch?

No luck here. It's down again. I will work on a trace log also.

How do I start the service from the console window?  When I run the command

/usr/local/opnsense/scripts/netflow/flowd_aggregate.py --console

the output is similar to what spetrillo posted previously.  I'm wondering if I should be running a different command?  thanks

similar might be something different. If it's exactly the same, maybe the patch didn't apply properly, in which case you'd better upgrade tomorrow and try again (19.7.3 is scheduled for tomorrow).

You can always dump the output here, so we can take a look.

Upgraded to 19.7.3 and did a "Repair Netflow Data" but still no go.  Output from flowd_aggregate.py --console:


  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 224, in <module>
    Main()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 136, in __init__
    self.run()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 160, in run
    aggregate_flowd(self.config, do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 80, in aggregate_flowd
    stream_agg_object.add(copy.copy(flow_record))
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/ports.py", line 71, in add
    super(FlowDstPortTotals, self).add(flow)
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/__init__.py", line 185, in add
    self._update_cur.execute(self._update_stmt, flow)


I'm thinking mine might just be plain broken and a reinstall will fix it. Is there anything else I can try before attempting a reinstall?

Your output seems incomplete; the relevant parts seem to be missing.
You can always flush all stats in Reporting -> Settings -> Netflow data if you want to remove the current stats and start from scratch.

Just upgraded from 19.1.x to 19.7.3 and my service is stopped. I've cleared the data, and flowd runs for a couple of days and then stops.

Thanks :)  I'll try flushing stats and see what happens.  Will keep you posted.

Hi,

same problem here with 19.7.3.

I installed tmux and ran the commands below in the tmux session...


root@opnsense01:~ # rm /var/netflow/*
root@opnsense01:~ # /usr/local/opnsense/scripts/netflow/flowd_aggregate.py --console
?
Traceback (most recent call last):
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 224, in <module>
    Main()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 136, in __init__
    self.run()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 160, in run
    aggregate_flowd(self.config, do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 72, in aggregate_flowd
    stream_agg_object.commit()
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/__init__.py", line 160, in commit
    self._db_connection.commit()
sqlite3.OperationalError: disk I/O error
root@opnsense01:~ #



disk full or damaged?

df -h

Might help confirm the first one. The error itself is likely not related to flowd_aggregate; usually this means it's a victim of hardware-related issues.
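
If the disk itself looks fine, it may also be worth running an integrity check on the aggregate databases. A minimal sketch, assuming they are the .sqlite files under /var/netflow (the exact filenames may differ):

# quick integrity check of the netflow aggregate databases
# (assumes they live under /var/netflow as *.sqlite; adjust the glob if needed)
import glob
import sqlite3

for path in sorted(glob.glob('/var/netflow/*.sqlite')):
    try:
        result = sqlite3.connect(path).execute('PRAGMA integrity_check;').fetchone()[0]
        print(path, result)          # 'ok' means the database passed the check
    except sqlite3.DatabaseError as err:
        print(path, 'ERROR:', err)   # e.g. "disk I/O error" or "malformed"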