After Update to 19.7.2 Service "flowd_aggregate Insight Aggregator" is stopped

Started by thowe, August 05, 2019, 07:15:34 PM


Did what you asked and here is the output from the console:

root@OPNsense:~ # /usr/local/opnsense/scripts/netflow/flowd_aggregate.py --console
Traceback (most recent call last):
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 224, in <module>
    Main()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 136, in __init__
    self.run()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 160, in run
    aggregate_flowd(self.config, do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 68, in aggregate_flowd
    for flow_record in parse_flow(prev_recv, config.flowd_source):
  File "/usr/local/opnsense/scripts/netflow/lib/parse.py", line 74, in parse_flow
    for flow_record in FlowParser(filename, recv_stamp):
  File "/usr/local/opnsense/scripts/netflow/lib/flowparser.py", line 141, in __iter__
    record['recv_sec'] = record['recv_time'][0]
KeyError: 'recv_time'

Since it looked somewhat similar, I continued with your instructions, installed the patch, and then restarted the aggregator service. It did not stay up long.

ok, that doesn't look good. maybe the flowd.log file is corrupted and not handled properly on our end.

can you try https://github.com/opnsense/core/commit/d8ef93932b1696edd795ec38be57a2ec3e0187ea?



opnsense-patch d8ef9393
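
To give an idea of what "handling it properly" would mean here (a simplified, self-contained sketch of the general idea, not the literal contents of that commit): skip flow records that are missing expected fields instead of letting a KeyError kill the aggregator.

# simplified sketch of the idea (not the actual d8ef9393 change): skip flow
# records missing fields the aggregator needs, instead of raising KeyError
def iter_complete_records(records):
    for record in records:
        if 'recv_time' not in record:
            # most likely a truncated or corrupted entry in flowd.log
            continue
        record['recv_sec'] = record['recv_time'][0]
        yield record

# tiny demo with made-up records; the second one lacks 'recv_time' and is dropped
sample = [{'recv_time': (1565000000, 0)}, {'proto': 6}]
print(list(iter_complete_records(sample)))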

Thanks AdSchellevis. I ran the same command and got a similar result to spetrillo's. I've applied the patch and ran a Netflow repair from the web GUI. Will keep an eye on it and see whether the service continues to stop or not.


@AdSchellevis, Unfortunately no luck. The service stopped again. Is there anything else I can try or provide in terms of logs? I'm not against performing a fresh install to fix the problem, but if this is a bug, I wouldn't mind helping to fix it.

@spetrillo, did you have any luck with the patch? 

@unipacket: the simplest thing is to keep it running in a console until it crashes and dump the traceback here. Usually the log also contains some info, but a full trace is easier to debug.
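
If you don't want to babysit the console, something like this keeps a copy of the output so the full traceback isn't lost when it finally crashes (just a sketch; the /tmp log path is an arbitrary choice):

# run the aggregator in the foreground and also save its output to a file
# (the /tmp path below is just an example, put the copy wherever you like)
import subprocess, sys

cmd = ['/usr/local/opnsense/scripts/netflow/flowd_aggregate.py', '--console']
with open('/tmp/flowd_console.log', 'ab') as logfile:
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for line in proc.stdout:
        sys.stdout.buffer.write(line)  # echo to the console
        sys.stdout.buffer.flush()
        logfile.write(line)            # keep a copy for posting here
    proc.wait()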

Quote from: unipacket on August 22, 2019, 05:24:12 PM
@AdSchellevis, Unfortunately no luck. The service stopped again. Is there anything else I can try or provide in terms of logs? I'm not against performing a fresh install to fix the problem, but if this is a bug, I wouldn't mind helping to fix it.

@spetrillo, did you have any luck with the patch?

No luck here. It's down again. I will work on a trace log also.

How do I start the service from the console window?  When I run the command

/usr/local/opnsense/scripts/netflow/flowd_aggregate.py --console

the output is similar to what spetrillo posted previously.  I'm wondering if I should be running a different command?  thanks

similar might be something different. If it's exactly the same, maybe the patch didn't apply properly, in which case you'd better upgrade tomorrow and try again (19.7.3 is scheduled for tomorrow).

You can always dump the output here, so we can take a look.

Upgraded to 19.7.3 and did a "Repair Netflow Data" but still no go.  Output from flowd_aggregate.py --console:


  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 224, in <module>
    Main()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 136, in __init__
    self.run()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 160, in run
    aggregate_flowd(self.config, do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 80, in aggregate_flowd
    stream_agg_object.add(copy.copy(flow_record))
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/ports.py", line 71, in add
    super(FlowDstPortTotals, self).add(flow)
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/__init__.py", line 185, in add
    self._update_cur.execute(self._update_stmt, flow)


I'm thinking mine might just be plain broken and a reinstall will fix it. Is there anything else I can try before attempting a reinstall?

Your output seems incomplete; the relevant parts seem to be missing.
You can always flush all stats in Reporting -> Settings -> Netflow data if you want to remove the current stats and start from scratch.

Just upgraded from 19.1.x to 19.7.3 and my service is stopped. I've cleared the data, and flowd runs for a couple of days and then stops.

Thanks :)  I'll try flushing stats and see what happens.  Will keep you posted.

Hi,

same problem here with 19.7.3.

I installed tmux and ran the commands below in the tmux session...


root@opnsense01:~ # rm /var/netflow/*
root@opnsense01:~ # /usr/local/opnsense/scripts/netflow/flowd_aggregate.py --console
?
Traceback (most recent call last):
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 224, in <module>
    Main()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 136, in __init__
    self.run()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 160, in run
    aggregate_flowd(self.config, do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 72, in aggregate_flowd
    stream_agg_object.commit()
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/__init__.py", line 160, in commit
    self._db_connection.commit()
sqlite3.OperationalError: disk I/O error
root@opnsense01:~ #



disk full or damaged?

df -h

Might help confirm the first one. The error itself is likely not related to flowd_aggregate; usually this means it's a victim of hardware-related issues.
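
If the disk itself looks fine, it may also be worth running an integrity check on the aggregate databases. A minimal sketch, assuming they are the .sqlite files under /var/netflow (the exact filenames may differ):

# quick integrity check of the netflow aggregate databases
# (assumes they live under /var/netflow as *.sqlite; adjust the glob if needed)
import glob
import sqlite3

for path in sorted(glob.glob('/var/netflow/*.sqlite')):
    try:
        result = sqlite3.connect(path).execute('PRAGMA integrity_check;').fetchone()[0]
        print(path, result)          # 'ok' means the database passed the check
    except sqlite3.DatabaseError as err:
        print(path, 'ERROR:', err)   # e.g. "disk I/O error" or "malformed"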