After Update to 19.7.2 Service "flowd_aggregate Insight Aggregator" is stopped

Started by thowe, August 05, 2019, 07:15:34 PM

Hello

Today I updated from 19.7.1 to 19.7.2 on my APU2c4. The update itself ran without problems.

But after the update the Service "flowd_aggregate" was stopped and could not be started again.

In the menu "Reporting: NetFlow" I could not reapply the current settings there, as an error stated that the WAN interface was missing in Listening Interfaces (it really was missing).

After manually re-adding the WAN interface there, I could apply the settings. In the dashboard the Service "flowd_aggregate" was shown green/running. But after a refresh of the dashboard, the service was shown as stopped again.

In the general log I found:
Quote
/flowd_aggregate.py: flowd aggregate died with message Traceback (most recent call last):
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 160, in run
    aggregate_flowd(self.config, do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 80, in aggregate_flowd
    stream_agg_object.add(copy.copy(flow_record))
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/source.py", line 117, in add
    super(FlowSourceAddrDetails, self).add(flow)
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/__init__.py", line 185, in add
    self._update_cur.execute(self._update_stmt, flow)
sqlite3.DatabaseError: database disk image is malformed

What caused this issue?
What is the best way to resolve this issue?


Thanks!

Tom
System 1: PC Engines APU2C4
System 2: PC Engines APU2E4
System 3: Proxmox-VM on Intel NUC

Try the button "Repair Netflow Data" or, if this doesn't work, "Reset Netflow Data" in Reporting: Settings.
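To see which database file is actually damaged before repairing or resetting, one option is to run SQLite's built-in integrity check over the files. This is only a minimal sketch: `check_databases` is a hypothetical helper, not part of OPNsense, and the glob pattern you pass in is an assumption about where your netflow databases live.

```python
import glob
import sqlite3


def check_databases(pattern):
    """Run PRAGMA integrity_check on every file matching pattern.

    Returns a dict mapping filename to 'ok' or a description of the problem.
    (Hypothetical helper for illustration; not part of OPNsense.)
    """
    results = {}
    for filename in sorted(glob.glob(pattern)):
        try:
            conn = sqlite3.connect(filename)
            try:
                # A healthy database returns a single row containing 'ok'
                rows = conn.execute('PRAGMA integrity_check').fetchall()
            finally:
                conn.close()
            results[filename] = '; '.join(str(row[0]) for row in rows)
        except sqlite3.DatabaseError as exc:
            # Raised for example for "database disk image is malformed"
            results[filename] = 'error: %s' % exc
    return results
```

Calling it with a pattern that matches your `*.sqlite` files and printing the returned dict shows at a glance which files report something other than `ok`.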

In "Reporting: Settings" the option "Repair Netflow Data" did the trick. Now it's OK.

I was not aware of the possibility. Thanks! :-)

This is still an issue for me. I have reset and also repaired but the Aggregator service does not stay active. Anyone else seeing this in 19.7.2?

@spetrillo 

Possibly.  For the past couple of days, I've been having issues with Insight not working with VLANs (https://forum.opnsense.org/index.php?topic=13707.0).  I saw your post today and when I logged into OPNsense, I also found the flowd_aggregate service stopped.  I had upgraded to 19.7.2 yesterday and tried the Repair Netflow Data option then.  After reading thowe's comment, I tried it again today since the service was stopped and, at least for the past hour, the service has been running and, to my surprise, reports seem to be working again.  Going to keep an eye on it though to be sure.

I def think I have got something going on here. I repaired the database but it is returning a return code of 1 when complete, and the daemon never starts again.

I agree.  Checked today and found the flowd_aggregate service stopped.  Also seeing similar errors.  Not sure what could be going on.

Is anyone else having an issue with keeping the Aggregator running?


Seeing the same thing here.  As a test, I did a fresh install of OPNsense in a VM and the flowd_aggregate has been running for days.  I'm wondering if it is something with my config. 

Hmmm....that's an interesting thought. I am going to do a clean install of 19.7.2 and see if that changes things.


Did you do a clean install of 19.7 or was it on top of 19.1? Originally I did an install on top of 19.1, which I think caused my issue. I did a clean install of 19.7 last night and the aggregator is staying up now.

Well I hoped that it was a clean build that would cure it but the service is stopped again and I cannot get it started again, no matter a repair or reset. This is definitely a problem.

Can any of the devs chime in here? Is this a bug?

Can you execute the following on a console?


/usr/local/opnsense/scripts/netflow/flowd_aggregate.py --console


If it fails with something like:



Traceback (most recent call last):
  File "/usr/local/opnsense/site-python/sqlite3_helper.py", line 60, in check_and_repair
    cur.execute('analyze')
sqlite3.DatabaseError: database disk image is malformed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 224, in <module>
    Main()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 136, in __init__
    self.run()
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 144, in run
    check_and_repair('%s/*.sqlite' % self.config.database_dir)
  File "/usr/local/opnsense/site-python/sqlite3_helper.py", line 62, in check_and_repair
    if e.find('malformed') > -1 or force_repair:
AttributeError: 'DatabaseError' object has no attribute 'find'


It's a corrupted database combined with a bug trying to repair it.
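To illustrate the second traceback above: the check calls `.find()` on the exception object itself, but exceptions are not strings, so the repair path raises `AttributeError` before it can do anything. A minimal sketch of that failure and one way to avoid it follows. The function names are made up for this example, and the actual change in the patch may look different.

```python
import sqlite3


def is_malformed_buggy(exc):
    # Same pattern as the failing line in the traceback:
    # exception objects have no .find() method
    return exc.find('malformed') > -1


def is_malformed(exc):
    # Convert the exception to its message text before searching it
    return 'malformed' in str(exc)


err = sqlite3.DatabaseError('database disk image is malformed')

try:
    is_malformed_buggy(err)
except AttributeError as attr_err:
    print('buggy check failed:', attr_err)

print('string-based check:', is_malformed(err))
```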

Should be fixed with (on OPNsense 19.7.2):

opnsense-patch e5574648
service flowd_aggregate restart


Best regards,

Ad