OPNsense Forum

Archive => 17.7 Legacy Series => Topic started by: opnfwb on January 17, 2018, 08:58:50 pm

Title: issue with disk corruption after power loss/FlowD not starting
Post by: opnfwb on January 17, 2018, 08:58:50 pm
Greetings. Unfortunately, an extended power loss caused my UPS battery to fully drain, and my OPNsense box lost power as a result. The OPNsense box boots up fine after the outage; however, I noticed that the NetFlow/Insight graphing feature is no longer working.

I checked the logs and noticed this error:
Code:
Jan 17 11:34:07 flowd_aggregate.py: flowd aggregate died with message Traceback (most recent call last): File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 148, in run aggregate_flowd(do_vacuum) File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 79, in aggregate_flowd stream_agg_object.add(flow_record_cpy) File "/usr/local/opnsense/scripts/netflow/lib/aggregates/interface.py", line 70, in add super(FlowInterfaceTotals, self).add(flow) File "/usr/local/opnsense/scripts/netflow/lib/aggregate.py", line 260, in add self._update_cur.execute(self._insert_stmt, flow) DatabaseError: database disk image is malformed
Jan 17 11:31:08 configd.py: [fe7f64c6-c9d4-4c36-b09b-e086ab0df1c1] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:31:06 configd.py: [1d50babb-6223-4bc8-a773-971ae9d3a83e] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:30:48 configd.py: [cbd6107d-581c-4bee-9635-ca0ac8756cb0] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:13:09 configd.py: [05bd1a67-c19b-44a7-bea4-85cb83f2064f] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:12:59 configd.py: [a58562cc-04a9-444a-b5cb-0aff086f3b3c] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:12:56 configd.py: [3ce79efa-f2be-4fa6-ba24-7d129adda701] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:12:53 configd.py: [196a3b94-54d7-403a-a91b-541b0b55a882] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:12:51 configd.py: [945799ac-c0d0-4ba3-9ea9-9915654bcc32] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:12:48 configd.py: [11080cc3-73aa-488b-aa4a-35370304ba3c] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:11:53 configd.py: [dd158559-3de8-419c-a2ab-7ab6348d8ad8] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 11:11:27 pkg: flowd reinstalled: 0.9.1_3 -> 0.9.1_3

I have tried reinstalling the FlowD package, but this does not fix the issue.

If I click on NetFlow and then click on "Apply" in an attempt to reset NetFlow, I see the following log output:
Code:
Jan 17 13:56:42 flowd_aggregate.py: flowd aggregate died with message Traceback (most recent call last): File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 148, in run aggregate_flowd(do_vacuum) File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 79, in aggregate_flowd stream_agg_object.add(flow_record_cpy) File "/usr/local/opnsense/scripts/netflow/lib/aggregates/interface.py", line 70, in add super(FlowInterfaceTotals, self).add(flow) File "/usr/local/opnsense/scripts/netflow/lib/aggregate.py", line 260, in add self._update_cur.execute(self._insert_stmt, flow) DatabaseError: database disk image is malformed
Jan 17 13:56:40 configd.py: [263dcf1e-5cc6-47f5-b017-eae4161ad501] request netflow data aggregator top usage for FlowInterfaceTotals
Jan 17 13:56:40 configd.py: [ab58cec3-e55c-41f0-b84c-9f0f30b28be7] request netflow data aggregator metadata
Jan 17 13:56:40 configd.py: [3691858b-b78f-4673-b964-7d9aba0a150f] request netflow data aggregator top usage for FlowDstPortTotals
Jan 17 13:56:40 configd.py: [a555a9e0-31ba-4378-8b2f-4ea3fddaf771] request netflow data aggregator top usage for FlowInterfaceTotals
Jan 17 13:56:40 configd.py: [5300170f-445a-45be-804e-66d765d297fa] request netflow data aggregator top usage for FlowSourceAddrTotals
Jan 17 13:56:40 configd.py: [1f3cc143-1c76-404f-9bf6-5021ef6a211c] request netflow data aggregator timeseries for FlowInterfaceTotals
Jan 17 13:56:36 configd.py: [9fd5aec5-5f19-477a-bcb6-b2df58b6034a] restart netflow data aggregator
Jan 17 13:56:36 configd.py: [42b348f2-e423-4bfc-9d23-7133f065c5ce] request status of netflow collector
Jan 17 13:56:34 configd.py: [c1c38d21-f5d0-4553-9d9e-fa82e5d6bd17] start netflow
Jan 17 13:56:34 configd.py: [cb188c53-dec4-4fa6-a181-235c1a7b52bf] stop netflow

What else can I check or reinstall to get NetFlow working again? This is on OPNsense 17.7.11.
Title: Re: issue with disk corruption after power loss/FlowD not starting
Post by: bartjsmit on January 17, 2018, 09:55:37 pm
Try:

Reporting -> Settings -> Reset Netflow Data

You will use your historical data.

Bart...
Title: Re: issue with disk corruption after power loss/FlowD not starting
Post by: franco on January 17, 2018, 09:59:06 pm
*lose

Otherwise spot on answer by Bart, thanks!

18.1 will have a package health audit in addition to the security update. With this you can see whether any packages need a reinstall due to corruption; if none do, reinstalling packages does not help.


Cheers,
Franco
Title: Re: issue with disk corruption after power loss/FlowD not starting
Post by: opnfwb on January 18, 2018, 12:18:34 am
Thank you Bart and Franco. I was hoping I would not have to erase the data.

Performing the steps that Bart outlined has the Insight graphs working again, albeit with no history.

Is there any chance ZFS would have prevented this kind of corruption from happening? The other unmentionable firewall that ends with "sense" has a ZFS filesystem, but I refuse to use that product. :) Any chance that ZFS is coming to our beloved OPNsense?
Title: Re: issue with disk corruption after power loss/FlowD not starting
Post by: franco on January 18, 2018, 10:42:55 am
In general, the system tries hard to recover data from the database, but UFS can leave zero-byte files on the disk after a crash, so there will be nothing to recover from.
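
To illustrate the difference between a zero-byte file and a database that is present but malformed, here is a minimal Python sketch. It is not OPNsense's own recovery code. The function name check_sqlite_file and the path you pass to it are hypothetical, and it assumes the aggregator files are ordinary SQLite databases (the "database disk image is malformed" message in the traceback above is SQLite's corruption error). It only uses the standard sqlite3 module and SQLite's PRAGMA integrity_check.

```python
import os
import sqlite3


def check_sqlite_file(path):
    """Hypothetical helper: classify a SQLite file as
    'missing', 'empty', 'ok', or 'corrupt: <detail>'."""
    if not os.path.exists(path):
        # Check first: sqlite3.connect() would silently create the file.
        return "missing"
    if os.path.getsize(path) == 0:
        # A zero-byte file holds no pages at all, so nothing can be recovered.
        return "empty"
    try:
        conn = sqlite3.connect(path)
        try:
            # Returns a single row ('ok',) when no problems are found.
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # Raised when the file cannot be read as a database at all.
        return "corrupt: %s" % exc
    if rows == [("ok",)]:
        return "ok"
    return "corrupt: " + "; ".join(str(row[0]) for row in rows)
```

Run it against a copy of the database file rather than the live one. An "empty" or "corrupt" result would suggest that resetting the NetFlow data, as Bart described, is the practical way forward.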

ZFS would allow snapshots and recovering file states from previous snapshots, yes, although that is a manual task.

ZFS is on our nice-to-have list, but for the most part it is not mission critical, and it has received no help from the community so far in getting there.


Cheers,
Franco