Netflow excessive CPU and disk I/O usage

Started by molnart, December 29, 2024, 01:48:27 PM

December 29, 2024, 01:48:27 PM Last Edit: December 29, 2024, 01:53:09 PM by molnart
I have had recurring issues with netflow ever since I first installed OPNsense about 4 years ago.
I usually notice it through system alerts that my SSD temperatures have gone off the charts. When I troubleshoot it, I see netflow (flowd_aggregate.py) producing disk I/O of around 200 MB/s, accompanied by high CPU usage.

Restarting the netflow service does not help. Restarting OPNsense does not help either; disk and CPU usage are the same afterwards. For a long time I also had the feeling that rebooting OPNsense takes ages. Now I know it's because tar is apparently archiving the netflow files, running for almost 10 minutes.

The only thing that helps is resetting the netflow data altogether, which I have to do once every few months. But looking at the contents of /var/netflow, the sqlite database is not that big.


--- /var/netflow ---
                         /..
    4.4 GiB [#################]  src_addr_details_086400.sqlite
    1.4 GiB [#####            ]  dst_port_086400.sqlite
    1.2 GiB [####             ]  dst_port_086400.sqlite-journal
  419.1 MiB [#                ]  src_addr_086400.sqlite
  121.8 MiB [                 ]  interface_000030.sqlite
   36.5 MiB [                 ]  src_addr_000300.sqlite
   17.3 MiB [                 ]  dst_port_003600.sqlite
   15.0 MiB [                 ]  dst_port_000300.sqlite
   13.1 MiB [                 ]  interface_000300.sqlite
   13.0 MiB [                 ]  src_addr_003600.sqlite
    1.5 MiB [                 ]  interface_003600.sqlite
  136.0 KiB [                 ]  interface_086400.sqlite
   12.0 KiB [                 ]  metadata.sqlite
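
A listing like the one above can also be produced with a few lines of Python, which makes it easy to spot leftover `*.sqlite-journal` files. This is only a rough sketch: the helper names are made up for illustration, and the only thing taken from the post is the `/var/netflow` path. In SQLite's default rollback-journal mode a journal file normally exists only while a write transaction is open, so a large one sitting next to its database hints at a big write in progress or an interrupted one.

```python
from pathlib import Path


def summarize_netflow_dir(directory):
    """Return (name, size_in_bytes) for every file in the directory, largest first."""
    files = [
        (p.name, p.stat().st_size)
        for p in Path(directory).iterdir()
        if p.is_file()
    ]
    return sorted(files, key=lambda item: item[1], reverse=True)


def leftover_journals(directory):
    """Return the names of SQLite rollback journals (*.sqlite-journal), sorted."""
    return sorted(p.name for p in Path(directory).glob("*.sqlite-journal"))


if __name__ == "__main__":
    target = Path("/var/netflow")  # path from the post; adjust if yours differs
    if target.is_dir():
        for name, size in summarize_netflow_dir(target):
            print(f"{size / 2**20:10.1f} MiB  {name}")
        for name in leftover_journals(target):
            print(f"journal present: {name}")
```

Run it as root on the firewall, or point `target` at a copy of the directory.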


I have stumbled upon this thread https://forum.opnsense.org/index.php?topic=19786.0 claiming it's caused by IPv6, but IPv6 is disabled in my config.

Is there any long-term solution for this? Like moving the netflow data to an external database or something?

Netflow creates a record for every single connection. On a busy gateway what you observe is just expected. It's a heck of a lot of data, so there is no "solution".

You could set up an external network management system and netflow aggregator and send the data there instead of processing it locally. Most products are commercial, though. I am still investigating if there is any open source tool I can use.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)
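
To give an idea of what the receiving side of "send the data there instead" involves, here is a minimal Python sketch that parses the header of a NetFlow v5 export datagram. It assumes the exporter is set to version 5 (v9 uses a template-based layout that this does not handle). The function names and the port 2055 are illustrative assumptions, not anything OPNsense prescribes.

```python
import socket
import struct
from collections import namedtuple

# NetFlow v5 export header: 24 bytes, network byte order.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD_SIZE = 48  # each v5 flow record is 48 bytes

V5Header = namedtuple(
    "V5Header",
    "version count sys_uptime unix_secs unix_nsecs "
    "flow_sequence engine_type engine_id sampling_interval",
)


def parse_v5_header(datagram):
    """Parse and validate the header of a NetFlow v5 datagram."""
    if len(datagram) < V5_HEADER.size:
        raise ValueError("datagram shorter than a NetFlow v5 header")
    header = V5Header(*V5_HEADER.unpack_from(datagram))
    if header.version != 5:
        raise ValueError(f"not NetFlow v5 (version field is {header.version})")
    return header


def expected_length(header):
    """Datagram length implied by the record count in the header."""
    return V5_HEADER.size + header.count * V5_RECORD_SIZE


def serve(host="0.0.0.0", port=2055):
    """Print one line per received datagram (hypothetical listener; port is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, peer = sock.recvfrom(65535)
            header = parse_v5_header(data)
            print(peer[0], header.count, "flows, sequence", header.flow_sequence)
```

Calling `serve()` on a separate machine and pointing the exporter at it is enough to see how many flow records arrive per second, which is a quick way to judge how much storage a real collector would need.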

In that case it looks to me like unoptimized logic on the OPNsense side. I think this could be solved by defining data retention periods, downsampling older data, etc. But unfortunately it doesn't look like netflow has seen any significant development in the past few years.
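
To illustrate what I mean by retention and downsampling, here is a rough Python/sqlite3 sketch. The `flows` table and its columns are a made-up schema for demonstration only; it is not the layout OPNsense actually uses in /var/netflow, and this is not how flowd_aggregate.py works.

```python
import sqlite3


def create_demo_table(conn):
    """Create a hypothetical flows table (NOT the real OPNsense schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flows "
        "(mtime INTEGER, src_addr TEXT, octets INTEGER)"
    )


def downsample(conn, cutoff, bucket_seconds):
    """Merge rows older than `cutoff` into one row per bucket and source address."""
    with conn:  # commit on success, roll back on error
        merged = conn.execute(
            "SELECT (mtime / ?) * ? AS bucket_start, src_addr, SUM(octets) "
            "FROM flows WHERE mtime < ? "
            "GROUP BY bucket_start, src_addr",
            (bucket_seconds, bucket_seconds, cutoff),
        ).fetchall()
        conn.execute("DELETE FROM flows WHERE mtime < ?", (cutoff,))
        conn.executemany(
            "INSERT INTO flows (mtime, src_addr, octets) VALUES (?, ?, ?)",
            merged,
        )
    return len(merged)


def apply_retention(conn, now, max_age_seconds):
    """Delete rows older than the retention period; return how many were removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM flows WHERE mtime < ?", (now - max_age_seconds,)
        )
    return cur.rowcount
```

Note that deleting rows does not shrink an SQLite file by itself. The freed pages are reused for new data, but the file only gets smaller after a `VACUUM`.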

Netflow is standardized. It works exactly as designed and there is nothing to change because that would break interoperability. It was invented by Cisco and of course routers were expected to shovel the data off the chassis into some aggregator system with enough storage and processing power.

Possibly the aggregator and monitor built into OPNsense could be improved, agreed. But any serious high bandwidth scenario will need a dedicated machine and product for that, anyway.

There are better protocols today I guess. sFlow seems to be popular.

Quote from: Patrick M. Hausen on December 29, 2024, 02:07:02 PM
Netflow creates a protocol entry of every single connection. On a busy gateway what you observe is just expected. It's a heck of a lot of data, so there is no "solution".

You could set up an external network management system and netflow aggregator and send the data there instead of processing it locally. Most products are commercial, though. I am still investigating if there is any open source tool I can use.


I use Elastiflow (renamed to NetObserve). They have a free tier license which is good enough for homelab use.

https://www.elastiflow.com/basic-license
Hardware: Lenovo ThinkStation P330 Tiny (Intel Core i5-8500 @ 3.00GHz, 1xI219-LM, 4xI350)
BUFFERBLOAT GRADE A+

Quote from: jaykumar2005 on January 07, 2025, 02:32:25 PM
I use Elastiflow (renamed to NetObserve). They have a free tier license which is good enough for homelab use.
Thanks! I'll give it a spin.