OPNsense Forum

Archive => 20.1 Legacy Series => Topic started by: XabiX on April 23, 2020, 11:13:20 pm

Title: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: XabiX on April 23, 2020, 11:13:20 pm
Hello Team and Experts,

I am happy to have joined OPNsense after a long time on pfSense!

I was running 20.1.2 without any issue, but since the upgrade to 20.1.5 both cores of my AMD Ryzen 7 3700X 8-Core Processor (2 cores assigned) have been at 100% because of Netflow. I tried removing the interfaces (clear all) to deactivate Netflow, but the problem remained, so I put it back as it was.

Any idea what the issue could be?
100.00%   /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)


Code: [Select]
ls -lah /var/netflow/*
-rw-r-----  1 root  wheel   3.1M Apr 23 22:41 /var/netflow/dst_port_000300.sqlite
-rw-r-----  1 root  wheel    61K Apr 23 22:41 /var/netflow/dst_port_000300.sqlite-journal
-rw-r-----  1 root  wheel   848K Apr 23 22:41 /var/netflow/dst_port_003600.sqlite
-rw-r-----  1 root  wheel    33K Apr 23 22:41 /var/netflow/dst_port_003600.sqlite-journal
-rw-r-----  1 root  wheel   2.5M Apr 23 22:41 /var/netflow/dst_port_086400.sqlite
-rw-r-----  1 root  wheel    61K Apr 23 22:41 /var/netflow/dst_port_086400.sqlite-journal
-rw-r-----  1 root  wheel   7.1M Apr 23 22:41 /var/netflow/interface_000030.sqlite
-rw-r-----  1 root  wheel    93K Apr 23 22:41 /var/netflow/interface_000030.sqlite-journal
-rw-r-----  1 root  wheel   2.5M Apr 23 22:41 /var/netflow/interface_000300.sqlite
-rw-r-----  1 root  wheel    37K Apr 23 22:41 /var/netflow/interface_000300.sqlite-journal
-rw-r-----  1 root  wheel   680K Apr 23 22:41 /var/netflow/interface_003600.sqlite
-rw-r-----  1 root  wheel    33K Apr 23 22:41 /var/netflow/interface_003600.sqlite-journal
-rw-r-----  1 root  wheel    56K Apr 23 22:41 /var/netflow/interface_086400.sqlite
-rw-r-----  1 root  wheel   8.5K Apr 23 22:41 /var/netflow/interface_086400.sqlite-journal
-rw-r-----  1 root  wheel    12K Apr 23 22:41 /var/netflow/metadata.sqlite
-rw-r-----  1 root  wheel    12M Apr 23 22:41 /var/netflow/src_addr_000300.sqlite
-rw-r-----  1 root  wheel   145K Apr 23 22:41 /var/netflow/src_addr_000300.sqlite-journal
-rw-r-----  1 root  wheel   4.9M Apr 23 22:41 /var/netflow/src_addr_003600.sqlite
-rw-r-----  1 root  wheel    61K Apr 23 22:41 /var/netflow/src_addr_003600.sqlite-journal
-rw-r-----  1 root  wheel    18M Apr 23 22:41 /var/netflow/src_addr_086400.sqlite
-rw-r-----  1 root  wheel   321K Apr 23 22:41 /var/netflow/src_addr_086400.sqlite-journal
-rw-r-----  1 root  wheel    98M Apr 23 22:41 /var/netflow/src_addr_details_086400.sqlite
-rw-r-----  1 root  wheel   1.1M Apr 23 22:41 /var/netflow/src_addr_details_086400.sqlite-journal
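
For what it's worth, the databases themselves can be sanity-checked from the shell. A rough sketch only, assuming the sqlite3 command-line tool is installed on the box (I have not confirmed it ships by default):

Code: [Select]
# Run an integrity check against each Netflow database.
# Assumes the sqlite3 command-line client is available; any database
# reporting something other than "ok" is a candidate for a repair/reset.
for db in /var/netflow/*.sqlite; do
  echo "== ${db}"
  sqlite3 "${db}" 'PRAGMA integrity_check;'
done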

Code: [Select]
root@OPNsense:/home/xabix # ls -lah /var/log/flowd*
-rw-------  1 root  wheel    77K Apr 23 22:58 /var/log/flowd.log
-rw-------  1 root  wheel   258M Apr 23 22:56 /var/log/flowd.log.000001
-rw-------  1 root  wheel    10M Apr 20 15:35 /var/log/flowd.log.000002
-rw-------  1 root  wheel    10M Apr 20 13:05 /var/log/flowd.log.000003
-rw-------  1 root  wheel    10M Apr 20 09:55 /var/log/flowd.log.000004
-rw-------  1 root  wheel    10M Apr 20 06:24 /var/log/flowd.log.000005
-rw-------  1 root  wheel    10M Apr 20 02:35 /var/log/flowd.log.000006
-rw-------  1 root  wheel    10M Apr 19 23:00 /var/log/flowd.log.000007
-rw-------  1 root  wheel    10M Apr 19 20:11 /var/log/flowd.log.000008
-rw-------  1 root  wheel    10M Apr 19 16:58 /var/log/flowd.log.000009
-rw-------  1 root  wheel    10M Apr 19 13:46 /var/log/flowd.log.000010

Code: [Select]
root@OPNsense:/home/xabix # df -h
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/gpt/rootfs     15G    3.1G     10G    23%    /
devfs              1.0K    1.0K      0B   100%    /dev
fdescfs            1.0K    1.0K      0B   100%    /dev/fd
procfs             4.0K    4.0K      0B   100%    /proc
devfs              1.0K    1.0K      0B   100%    /var/dhcpd/dev
devfs              1.0K    1.0K      0B   100%    /var/unbound/dev

I am launching a repair of the Netflow database to see if that fixes anything. It seems there were similar issues/patches in the past, depending on the Python release.

Am I the only one facing this issue? Is there a way to reset the Netflow part without reinstalling? I assume deleting the Netflow database would do it, but would that be enough?
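
If it comes to that, I imagine a manual reset would look roughly like the following. This is only a sketch; I am assuming the rc scripts are named flowd and flowd_aggregate, so please verify them under /usr/local/etc/rc.d/ first:

Code: [Select]
# Stop the collector and the aggregator before touching their files
# (service names are an assumption -- check /usr/local/etc/rc.d/ first).
service flowd_aggregate stop
service flowd stop

# Remove the aggregated databases and the raw flowd logs.
rm -f /var/netflow/*.sqlite /var/netflow/*.sqlite-journal
rm -f /var/log/flowd.log*

# Start everything again; the databases should be recreated automatically.
service flowd start
service flowd_aggregate start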

Thanks
XabiX
Title: Re: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: XabiX on April 24, 2020, 11:49:50 am
Looking better this morning  :D

Code: [Select]
  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root      155 ki31      0    32K CPU1    1  81:05 100.00% [idle{idle: cpu1}]
    0 root      -16    -      0   880K swapin  0 669:39   0.00% [kernel{swapper}]
17217 root       20    0    26M    23M select  1   2:21   0.00% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
57611 root       20    0  2750M   664M nanslp  1   0:38   0.00% /usr/local/bin/suricata -D --netmap --pidfile /var/run/suricata.pid -c /usr/local/etc/suricata/suricata.yaml{suricata}
Title: Re: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: ladar on October 14, 2020, 10:10:59 am
I'm seeing this same problem. Netflow is pegging a CPU core at 100%. I just rebooted my firewall, so I'm wondering if my recent config changes caused this, or if the problem was there before and I just didn't notice.

Does anyone know if this is a bug in the code, or is Netflow simply having trouble keeping up with the traffic volume? My firewall pushes an average of about 1 gigabit/sec out to the internet (bursting up to 10 gigabit/sec), and that doesn't include internal traffic, so it's possible the volume is simply too much for a single-threaded Python process to handle. I've noticed the process does periodically drop to idle, but it doesn't stay that way for long (5 to 8 minutes at 100% followed by less than 2 minutes at idle, if I'm guesstimating).
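
Rather than guesstimating, I may just sample the process for a while. A quick sketch of what I have in mind (assuming pgrep -f matches only the aggregator process):

Code: [Select]
# Sample the aggregator's CPU usage every 30 seconds to measure how long
# the busy and idle phases actually last. Assumes pgrep -f matches only
# the flowd_aggregate.py process.
pid=$(pgrep -f flowd_aggregate.py)
while true; do
  printf '%s ' "$(date '+%H:%M:%S')"
  ps -o %cpu= -p "$pid"
  sleep 30
done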

Thoughts?


Title: Re: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: ladar on October 15, 2020, 04:51:03 pm
Stopping the flowd_aggregate service via the web GUI eliminated the CPU usage. After doing so I noticed that the /var/log/flowd.log file had grown to over a gigabyte. Not sure how big it was before I stopped the aggregator process, though.

Anyway, I cleared the Netflow data via the web GUI, and so far the process isn't hogging a CPU core anymore.
Title: Re: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: ladar on October 16, 2020, 12:43:47 am
Clearing the Netflow data fixed it for a while, but eventually the CPU usage returned. For now I'm just going to renice the process.
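
Something like this, assuming pgrep -f matches only the aggregator process:

Code: [Select]
# Drop the aggregator to the lowest scheduling priority so it stops
# competing with forwarding/IDS work. Assumes pgrep -f matches only
# the flowd_aggregate.py process.
renice 20 -p $(pgrep -f flowd_aggregate.py)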