OPNsense Forum

Archive => 20.1 Legacy Series => Topic started by: XabiX on April 23, 2020, 11:13:20 pm

Title: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: XabiX on April 23, 2020, 11:13:20 pm
Hello Team and Experts,

I am happy to have joined OPNsense after a long time on pfSense!

I was running 20.1.2 without any issue, but since the upgrade to 20.1.5 both cores of my AMD Ryzen 7 3700X 8-Core Processor (2 cores assigned) have been at 100% because of Netflow. I tried removing the interfaces (clear all) to deactivate Netflow, but the problem remained, so I put it back as it was.

Any idea what the issue could be?
100.00%   /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)


Code: [Select]
ls -lah /var/netflow/*
-rw-r-----  1 root  wheel   3.1M Apr 23 22:41 /var/netflow/dst_port_000300.sqlite
-rw-r-----  1 root  wheel    61K Apr 23 22:41 /var/netflow/dst_port_000300.sqlite-journal
-rw-r-----  1 root  wheel   848K Apr 23 22:41 /var/netflow/dst_port_003600.sqlite
-rw-r-----  1 root  wheel    33K Apr 23 22:41 /var/netflow/dst_port_003600.sqlite-journal
-rw-r-----  1 root  wheel   2.5M Apr 23 22:41 /var/netflow/dst_port_086400.sqlite
-rw-r-----  1 root  wheel    61K Apr 23 22:41 /var/netflow/dst_port_086400.sqlite-journal
-rw-r-----  1 root  wheel   7.1M Apr 23 22:41 /var/netflow/interface_000030.sqlite
-rw-r-----  1 root  wheel    93K Apr 23 22:41 /var/netflow/interface_000030.sqlite-journal
-rw-r-----  1 root  wheel   2.5M Apr 23 22:41 /var/netflow/interface_000300.sqlite
-rw-r-----  1 root  wheel    37K Apr 23 22:41 /var/netflow/interface_000300.sqlite-journal
-rw-r-----  1 root  wheel   680K Apr 23 22:41 /var/netflow/interface_003600.sqlite
-rw-r-----  1 root  wheel    33K Apr 23 22:41 /var/netflow/interface_003600.sqlite-journal
-rw-r-----  1 root  wheel    56K Apr 23 22:41 /var/netflow/interface_086400.sqlite
-rw-r-----  1 root  wheel   8.5K Apr 23 22:41 /var/netflow/interface_086400.sqlite-journal
-rw-r-----  1 root  wheel    12K Apr 23 22:41 /var/netflow/metadata.sqlite
-rw-r-----  1 root  wheel    12M Apr 23 22:41 /var/netflow/src_addr_000300.sqlite
-rw-r-----  1 root  wheel   145K Apr 23 22:41 /var/netflow/src_addr_000300.sqlite-journal
-rw-r-----  1 root  wheel   4.9M Apr 23 22:41 /var/netflow/src_addr_003600.sqlite
-rw-r-----  1 root  wheel    61K Apr 23 22:41 /var/netflow/src_addr_003600.sqlite-journal
-rw-r-----  1 root  wheel    18M Apr 23 22:41 /var/netflow/src_addr_086400.sqlite
-rw-r-----  1 root  wheel   321K Apr 23 22:41 /var/netflow/src_addr_086400.sqlite-journal
-rw-r-----  1 root  wheel    98M Apr 23 22:41 /var/netflow/src_addr_details_086400.sqlite
-rw-r-----  1 root  wheel   1.1M Apr 23 22:41 /var/netflow/src_addr_details_086400.sqlite-journal
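
For what it's worth, the databases themselves can be sanity-checked from the shell. A rough sketch only, assuming the sqlite3 command-line tool is installed on the box (I have not confirmed it ships by default):

Code: [Select]
# Run an integrity check against each Netflow database.
# Assumes the sqlite3 command-line client is available; any database
# reporting something other than "ok" is a candidate for a repair/reset.
for db in /var/netflow/*.sqlite; do
  echo "== ${db}"
  sqlite3 "${db}" 'PRAGMA integrity_check;'
done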

Code: [Select]
root@OPNsense:/home/xabix # ls -lah /var/log/flowd*
-rw-------  1 root  wheel    77K Apr 23 22:58 /var/log/flowd.log
-rw-------  1 root  wheel   258M Apr 23 22:56 /var/log/flowd.log.000001
-rw-------  1 root  wheel    10M Apr 20 15:35 /var/log/flowd.log.000002
-rw-------  1 root  wheel    10M Apr 20 13:05 /var/log/flowd.log.000003
-rw-------  1 root  wheel    10M Apr 20 09:55 /var/log/flowd.log.000004
-rw-------  1 root  wheel    10M Apr 20 06:24 /var/log/flowd.log.000005
-rw-------  1 root  wheel    10M Apr 20 02:35 /var/log/flowd.log.000006
-rw-------  1 root  wheel    10M Apr 19 23:00 /var/log/flowd.log.000007
-rw-------  1 root  wheel    10M Apr 19 20:11 /var/log/flowd.log.000008
-rw-------  1 root  wheel    10M Apr 19 16:58 /var/log/flowd.log.000009
-rw-------  1 root  wheel    10M Apr 19 13:46 /var/log/flowd.log.000010

Code: [Select]
root@OPNsense:/home/xabix # df -h
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/gpt/rootfs     15G    3.1G     10G    23%    /
devfs              1.0K    1.0K      0B   100%    /dev
fdescfs            1.0K    1.0K      0B   100%    /dev/fd
procfs             4.0K    4.0K      0B   100%    /proc
devfs              1.0K    1.0K      0B   100%    /var/dhcpd/dev
devfs              1.0K    1.0K      0B   100%    /var/unbound/dev

I am launching a repair of the Netflow database to see if that fixes anything. It seems there were similar issues/patches in the past, depending on the Python release.

Am I the only one facing this issue? Is there a way to reset the Netflow part without reinstalling? I assume deleting the Netflow database would do it, but would that be enough?
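
If it comes to that, I imagine a manual reset would look roughly like the following. This is only a sketch; I am assuming the rc scripts are named flowd and flowd_aggregate, so please verify them under /usr/local/etc/rc.d/ first:

Code: [Select]
# Stop the collector and the aggregator before touching their files
# (service names are an assumption -- check /usr/local/etc/rc.d/ first).
service flowd_aggregate stop
service flowd stop

# Remove the aggregated databases and the raw flowd logs.
rm -f /var/netflow/*.sqlite /var/netflow/*.sqlite-journal
rm -f /var/log/flowd.log*

# Start everything again; the databases should be recreated automatically.
service flowd start
service flowd_aggregate start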

Thanks
XabiX
Title: Re: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: XabiX on April 24, 2020, 11:49:50 am
Looking better this morning  :D

Code: [Select]
  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root      155 ki31      0    32K CPU1    1  81:05 100.00% [idle{idle: cpu1}]
    0 root      -16    -      0   880K swapin  0 669:39   0.00% [kernel{swapper}]
17217 root       20    0    26M    23M select  1   2:21   0.00% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
57611 root       20    0  2750M   664M nanslp  1   0:38   0.00% /usr/local/bin/suricata -D --netmap --pidfile /var/run/suricata.pid -c /usr/local/etc/suricata/suricata.yaml{suricata}
Title: Re: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: ladar on October 14, 2020, 10:10:59 am
I'm seeing this same problem. Netflow is pegging a CPU core at 100%. I just rebooted my firewall, so I'm wondering if my recent config changes caused this, or if the problem was there before and I just didn't notice.

Does anyone know if this is a bug in the code, or is Netflow simply having trouble keeping up with the traffic volume? My firewall pushes an average of about 1 gigabit/sec out to the internet (bursting up to 10 gigabit/sec), and that doesn't include internal traffic, so it's possible the volume is simply too much for a single-threaded Python process to handle. I've noticed the process does periodically drop to idle, but it doesn't stay that way for long (5 to 8 minutes at 100% followed by less than 2 minutes at idle, if I'm guesstimating).
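
Rather than guesstimating, I may just sample the process for a while. A quick sketch of what I have in mind (assuming pgrep -f matches only the aggregator process):

Code: [Select]
# Sample the aggregator's CPU usage every 30 seconds to measure how long
# the busy and idle phases actually last. Assumes pgrep -f matches only
# the flowd_aggregate.py process.
pid=$(pgrep -f flowd_aggregate.py)
while true; do
  printf '%s ' "$(date '+%H:%M:%S')"
  ps -o %cpu= -p "$pid"
  sleep 30
done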

Thoughts?


Title: Re: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: ladar on October 15, 2020, 04:51:03 pm
Stopping the flowd_aggregate service via the web GUI eliminated the CPU usage. After doing so I noticed that the /var/log/flowd.log file had grown to over a gigabyte. Not sure how big it was before I stopped the aggregator process, though.

Anyway, I cleared the Netflow data via the web GUI, and so far the process isn't hogging a CPU core anymore.
Title: Re: Since upgrade to 20.1.5 high CPU usage because of netflow/flowd_aggregate.py
Post by: ladar on October 16, 2020, 12:43:47 am
Clearing the Netflow data fixed it for a while, but eventually the CPU usage returned. For now I'm just going to renice the process.
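
Something like this, assuming pgrep -f matches only the aggregator process:

Code: [Select]
# Drop the aggregator to the lowest scheduling priority so it stops
# competing with forwarding/IDS work. Assumes pgrep -f matches only
# the flowd_aggregate.py process.
renice 20 -p $(pgrep -f flowd_aggregate.py)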