OPNsense Forum

Archive => 21.1 Legacy Series => Topic started by: drivera on June 24, 2021, 05:04:38 pm

Title: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: drivera on June 24, 2021, 05:04:38 pm
Hi!

I'm running a Protectli FW4B firewall (8GB RAM, 4-core Celeron J3160, mSATA SSD) that runs smoothly for the most part, until a high-bandwidth download is in play. At that point, and apparently inexplicably, flowd_aggregate.py starts to swallow up the CPU (>80% usage, sometimes up to 100%). I found another write-up (https://www.analysisman.com/2020/10/opnsense-highcpu.html) where the person found that the fix was to disable IPv6.

My problem is that the fix he prescribes is already applied in my case: I already had IPv6 disabled to begin with.

Is there any way I can help debug why the CPU would be swallowed up in this manner? How can I help diagnose what may be happening here?

Thanks!
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: opnfwb on June 24, 2021, 05:35:13 pm
If you have netflow enabled, this is just the result of the scheduled job aggregating new stats for you. It doesn't have anything to do with IPv6; it aggregates any traffic that the router is passing, including all IPv4.

You can stop this by turning off netflow.

I run a J3455 platform and I've only seen the aggregator use a single CPU thread, and it only spikes briefly, usually for less than 15 seconds. If you're seeing sustained CPU usage, I would suspect something is slowing the aggregate job down, perhaps a bottleneck on storage? I'm using a 120GB SATA SSD.
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: drivera on June 24, 2021, 10:04:45 pm
I have an mSATA SSD as well, plenty of RAM, and I still see fairly consistent, sustained (though relatively short-lived) CPU usage. The problem is that the job seems to complete just barely fast enough to keep the load from being fully sustained, because each run takes just about as long as the interval between invocations.

Question: if I turn off netflow, can I still get the same metrics from the Prometheus plugin?  This is an option for me - to monitor bandwidth usage externally...
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: opnfwb on June 24, 2021, 11:49:25 pm
I don't personally use the Prometheus plugin so I'm unsure if that requires netflow to be enabled.

I'd also check that PowerD is enabled and set to HiAdaptive so that the CPU is scaling to its turbo clock as needed when an intensive task kicks off. Other than that I'm not sure what else to suggest. I see the CPU spikes on mine too but it's just a single thread and it's useful data so I don't turn off netflow. I haven't seen a situation where the task is running for a long time as a result of high bandwidth usage, and this is on a 500/500 connection that gets used pretty heavily. On a system with 4 dedicated cores I kind of consider it a non-issue.

If you just need bandwidth totals vnStat is a great plugin and more lightweight to run than using netflow.
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: drivera on June 24, 2021, 11:54:55 pm
So how does one...

a) check that PowerD is running?
b) set the performance profile?
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: opnfwb on June 24, 2021, 11:59:11 pm
It's a bit buried. I took a screenshot showing where you can check it and set the profile. You could also try the other profiles but I find HiAdaptive works pretty well. It won't hurt to try them all and see if this helps.
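
For anyone who prefers the shell over the GUI, here is a minimal sketch for checking whether the CPU is actually clocking up. It assumes the FreeBSD cpufreq driver exposes the dev.cpu.0.freq and dev.cpu.0.freq_levels sysctls on your hardware (that is not guaranteed on every board), and the sample string further down is made up for illustration, not taken from a real box.

```python
import subprocess


def read_sysctl(name):
    """Return the value of one sysctl as text (assumes FreeBSD's `sysctl -n`)."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def parse_freq_levels(text):
    """Parse 'MHz/power' pairs, e.g. '1601/-1 1600/-1', into MHz values."""
    levels = []
    for token in text.split():
        freq, _, _power = token.partition("/")
        levels.append(int(freq))
    return sorted(levels, reverse=True)


def headroom_mhz(current_mhz, levels):
    """How far the current clock sits below the highest advertised level."""
    return max(levels) - current_mhz


# Hypothetical sample value, only to show the expected shape of the data.
SAMPLE_LEVELS = "1601/-1 1600/-1 1200/-1 800/-1"
```

On the firewall itself you could compare int(read_sysctl("dev.cpu.0.freq")) against parse_freq_levels(read_sysctl("dev.cpu.0.freq_levels")) while the aggregator is busy. If the clock stays at the lowest level under load, that would point towards frequency scaling rather than the script.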
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: drivera on June 29, 2021, 03:57:13 am
Thanks!!  It's off in my system so that's not it.

I also turned off the RRD graphing backend and the Prometheus Exporter for good measure, but flowd_aggregate.py still runs and consumes a fair chunk of CPU ... not as much as before, but enough that I'm annoyed it's consuming any CPU at all...

How can I identify what's launching it and why?
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: opnfwb on June 29, 2021, 04:18:18 am
A few things that would be worth mentioning.

1) If PowerD is not enabled, that doesn't necessarily mean your CPU is clocking to the highest speed. BSD has some pretty odd hardware support, so I would actually recommend enabling PowerD and seeing if this allows the processor to turbo boost during high-load, single-thread scenarios like the one we have here. The HiAdaptive profile is very good at these use cases.

2) The FlowD script that uses CPU on my OPNsense install is not due to RRD graphing, but to the Netflow collection used for the 'Insight' page under the Reporting section of the OPNsense UI. I've seen cases where I disabled the services (deselect all interfaces, uncheck local logging) and just hitting "apply" didn't completely disable it. I've had to reboot to fully stop it after deselecting all the interfaces on the Netflow config page. So if you haven't already, I would reboot after you've done this just to ensure it's fully off.

3) If you want to identify what is launching the process, a quick and dirty way to check is to watch the output of 'top -aSCHIP' in an SSH session. This will show you the full path that is launching the process, and will sort the highest CPU consuming processes on the top. Watch and wait for the flowd process to climb up the list and take a screenshot. It will look something like the screenshot I've posted here (which is a temporary CPU blip that I commonly see with FlowD in my environment, a small spike to 99% and then it drops back down after a few seconds).
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: drivera on June 29, 2021, 04:24:30 am
I'll try #1 and #3, though the top output doesn't show me the parent process - just the process itself. I'll figure something out here...
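
One way to get at the parent is to ask ps for the PPID column and walk the chain upwards. The sketch below assumes the output of `ps -axo pid,ppid,command`, and every path and PID in the sample is hypothetical. It says nothing about what really launches flowd_aggregate.py on OPNsense; it only shows how to follow the chain once you have a snapshot.

```python
def parse_ps(text):
    """Parse `ps -axo pid,ppid,command` output into {pid: (ppid, command)}."""
    procs = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, ppid, command = parts
        procs[int(pid)] = (int(ppid), command)
    return procs


def find_pids(procs, needle):
    """PIDs whose command line contains the given substring."""
    return sorted(pid for pid, (_, cmd) in procs.items() if needle in cmd)


def ancestry(procs, pid):
    """Walk from a PID up through its parents; returns [(pid, command), ...]."""
    chain = []
    seen = set()
    while pid in procs and pid not in seen:  # guard against loops (e.g. PID 0)
        seen.add(pid)
        ppid, command = procs[pid]
        chain.append((pid, command))
        pid = ppid
    return chain


# Hypothetical snapshot, for illustration only.
SAMPLE_PS = """\
  PID  PPID COMMAND
    1     0 /sbin/init
  100     1 /usr/local/bin/python3 /hypothetical/launcher.py
  200   100 /usr/local/bin/python3 /hypothetical/flowd_aggregate.py
"""
```

With a real snapshot saved to a file, ancestry(procs, find_pids(procs, "flowd_aggregate")[0]) would list the script first, then whatever started it, and so on up to init.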

I thought you might suggest #2 and I already tried it, to no avail ... still gets launched. This sounds like a bug that needs reporting as it shouldn't run once disabled, but still does.

Thanks!
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: opnfwb on June 29, 2021, 04:46:54 am
Hmm, the way I'm reading the output, I think python3 is the parent process. Unfortunately that doesn't tell us exactly which UI setting actually causes it to launch. But the path for the .py script gives a lot of hints.

Perhaps a 'ps -aux | grep python3' would show more? The process with the highest CPU time would be the culprit in that output.
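
If eyeballing that output gets tedious, the same idea can be sketched in a few lines of Python: filter a ps snapshot for python3 and sort by accumulated CPU time. This assumes output shaped like `ps -axo pid,time,command` with a minutes:seconds style TIME column. The sample rows are invented, and a TIME field with a day prefix would not be handled.

```python
def parse_cpu_time(field):
    """Convert a ps TIME field such as '12:30.25' into seconds."""
    seconds = 0.0
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def top_by_cpu_time(ps_text, needle):
    """Rows matching `needle`, as (seconds, pid, command), highest time first."""
    rows = []
    for line in ps_text.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) < 3 or needle not in parts[2]:
            continue
        pid, cpu_time, command = parts
        rows.append((parse_cpu_time(cpu_time), int(pid), command))
    return sorted(rows, reverse=True)


# Hypothetical snapshot, for illustration only.
SAMPLE_PS_TIME = """\
  PID     TIME COMMAND
  100  0:01.50 /usr/local/bin/python3 /hypothetical/other.py
  200 12:30.25 /usr/local/bin/python3 /hypothetical/flowd_aggregate.py
  300  5:00.00 /bin/sh
"""
```

The first row returned by top_by_cpu_time(snapshot, "python3") is the python3 process that has burned the most CPU since it started.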
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: drivera on June 29, 2021, 04:54:38 am
Yeah, I have the sources for OPNsense and I know how to find where it's referenced. However, I don't know under which circumstances it's launched, and this is what I'm looking for.

That said, I enabled PowerD to HiAdaptive and it seems to be eating less CPU now ... so there's that :D

Also, the system seems to have a swapfile that I wasn't aware was there ... I might remove it later on (not that it's generating pressure ... I just generally dislike swapfiles unless I'm 100% certain I need one ... box has 8GB RAM which should be more than enough).

Cheers!
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: franco on June 29, 2021, 07:52:16 am
At least for the swap file that is opt-in so someone must have enabled it... ;)


Cheers,
Franco
Title: Re: High CPU usage with flowd_aggregate.py ... IPv6 is disabled ... any ideas?
Post by: chemlud on June 29, 2021, 09:00:31 am
With Linux I have seen strange behaviour on machines with 8GB RAM (and more) when no swap is present (memory management issues, memory suddenly full and the machine crashing). So even on machines with large amounts of RAM I now prefer to have a (small) swap.

This (strange memory management) might also be true for BSD, as my last pfSense (2.5.1) install recently crashed after some days with /tmp and /var in RAM. After some days the OpenVPN tunnel complained that a certificate was not readable (solutions on the web did not work), and after a reboot the box came back with /tmp at 108% full (no joke). A re-install on a fresh SSD with totally different hardware gave the exact same symptoms after some days. Only solution: have /tmp on SSD....