Has anyone else noticed almost double the CPU usage going from 19.1 to 19.7? As a general rule, my CPU would max out at about 50% most of the time; I RARELY saw it peak above that mark. I upgraded to 19.7 this morning and I'm peaking at 70-80% frequently. I've even spiked as high as 91%.
I am seeing a change as well.
I also see the same issue: increased workload, combined with a very slow WebUI... any ideas?
Traffic also seems abnormally high.
Yeah, I noticed that the web GUI was extremely laggy.
What do System: Diagnostics: Activity or the top command show as the highest CPU usage processes?
This is what it shows currently:
PID STATE WCPU COMMAND
11 RUN 61.08% [idle{idle: cpu3}]
21721 CPU3 52.69% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
13223 wait 8.40% /usr/local/bin/python3 /usr/local/opnsense/scripts/systemhealth/activity.py json (python3.7)
53112 piperd 2.98% /usr/local/bin/python3 /usr/local/opnsense/scripts/filter/update_tables.py (python3.7)
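When reading a list like this, it helps to skip the kernel [idle{...}] threads: their WCPU is time a core spent doing nothing, so the real top consumer here is flowd_aggregate.py, not PID 11. A minimal Python sketch of that filtering, assuming the simplified four-column "PID STATE WCPU COMMAND" layout pasted above (the helper name busiest_process is made up for illustration; full top output has more columns and would need a different pattern):

```python
import re

# Assumption: rows look like "PID STATE WCPU% COMMAND", whitespace-separated,
# exactly as in the simplified listing above.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+([\d.]+)%\s+(.*)$")


def busiest_process(lines):
    """Return (pid, wcpu, command) for the highest-WCPU non-idle row, or None."""
    best = None
    for line in lines:
        m = ROW.match(line)
        if not m:
            continue  # header line or anything that is not a process row
        pid, _state, wcpu, command = m.groups()
        if command.startswith("[idle"):
            continue  # kernel idle thread: high WCPU here means *unused* CPU
        candidate = (int(pid), float(wcpu), command)
        if best is None or candidate[1] > best[1]:
            best = candidate
    return best


sample = """PID STATE WCPU COMMAND
11 RUN 61.08% [idle{idle: cpu3}]
21721 CPU3 52.69% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
13223 wait 8.40% /usr/local/bin/python3 /usr/local/opnsense/scripts/systemhealth/activity.py json (python3.7)
53112 piperd 2.98% /usr/local/bin/python3 /usr/local/opnsense/scripts/filter/update_tables.py (python3.7)"""

pid, wcpu, command = busiest_process(sample.splitlines())
print(pid, wcpu)  # 21721 52.69
```

The same filter works on any paste in this format; only the idle threads are dropped, so a busy Suricata or PHP process would still surface.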
Yes, I'm seeing higher CPU usage than normal:
usually no more than a 20% peak, but now up to 35% (i7 chip).
NetFlow is using far more CPU cycles than I would consider normal. In the last release it ran under Python 2.7; I wonder if the move to Python 3 (python3.7) is causing the issue.
For example, here is my System: Diagnostics: Activity output while still on 19.1.10:
11 root 155 ki31 0 64K CPU0 0 835.0H 92.19% [idle{idle: cpu0}]
88347 root 25 0 1919M 325M select 1 135:00 8.89% /usr/local/bin/suricata -D --netmap --pidfile /var/run/suricata.pid -c /usr/local/etc/suricata/suricata.yaml{W#01-igb1}
27807 root 25 0 39M 32M select 1 73:28 6.49% /usr/local/bin/python2.7 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py
18 root -16 - 0 16K - 3 107:38 2.10% [rand_harvestq]
As a workaround to reduce CPU usage, you could try turning off NetFlow. I think you can do that by going to Reporting: Netflow, clearing all fields, and unchecking capture local.
Alternatively, try "Reset Netflow Data" or "repair Netflow Data" under Reporting: Settings.
I've already tried the reset option, thinking it would resolve the problem. Sadly, it did not. I could try turning NetFlow off, but that opens up a longer-term problem: no data capture.
So I've turned off NetFlow completely by wiping all settings under Netflow, reinstalled the package, ran "Reset Netflow Data"/"repair Netflow Data", and turned off "Round-Robin-Database". The web GUI seems a lot more "snappy" now; I'm letting it run to see how my processor load looks.
Update:
It was running great and never spiked over 51% CPU usage with NetFlow completely wiped out and not running. I've turned it back on, with everything wiped clean including the package reinstall, and it's currently running as shown in the attachment...
That seems much more normal now. I'll probably check it a few more times to see if it spikes again.
After about 15 hrs of running, it seems to be spiking again...
22876 root 52 0 25M 22M select 1 215:23 35.25% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
Quote from: cguilford on July 19, 2019, 04:18:21 PM
After about 15 hrs of running, it seems to be spiking again...
Same results here, though it didn't take 15 hrs; more like one hour.
I was going to open an issue on GitHub, but it seems someone did earlier this morning:
https://github.com/opnsense/core/issues/3587
Even with NetFlow off, CPU usage is still higher than on 19.1! I summarized my trials with the different OPNsense versions, and I can also confirm that with NetFlow off the GUI responds faster.
See the attachment for the summary.
I literally just upgraded to 19.7 in the last hour, and the first thing I noticed was that CPU usage has gone through the roof, to 99% while idle with minimal traffic. Previously, on 19.1 under the same load, I would see next to zero CPU usage.
last pid: 71370; load averages: 1.40, 1.37, 1.42 up 0+00:36:46 12:21:45
49 processes: 2 running, 47 sleeping
CPU: 50.2% user, 0.0% nice, 0.2% system, 1.2% interrupt, 48.5% idle
Mem: 142M Active, 123M Inact, 296M Wired, 170M Buf, 1369M Free
Swap:
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
35139 root 1 103 0 34996K 29804K CPU1 1 30:40 99.89% python3.7
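One thing that may explain the "99% at idle" reading: top's per-process WCPU column is measured against a single core, so 99.89% means python3.7 is saturating one core, not the whole box. A small Python sketch of that arithmetic, assuming this is a 2-core machine (only inferred from the C column showing CPU1 and the roughly 50% overall idle on the CPU: line; the helper names are made up for illustration):

```python
def share_of_total(wcpu_percent, ncpu):
    """Convert a top WCPU value (100% = one full core) to a % of all cores."""
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return wcpu_percent / ncpu


def busy_from_cpu_line(line):
    """Return 100 minus the idle % from a top 'CPU:' summary line."""
    for field in line.split(","):
        field = field.strip()
        if field.endswith("idle"):
            idle = float(field.split("%")[0].split()[-1])
            return round(100.0 - idle, 1)
    raise ValueError("no idle field in line")


cpu_line = "CPU: 50.2% user, 0.0% nice, 0.2% system, 1.2% interrupt, 48.5% idle"
# Assumption: 2 cores; adjust ncpu for other hardware.
print(f"{share_of_total(99.89, 2):.1f}")  # 49.9
print(busy_from_cpu_line(cpu_line))  # 51.5
```

Both numbers land near 50%, which is consistent with one python3.7 process pinning one of two cores while the other core stays mostly idle.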
I think I have alleviated the problem somewhat by resetting NetFlow data. After I did that, CPU usage seems to have dropped back to normal, with only the occasional spike from Python.
Also seeing high CPU utilization after upgrading from 19.1.10 to 19.7. As shown in the thread, it appears to be Python/Netflow related.
PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
11 root 155 ki31 0K 64K CPU2 2 8:16 99.15% [idle{idle: cpu2}]
11 root 155 ki31 0K 64K CPU0 0 9:19 89.21% [idle{idle: cpu0}]
11 root 155 ki31 0K 64K RUN 3 8:34 85.92% [idle{idle: cpu3}]
11 root 155 ki31 0K 64K RUN 1 8:12 72.94% [idle{idle: cpu1}]
52874 root 52 0 19736K 14632K piperd 3 0:01 44.24% /usr/local/bin/python3 /usr/local/opnsense/scripts/filte
PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
83957 root 84 0 28848K 25344K CPU2 2 3:34 96.98% /usr/local/bin/python3 /usr/local/opnsense/scripts/netfl
11 root 155 ki31 0K 64K RUN 1 9:54 66.54% [idle{idle: cpu1}]
11 root 155 ki31 0K 64K CPU3 3 10:10 63.93% [idle{idle: cpu3}]
11 root 155 ki31 0K 64K CPU0 0 11:08 51.51% [idle{idle: cpu0}]
11 root 155 ki31 0K 64K RUN 2 9:57 42.72% [idle{idle: cpu2}]
19 root -16 - 0K 16K - 0 0:17 11.31% [rand_harvestq]
12 root -60 - 0K 544K WAIT 1 0:03 1.03% [intr{swi4: clock (0)}]
0 root -92 - 0K 592K - 0 0:02 0.36% [kernel{dummynet}]
36090 root 52 0 51688K 41524K accept 0 0:04 0.34% /usr/local/bin/php-cgi
40440 root 20 0 1034M 4536K CPU1 1 0:00 0.07% top -aSCHIP
I'll try resetting Netflow data and report back. I've also noticed that the web interface is noticeably laggy after the 19.7 upgrade, again probably due to the CPU utilization. This is on a bare metal install, Celeron J3455 quad core, 16GB RAM, and a 120GB SSD. Usually a very snappy system.
I've been running my firewall without local netflow capture enabled since yesterday and the CPU is normal. So then I re-enabled it and let it run for a few hours to get the attached RRD graph. Definitely using more CPU with local netflow enabled.
Still not as bad as before I reset the Netflow data, but definitely more than 19.1.
Same situation for me. The firewall stops working and the only fix is a local reboot.
There are some patches available which will come with 19.7.1:
https://github.com/opnsense/core/issues/3587
Just following up on my previous post to provide some extra input. I first tried just repairing the NetFlow data; this had no impact on perceived performance, and CPU utilization remained high. I then completely reset the RRD graphs and NetFlow data and rebooted the device.
Unfortunately, even with these steps I've seen no improvement in page load performance. I can understand that this new version may need more processing power for NetFlow. What doesn't make sense to me is why all page loads are noticeably laggy and slow compared to 19.1.
There are 3 patches listed at https://github.com/opnsense/core/issues/3587, with instructions. Install those and see if they help. They've made a very noticeable difference in my system's performance.
All 3 patches made their way into 19.7.1. It's not perfect and will receive more fine tuning eventually, but for now we will need to focus on other priorities even though the level of CPU use is not what it used to be in 19.1.
Using pure Python 3 instead of Python 2 C bindings does lead to different levels of processor usage. The main issue is that the Python 2 C bindings are already buggy with Clang, unmaintained, and about to be deprecated with the end of life of Python 2.
Thanks to everybody helping to diagnose this. <3
Cheers,
Franco
Franco: I updated from the 19.1 series to 19.7.3 and also noticed the CPU load,
which is now almost constantly at 95%, seemingly due to Suricata and NetFlow
(with Suricata often logging "Error reading data from iface 'pppoe0': (55u) No buffer space available").
Both Suricata and NetFlow were already running on 19.1, where I had maybe a 10% load, so the CPU load jumped extremely high, even in low-traffic situations.
I don't know what buffer space would be needed, but there is enough free disk space, memory and swap space, so that cannot be the issue.
Since turning off Suricata and NetFlow is not an option, is it possible to downgrade back to 19.1?
(I would rather stay on an outdated firewall than disable functions or use, and thus pay for, a lot more electricity, since this is a 24/7 appliance.)
I currently kill the involved processes (suricata, netflow, syslog-ng) and then have relatively stable, normal CPU usage for a while, but it seems to return to high usage after some time for no clear reason.