OPNsense Forum

Archive => 21.7 Legacy Series => Topic started by: chemlud on September 22, 2021, 03:59:49 pm

Title: 21.7.3. - high CPU and MEM usage
Post by: chemlud on September 22, 2021, 03:59:49 pm
Hi!

Updated 2 OPNsense to 21.7.3 about an hour ago. No reboot. Had to start syslog_ng manually from the Dashboard afterwards.

Now I get frequent (preconfigured...) warning emails that CPU and MEM usage is very high on both machines with no major throughput.

In Diagnostics - Activity I have

Code: [Select]
OPNsense 1

47519 root 102 0 6352 5414 CPU1 1 69:35 98.78% /usr/local/bin/python3 /usr/local/opnsense/service/configd_ctl.py -e -t 0.5 system event config_changed (python3.8)
36687 root 52 0 27 3836 RUN 1 0:45 43.36% /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf -p /var/run/syslog-ng.pid{syslog-ng}

--------------------------

OPNsense 2

92636 root 83 0 32 8724 CPU0 0 1:06 35.60% /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf -p /var/run/syslog-ng.pid{syslog-ng}
69153 root 103 0 5091 4559 CPU3 3 68:37 100.00% /usr/local/bin/python3 /usr/local/opnsense/service/configd_ctl.py -e -t 0.5 system event config_changed (python3.8)

Anyone?
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: chemlud on September 22, 2021, 04:13:57 pm
... did a reboot on both machines, looks better now...
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: scream on September 22, 2021, 06:14:14 pm
I do see a similar/same issue.

Upgraded to 27.1.3 and can see very high cpu load. python3.8 consumes around 60-90% CPU all the time. I rebooted the VM but it didn't help.

I reverted to the VM snapshot I did before the upgrade an CPU load is back to normal.


Title: Re: 21.7.3. - high CPU and MEM usage
Post by: dinguz on September 22, 2021, 07:31:46 pm
I had the same issue after upgrade: high cpu usage by a python process. Was gone after a reboot.

Code: [Select]
root    48330 99.4  0.3   33936   22620  -  R    22:51      2:07.71 /usr/local/bin/python3 /usr/local/opnsense/service/configd_ctl.
I have upgraded but not rebooted my other machine which show the same issue, can I do anything helpful in investigating this?

Edit: I see lots of these in the log:

Code: [Select]
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:

syslog-ng has high cpu load now as well.

Edit 2:

On a sidenote: this shows the power of ZFS compression:
Code: [Select]
root@haanjdj:/var/log/system # zfs get compressratio
NAME                PROPERTY       VALUE  SOURCE
zroot               compressratio  8.33x  -
zroot/ROOT          compressratio  2.22x  -
zroot/ROOT/default  compressratio  2.22x  -
zroot/tmp           compressratio  1.66x  -
zroot/usr           compressratio  1.00x  -
zroot/usr/home      compressratio  1.00x  -
zroot/usr/ports     compressratio  1.00x  -
zroot/usr/src       compressratio  1.00x  -
zroot/var           compressratio  27.41x  -
zroot/var/audit     compressratio  1.00x  -
zroot/var/crash     compressratio  1.00x  -
zroot/var/log       compressratio  27.42x  -
zroot/var/mail      compressratio  1.00x  -
zroot/var/tmp       compressratio  1.00x  -
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: almodovaris on September 22, 2021, 07:57:05 pm
True, python was close to 100% CPU (at one core). Rebooted and now is OK.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: Dantichrist on September 22, 2021, 08:09:29 pm
I'm seeing the same issue. I rebooted and the CPU usage has dropped but it is still a good bit higher than what it is normally. The memory usage seems normal.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: 33b on September 22, 2021, 09:22:56 pm
I had the same issue after upgrade: high cpu usage by a python process. Was gone after a reboot.

Code: [Select]
root    48330 99.4  0.3   33936   22620  -  R    22:51      2:07.71 /usr/local/bin/python3 /usr/local/opnsense/service/configd_ctl.
I have upgraded but not rebooted my other machine which show the same issue, can I do anything helpful in investigating this?

Edit: I see lots of these in the log:

Code: [Select]
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
2021-09-22T20:28:12 configctl[48330] event @ 1632335220.47 msg:
syslog-ng has high cpu load now as well.

If you don't reboot, those will fill up your logs pretty quickly, to the tune of over 40 GB per day of logs in my case.

https://forum.opnsense.org/index.php?topic=24148.0

Franco commented there (and also on Reddit this morning) that "Something in the combination of Python and Syslog-ng update of 21.1.9" started causing this on occasion and that they're looking into it.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: MidGe on September 23, 2021, 02:29:10 am
I had the same issue (cpu 100%) after the latest automatic update. A reboot fixed the problem.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: allebone on September 23, 2021, 05:19:38 am
Same for me. 1 core pegged but a reboot fixed it totally for me.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: pavulinux on September 23, 2021, 08:22:50 am
Reboot did not fix the issue in my case. Post upgrade to 21.7.3 Suricata consumes constantly 50%CPU (used to be 5-10% prior to upgrade). Other processes seem to be fine. I've rolled back and the CPU utilisation went back to normal.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: franco on September 23, 2021, 08:34:19 am
Reboot did not fix the issue in my case. Post upgrade to 21.7.3 Suricata consumes constantly 50%CPU (used to be 5-10% prior to upgrade). Other processes seem to be fine. I've rolled back and the CPU utilisation went back to normal.

Err, Suricata was updated from version 5 to 6. This might be unavoidable unless you want to roll back forever.


Cheers,
Franco
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: tong2x on September 23, 2021, 10:49:48 am
confirmed high CPU usage
for me it was syslogNG, dont have suricata installed

reboot helped, cpu usage is not high anymore

though memory usage is abit high
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: opnsenseuser on September 23, 2021, 11:26:32 am
after reboot cpu and mem is normal again
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: yeraycito on September 23, 2021, 12:55:59 pm
I have been formatting opnsense for a while now, every time major versions are released (summer - January). With 21.7 I did it as well. I say this because I just upgraded to 21.7.3 and have not had any problems with high cpu and memory consumption without having to reboot to get it fixed. The same has happened with other problems that have been reported since opnsense 21.7 was released. Everything has worked fine. So I recommend when major versions are released to format Opnsense.

Title: Re: 21.7.3. - high CPU and MEM usage
Post by: chemlud on September 23, 2021, 01:38:25 pm
... I do a fresh install when the underlying OS get's a major update, as expected for 22.1, but not every 6 months. That would be overkill imho...
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: 4r7ur on September 23, 2021, 02:35:50 pm
after reboot cpu and mem seem back to normal. Last fresh install was made in January 2018  :D
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: muchacha_grande on September 23, 2021, 02:45:12 pm
Same here. Reboot needed. Also syslog-ng crashed and fixed after reboot.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: yeraycito on September 23, 2021, 05:38:28 pm
... I do a fresh install when the underlying OS get's a major update, as expected for 22.1, but not every 6 months. That would be overkill imho...

With 21.7 I did it more because of the change to zfs.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: Fright on September 23, 2021, 07:43:08 pm
imho configd_ctl.py ("configd_ctl.py -e -t 0.5 system event config_changed") goes mad on stdin reading when pipe breaks (syslog-ng crash?)
perhaps adding some exception handling would be useful
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: vijvis on September 25, 2021, 02:19:10 am
No high memory or CPU issues on my Protectli vault.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: Olli on September 25, 2021, 03:25:04 pm
Like i wrote in another thread, my problem with high load is the following, reboot did not fix it.

61047 root         85    0    28M    19M CPU0     0   0:06  98.33% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.8)

After deactivate
Reporting -> Netflow -> Capture local
it runs normal without high load. Maybe some incompatibility with the new python 3.8?
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: dinguz on September 25, 2021, 04:43:17 pm
Like i wrote in another thread, my problem with high load is the following, reboot did not fix it.

After deactivate
Reporting -> Netflow -> Capture local
it runs normal without high load. Maybe some incompatibility with the new python 3.8?

I have had this happen in the past after Netflow database corruption. If you reset the data files (Reporting -> Settings) and re-enable Netflow, does the high load come back?
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: Olli on September 25, 2021, 06:06:10 pm

I have had this happen in the past after Netflow database corruption. If you reset the data files (Reporting -> Settings) and re-enable Netflow, does the high load come back?

Thanks. That seems to work.
Maybe the update crashed the database  :)
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: devhunter55 on September 28, 2021, 11:14:10 am
same to me did the upgrade yesterday to  OPNsense 21.7.3_1 - CPU and Memory are close to full (100%).

Syslog-ng Daemon agent was not started.

Check top .. saying .. that sylog-ng and python 3.8 eats all the ressources on the system.

PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
----------------------------------------------------------------------------------------------
74757 root          6  29    0    33M    10M kqread   0   9:57 109.57% syslog-ng
15412 root          1 101    0  1435M  1221M CPU0     0 429:56  97.99% python3.8
90861 root          1 101    0    11M  1916K CPU1     1  15:39  97.89% syslogd

Did not do a reboot till now ..

Seems this firmware release was not properly tested before populate it.




Title: Re: 21.7.3. - high CPU and MEM usage
Post by: dcol on September 28, 2021, 07:42:47 pm
Same issue here. CPU seems to get overworked then the internet comes to a crawl. A reboot fixes it, but this should not be right. I can tell this happens because I use Monit and it alerts me when the CPU is overloaded. I have a simple setup only using Suricata with a few rules and Monit. I don't use netflow, Sensei, bridges, or VPN. I updated to 21.7.3_1 hoping for a fix, but happened again this morning. I don't see anything abnormal in the top logs. It's a mystery.

top shows Suricata averaging at 2%, but python 3.8 pops in at 60-80% every once in a while which correlates to the CPU temps rising.

Also I noticed the temps average 50-55 now. Was 30-35 with 21.7.2
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: dinguz on September 28, 2021, 10:52:28 pm
Have you tried to reset RRD and Netflow data (in Reporting: Settings)? I have seen this behavior occasionally with database corruption.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: dcol on September 28, 2021, 11:05:02 pm
I have RRD disabled and no interfaces selected in netflow. I do not care about insight data. But to test I turned them on and selected the interface. Issue still remains.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: devhunter55 on September 29, 2021, 08:15:13 pm
did a reboot this morning and now memory-/swap/-cpu consumption is normal.

So - reboot fixes this issue definitively
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: dcol on September 29, 2021, 10:40:33 pm
problem is we shouldn't have to reboot to get OPNsense 'back to normal'. These occurrences are due to a problem and needs to be fixed. We didn't use to see this issue.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: franco on September 30, 2021, 04:15:13 pm
https://github.com/opnsense/core/commit/17aec4ed46 hotfix pushed today to address this... thanks to kulikov-a for digging into this


Cheers,
Franco
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: dcol on September 30, 2021, 05:23:16 pm
Now when I do an update to get this fix I get 'could not authenticate the selected mirror' due to a certificate verification error
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: Fright on September 30, 2021, 05:42:55 pm
i think le cert chain fixed already
https://www.ssllabs.com/ssltest/analyze.html?d=pkg.opnsense.org&s=89.149.211.205

or you can just change mirror
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: dcol on September 30, 2021, 05:44:29 pm
Changing mirror worked
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: Fright on September 30, 2021, 07:56:37 pm
any chance that you added cross-signed ISRG Root X1 cert to Trusted? (thats may be the reason)
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: cookiemonster on September 30, 2021, 07:57:29 pm
I'm sorry for the "me too" post that mixes issues but it might warrant it.
I upgraded today from 21.7.2 to 21.7.3  and rebooted the firewall.
Yesterday I had disabled Netflow by leaving " Listening interfaces" and " WAN interfaces" as "Nothing selected".
Once all services had restarted post-boot, I gave it a few minutes to settle down and checked memory and cpu consumption. The cpu was high for a while.

htop gave me one process to check that looked like interesting.
/usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.8)
This had a WCPU value of 98%, constant for some 5 minutes.
A couple of ploints: an fstat showed the open file that the script was using was about 15 MB in size.
A quick look at the script I read: that it will rotate files at 10 MB and it's main method will check and repair it's sqlite db and log 'start watching flowd' to syslog.
It was there:
2021-09-30T17:58:32   /flowd_aggregate.py[54199]   vacuum done   
2021-09-30T17:58:31   /flowd_aggregate.py[54199]   vacuum interface_086400.sqlite   
2021-09-30T17:58:31   /flowd_aggregate.py[54199]   vacuum interface_003600.sqlite   
2021-09-30T17:58:31   /flowd_aggregate.py[54199]   vacuum interface_000300.sqlite   
2021-09-30T17:58:31   /flowd_aggregate.py[54199]   vacuum interface_000030.sqlite   
2021-09-30T17:58:15   /flowd_aggregate.py[54199]   vacuum dst_port_086400.sqlite   
2021-09-30T17:58:15   /flowd_aggregate.py[54199]   vacuum dst_port_003600.sqlite   
2021-09-30T17:58:15   /flowd_aggregate.py[54199]   vacuum dst_port_000300.sqlite   
2021-09-30T17:58:07   /flowd_aggregate.py[54199]   vacuum src_addr_086400.sqlite   
2021-09-30T17:58:07   /flowd_aggregate.py[54199]   vacuum src_addr_003600.sqlite   
2021-09-30T17:58:07   /flowd_aggregate.py[54199]   vacuum src_addr_000300.sqlite   
2021-09-30T17:57:48   dhclient[74977]   Creating resolv.conf   
2021-09-30T17:57:39   /flowd_aggregate.py[54199]   vacuum src_addr_details_086400.sqlite   
2021-09-30T17:57:08   /flowd_aggregate.py[54199]   start watching flowd   
2021-09-30T17:56:24   opnsense[53768]   plugins_configure newwanip (execute task : webgui_configure_do(,wan))   
2021-09-30T17:56:24   opnsense[53768]   plugins_configure newwanip (execute task : vxlan_configure_interface())   
2021-09-30T17:56:15   opnsense[53768]   plugins_configure newwanip (execute task : unbound_configure_do(,wan))   
2021-09-30T17:56:15   opnsense[53768]   plugins_configure newwanip (execute task : openssh_configure_do(,wan))   
2021-09-30T17:56:15   opnsense[53768]   plugins_configure newwanip (execute task : opendns_configure_do())   
2021-09-30T17:56:14   opnsense[53768]   plugins_configure newwanip (execute task : ntpd_configure_do())   
2021-09-30T17:56:13   opnsense[53768]   /usr/local/etc/rc.newwanip: Curl error occurred: Resolving timed out after 15001 milliseconds   
2021-09-30T17:56:12   /flowd_aggregate.py[54199]   startup, check database.

I thought it took longer than the 2 mins the logs show but maybe I my recollection is incorrect.

What I see now is that that's the "hoovering" routine that spikes the cpu. At first I thought new python version, needs to maybe convert the data in files or db and was needed as a one-off.
However the script runs on a loop that seems outside the in-script "vacuum_interval = (60*60*8) # 8 hour vacuum cycle"

Question: is it normal that it runs even when Netflow is disabled?

I've not sighup'ed it in case is needed and this is how is meant to be.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: dcol on September 30, 2021, 08:06:06 pm
Getting back to main topic. with 21.7.3_3, I do see slightly reduced CPU usage and lower temps with less CPU spikes. All 4 cores average in the 40C range now. Using Intel i7-6700. Going to change to i5-6600T next week to reduce temps even further.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: cookiemonster on September 30, 2021, 11:18:49 pm
probably my post is too verbose but is on topic.
High cpu usage post upgrade to 21.7.3 and I'm trying to point to what might be a problem. Or rather seeking confirmation if is normal to have flowd_aggregate.py running if there is no netflow enabled.
Sorry if is not clear.
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: Fright on October 01, 2021, 09:34:46 am
@cookiemonster
Quote
if is normal to have flowd_aggregate.py running if there is no netflow enabled
no. should not imho. but script determines your netflow as enabled with local target. try to clear the "Capture local" checkbox and apply
Title: Re: 21.7.3. - high CPU and MEM usage
Post by: cookiemonster on October 01, 2021, 11:02:14 am
Indeed Fright. Clearing that seems to have stopped the python script from running and reducing the cpu usage.
Thank you.