OPNsense Forum

Archive => 22.1 Legacy Series => Topic started by: mjalafoo on January 13, 2022, 10:02:26 AM

Title: 22.1 beta - 100% CPU - How to resolve?
Post by: mjalafoo on January 13, 2022, 10:02:26 AM
Hi All,

I've been having a high CPU load issue since 21.7.4; it started right after that upgrade.
I tried upgrading to 21.7.7, and the problem carried over. Last night I upgraded to 22.1 beta, and the problem carried over too. The situation seems to be getting worse: the GUI takes a long time to respond, and SSH sessions terminate during the login process.

It is worth noting that not all pages in the GUI are slow to respond. For example, the login page and the main dashboard take forever, but the configuration pages for Suricata and Firmware update respond much faster.

I checked the activity of the services, and what consumes the CPU is not consistent: sometimes it is the Python scripts for Suricata, sometimes it is just PHP. I stopped Suricata and disabled its configuration.

Any guidance on how to resolve this?

Note: the unit has been running for over a year with the same configuration, updated and patched consistently. It is a J1800 with 4GB RAM. Currently Squid, pf, Captive Portal, DHCP, Syslog, OpenVPN and the WebGUI are enabled. No external plugins are installed. Suricata is disabled.
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: franco on January 13, 2022, 10:06:12 AM
"22.1 beta" in the title is a bit misleading if you have been having persistent trouble since 21.7.4. That the problem survives the upgrades suggests it most likely lies in the configuration, not in the operating system running it.

Is it a VM or hardware?


Cheers,
Franco
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: mjalafoo on January 13, 2022, 10:16:37 AM
It is hardware: a 4-port micro firewall appliance.
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: franco on January 13, 2022, 10:38:17 AM
Maybe it's throttling itself, making it appear to use 100% CPU when it doesn't? Have you looked into changing the powerd settings?


Cheers,
Franco
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: mjalafoo on January 13, 2022, 11:21:47 AM
Thanks for your reply. But I think there is an issue with the config, as the boot sequence takes more than 30 minutes to complete.

Attached are snapshots of the boot sequence.

How would I change powerd settings?
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: cookiemonster on January 13, 2022, 11:26:58 AM
Could be a hardware problem, but it's hard to tell from screenshots. Can you please post the text in quotes instead?
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: franco on January 13, 2022, 11:36:19 AM
Well, for one thing the disk seems damaged, not sure if beyond repair. The other captures look normal. A broken disk could cause slowness.


Cheers,
Franco
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: mjalafoo on January 13, 2022, 11:47:36 AM
Looks like a hardware problem. I have another box that I will rebuild using the same config. Then I will flash the original box and check whether the problem persists.

I will post updates.
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: mjalafoo on January 15, 2022, 04:39:15 PM
So, I used a fresh box (identical hardware to the one with the 100% CPU load) and flashed it with 22.1 RC.

After the fresh build, the box behaves normally: reboots are quick and the WebGUI responds at normal speed.

Then I loaded the backup configuration from the misbehaving box. The first reboot after loading the config takes at least 20 minutes to complete the boot sequence.

It is definitely something to do with the config and not with the hardware. It is also definitely something that surfaced with the recent OS changes.

If anyone can give me access to an OS version older than 21.x, I can flash my test box and load the config to check whether it shows the same behavior.

I will also reflash the test box and build it manually, without loading the config from backup, to check what triggers the CPU load.

Any ideas are welcome.
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: gpb on January 15, 2022, 04:50:41 PM
I was tracking down some odd boot behavior recently.  I was able to look in /var/log/system/ and scan the system log there to see where the delays were occurring.  (Your screenshots are too small for me to read.)
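To find where a boot stalls without reading the whole log by eye, a minimal Python sketch can flag large gaps between consecutive timestamped lines. It assumes the relevant log lines contain an ISO-8601 timestamp such as 2022-01-15T16:39:15+01:00 (the exact line layout is an assumption; adjust the regex if your logs differ), and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumption: interesting lines carry an ISO-8601 timestamp somewhere in them,
# e.g. "2022-01-15T16:39:15+01:00". Fractional seconds are dropped on purpose;
# second resolution is plenty for spotting multi-minute stalls.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?(Z|[+-]\d{2}:\d{2})?")


def parse_timestamp(line: str) -> Optional[datetime]:
    """Return the first timestamp found in a log line, or None if there is none."""
    m = TS_RE.search(line)
    if m is None:
        return None
    offset = m.group(2) or ""
    if offset == "Z":
        offset = "+00:00"  # older fromisoformat() versions do not accept "Z"
    return datetime.fromisoformat(m.group(1) + offset)


def find_gaps(lines: Iterable[str], threshold: timedelta) -> List[Tuple[timedelta, str, str]]:
    """List (gap, line_before, line_after) wherever consecutive timestamped
    lines are more than `threshold` apart. Lines without a timestamp are skipped.
    Assumes all timestamps are consistently timezone-aware (or all naive)."""
    gaps = []
    prev_ts: Optional[datetime] = None
    prev_line = ""
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue
        if prev_ts is not None and ts - prev_ts > threshold:
            gaps.append((ts - prev_ts, prev_line.rstrip(), line.rstrip()))
        prev_ts, prev_line = ts, line
    return gaps
```

Open one of the files under /var/log/system/ and pass the file object to find_gaps(f, timedelta(minutes=1)); each result shows the last line before a stall and the first line after it, which usually points at the service that was hanging.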

About the 100% CPU, I've had that happen after upgrading between versions. It seemed there was a duplicate Python task (maybe something like config.py, can't remember for sure), and once I killed it things started behaving again. However, this was for updates that did not require a reboot, so probably unrelated (and a reboot would correct the problem). You can use something like top in the terminal to see what's going on; I prefer htop, but you have to add that manually via:

pkg add https://pkg.freebsd.org/FreeBSD:13:amd64/quarterly/All/htop-3.1.2.txz
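The fix described above was to find the duplicate task and kill it. A minimal Python sketch of the "find the duplicate" half, assuming you capture the output of FreeBSD's `ps -axww -o pid,pcpu,command` (one header line, then PID, CPU percentage and the full command line). The function names are hypothetical, and the sketch deliberately does not kill anything, so you can check the PIDs before running kill yourself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_ps(text: str) -> List[Tuple[int, float, str]]:
    """Parse `ps -axww -o pid,pcpu,command` output into (pid, cpu, command) rows."""
    rows = []
    for line in text.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)  # PID, %CPU, then the whole command line
        if len(parts) < 3:
            continue
        pid, cpu, cmd = parts
        rows.append((int(pid), float(cpu), cmd.strip()))
    return rows


def duplicate_commands(rows: List[Tuple[int, float, str]]) -> Dict[str, List[Tuple[int, float]]]:
    """Group processes by identical command line and keep only commands running more than once."""
    groups: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for pid, cpu, cmd in rows:
        groups[cmd].append((pid, cpu))
    return {cmd: procs for cmd, procs in groups.items() if len(procs) > 1}
```

Grouping by the exact command line is a simple heuristic: two copies of the same script with the same arguments show up together along with their CPU share, while legitimately repeated workers (e.g. several PHP processes) will also appear and need judging by hand.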
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: Fright on January 15, 2022, 08:56:19 PM
Quote: You can use something like top in the terminal to see
System: Diagnostics: Activity   ;)
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: mjalafoo on January 16, 2022, 01:15:01 PM
Thanks for the replies all.

In Diagnostics: Activity, no single item seems to be the culprit behind the heavy load on the 4 CPUs. Sometimes it is PHP, sometimes Python scripts, etc. One thing is common: the top process accounts for 80-90% of the load across the 4 CPUs.

Today, on the test box, the console shows the following log:
sonewconn: pcb 0xfffff80080bda800 (local:/tmp/php-fastcgi.socket-1): Listen queue overflow: 193 already in queue awaiting acceptance
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: cookiemonster on January 16, 2022, 09:58:27 PM
Can you try disabling Netflow if it's enabled?
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: franco on January 17, 2022, 10:42:11 AM
Quote from: franco on January 13, 2022, 10:38:17 AM
Maybe it's throttling itself making it seem to use 100% CPU when it doesn't? Have you looked into changing powerd settings?

Before chasing ghosts please make sure to set our powerd settings in a way that the system can't throttle its CPU to MAKE IT SEEM that CPU is 100%.  ::)


Cheers,
Franco
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: mjalafoo on January 17, 2022, 11:46:59 AM
So, a little update.

It is not powerd, and it is not Netflow (Netflow is disabled).

I analyzed the config file and found that IDS alerts are loaded even though Suricata is disabled. The list is huge, and the system seems to be loading this entire list and churning through it.
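For a rough measure of how much of a backup consists of individual IDS rule entries, a minimal Python sketch can count them. It assumes the individually edited rules live under `<OPNsense><IDS><rules><rule>` in config.xml, each with an `<enabled>` child; that layout is an assumption, so check it against your own backup first. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed location of individually edited IDS rules inside config.xml.
# Verify this path against your own backup before trusting the numbers.
RULE_PATH = "./OPNsense/IDS/rules/rule"


def ids_rule_stats(xml_bytes: bytes) -> dict:
    """Return the backup size in bytes and how many per-rule IDS entries it holds."""
    root = ET.fromstring(xml_bytes)
    rules = root.findall(RULE_PATH)
    # Count entries whose <enabled> child is "1" (assumed flag format).
    enabled = sum(1 for r in rules if (r.findtext("enabled") or "").strip() == "1")
    return {"bytes": len(xml_bytes), "rule_entries": len(rules), "enabled": enabled}
```

Read the backup in binary mode and pass the contents, e.g. `with open("config.xml", "rb") as f: print(ids_rule_stats(f.read()))`. A count in the tens of thousands would be consistent with a multi-megabyte config.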

So I flashed the test box and started loading the configuration section by section. The moment I load "OPNsense Additions", the 100% CPU load problem reappears.


I flashed the box one more time, but this time cleaned up the backup config by replacing the IDS section with a clean one.
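The cleanup above was done by hand. A minimal Python sketch of a similar, narrower cleanup, assuming per-rule IDS entries sit under `<OPNsense><IDS><rules>` in config.xml (an assumption, not confirmed in this thread): it removes every `<rule>` child there and leaves the rest of the config, including other IDS settings, untouched. This differs from replacing the whole IDS section, ElementTree drops XML comments and original formatting, and the function name is hypothetical, so keep the original backup and try the result on a test box first.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Assumed parent element of the individually edited IDS rules.
RULES_PATH = "./OPNsense/IDS/rules"


def strip_ids_rule_overrides(xml_bytes: bytes) -> Tuple[bytes, int]:
    """Remove all <rule> children under the assumed IDS rules element.

    Returns (cleaned_xml_bytes, number_of_rules_removed). Everything else in
    the document is kept, though comments and whitespace layout are not preserved.
    """
    root = ET.fromstring(xml_bytes)
    removed = 0
    for rules in root.findall(RULES_PATH):
        # Copy to a list first so we do not mutate while iterating.
        for rule in list(rules.findall("rule")):
            rules.remove(rule)
            removed += 1
    cleaned = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    return cleaned, removed
```

Writing the cleaned bytes to a new file (never over the original backup) makes it easy to compare sizes before and after and to fall back if the restore misbehaves.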

After the reboot, the OS operates normally and the rest of the config seems to be intact.

The question remains: why did the IDS config stay in place even though Suricata is disabled? I even tried reinstalling Suricata in an effort to remove the residue from the production box, without luck.
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: franco on January 17, 2022, 12:44:36 PM
Interesting... how big is your config.xml? Did you manage IDS rules by setting them individually in the past? That is now done with policies instead, since a large number of individual rule edits can really bog down the system.


Cheers,
Franco
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: mjalafoo on January 17, 2022, 01:11:25 PM
That's exactly what happened ;D Though modifying and deleting the ruleset through the GUI was hard.

The config file was around 6MB and went down to around 500KB after cleaning.

But is this a bug or a feature? Can one ask the system to clear the entire IDS ruleset with one command?
Title: Re: 22.1 beta - 100% CPU - How to resolve?
Post by: franco on January 17, 2022, 01:24:49 PM
It's a limitation and a reason why policies were introduced in 21.1. We even added a warning box for it:

https://github.com/opnsense/core/blob/766dc45283f284c0a58345db7a05a431e77cc2ec/src/opnsense/mvc/app/controllers/OPNsense/IDS/Api/SettingsController.php#L767-L777


Cheers,
Franco