Monit alert - Resource limit matched

Started by anym001, September 30, 2024, 07:37:42 AM

Hey,

Since the update to 24.7.5, I have been getting a notification from Monit every 2-3 days that the resource limit has been reached (mem usage > 90%).
After a restart I am back to the usual ~10%.

For your information: I use ZFS as file system, but the ARC cache remains in the usual range.

What is the best way to check what is causing this significant RAM increase?

Start top.
Press "o" for "order".
Type "res" for "resident memory" and ENTER.

The process at the top is the one with the highest memory consumption.

If it's a scripting language like PHP or Python, note the process ID (PID), exit top ("q"), type "ps awwux" and ENTER, look for the process - you should see the full command line, i.e. the name of the script and its parameters.
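
A non-interactive variant of the steps above, as a minimal sketch. It reads "ps awwux"-style output and lists the largest processes by resident memory. The helper name top_rss is made up for illustration, and the column layout (PID in column 2, RSS in column 6, command from column 11 onwards) is an assumption based on the ps output shown elsewhere in this thread.

```shell
# Read "ps awwux"-style output on stdin and print the N largest processes by
# resident set size (RSS, 6th column) as: PID RSS COMMAND...
# top_rss is a hypothetical helper name, not an OPNsense or FreeBSD tool.
top_rss() {
    awk 'NR > 1 {
             # Skip the header line, then rebuild the full command line.
             cmd = $11
             for (i = 12; i <= NF; i++) cmd = cmd " " $i
             print $2, $6, cmd
         }' |
        sort -k2,2nr |
        head -n "${1:-5}"
}

# Usage on a live system (prints the five largest processes):
#   ps awwux | top_rss 5
```

The first line of the result is the process to look at; the full command line is already included, so the separate lookup by PID is not needed.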

OTOH - why throw an alarm for 90% memory usage? Free memory is wasted memory. A long running system will always tend to use up all there is.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: Patrick M. Hausen on September 30, 2024, 09:18:30 AM
Start top.
Press "o" for "order".
Type "res" for "resident memory" and ENTER.

The process at the top is the one with the highest memory consumption.

If it's a scripting language like PHP or Python, note the process ID (PID), exit top ("q"), type "ps awwux" and ENTER, look for the process - you should see the full command line, i.e. the name of the script and its parameters.

Thank you for the explanation.
I will try it out next time.

Quote from: Patrick M. Hausen on September 30, 2024, 09:18:30 AM
OTOH - why throw an alarm for 90% memory usage? Free memory is wasted memory. A long running system will always tend to use up all there is.

In Monit, a warning is configured by default for RAM utilization > 75%.
I have never really thought about this before, and I had never received this message until now.

But you're right, a 90% RAM load shouldn't really be a problem.

October 07, 2024, 07:17:06 AM #3 Last Edit: October 07, 2024, 07:25:56 AM by anym001
Quote from: anym001 on September 30, 2024, 10:00:05 AM
Thank you for the explanation.
I will try it out next time.


I have now found out what caused it.
CrowdSec has been using an enormous amount of RAM since the update to 24.7.5, or rather since the CrowdSec update included in it.

The interesting thing is that crowdsec cannot be terminated.
It cannot be stopped via the GUI or with pkill {PID}; the service continues to run.
Are there any other options here?

Edit:
I have now also tried kill -9 {PID}.
OPNsense first hung and then rebooted automatically.

Quote from: anym001 on October 07, 2024, 07:17:06 AM
I have now also tried kill -9 {PID}.
OPNsense first hung and then rebooted automatically.
Known issue. Should be fixed now that the update is applied.

Hi, crowdsec maintainer here

First, the daemon manager had an issue and sometimes ignored the INT signal; in that case the upgrade tries to stop the service and fails.

root    40599  4.0  6.8 1390784 104344  -  S    09:38   0:01.64 /usr/local/bin/crowdsec -c /usr/local/etc/crowdsec/config.yaml
root    40515  0.0  0.1   12736   2164  -  Ss   09:38   0:00.00 daemon: crowdsec[40599] (daemon)

Terminate the second process (kill -9 40515) and upgrade to 1.6.3. This changes the script to send a "stronger" signal to stop the process.
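
To find that second process without reading the list by eye, a minimal sketch follows. It assumes the supervisor shows up in "ps awwux" with a title of the form "daemon: NAME[PID] (daemon)", as in the two lines above; the helper name supervisor_pid is made up for illustration.

```shell
# Read "ps awwux"-style output on stdin and print the PID (2nd column) of the
# supervisor whose title starts with "daemon: NAME[".
# supervisor_pid is a hypothetical helper name.
supervisor_pid() {
    awk -v name="$1" '$11 == "daemon:" && index($12, name "[") == 1 { print $2 }'
}

# Usage on a live system:
#   ps awwux | supervisor_pid crowdsec
```

The printed PID is the one to pass to kill -9, not the PID of /usr/local/bin/crowdsec itself.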

Now to understand why it happened, it would be helpful if you could run "cscli support dump" and send the result to support@crowdsec.net. Let us know if, after the upgrade, you still think crowdsec uses too much CPU or RAM. It's not a lightweight process but it should not trigger monitoring.

Thanks

October 07, 2024, 11:37:36 AM #6 Last Edit: October 07, 2024, 11:41:15 AM by anym001
Quote from: mmetc on October 07, 2024, 09:44:55 AM
terminate the second process (kill -9 40515) and upgrade to 1.6.3. This changes the script to send a "stronger" signal to stop the process.

I have been on version 1.6.3_1 since the OPNsense update to 24.7.5.

I will run the command "cscli support dump" the next time a RAM problem occurs.
I restarted my OPNsense this morning after submitting the post.


October 07, 2024, 04:21:13 PM #8 Last Edit: October 07, 2024, 04:24:38 PM by anym001
Quote from: mmetc on October 07, 2024, 03:34:24 PM
Hi,

could you test this

# fetch -o /usr/local/etc/rc.d/crowdsec https://github.com/crowdsecurity/plugins/releases/download/crowdsec-1.6.3-2-hotfix/crowdsec

and try start/stop.

Thanks

Do I have to use an additional command to install the hotfix?
I suspect that the update did not work. (Screenshots attached)

Quote from: anym001 on October 07, 2024, 04:21:13 PM
Do I have to use an additional command to install the hotfix?
I suspect that the update did not work. (Screenshots attached)

No, it's OK. The fetch command overwrites a script without installing a new package version. Now if you click start/stop from the UI it should just work.
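
Because fetch -o overwrites the script in place, it may be worth keeping a copy of the old one first. A minimal sketch, with a made-up helper name backup_file; the backup directory /root in the usage lines is only an example. The copy is deliberately stored outside rc.d so that it is not mistaken for a second startup script.

```shell
# Copy SRC into DESTDIR as <name>.bak.<timestamp> and print the backup path.
# Returns non-zero if SRC is not a regular file or DESTDIR is not a directory.
# backup_file is a hypothetical helper name.
backup_file() {
    src=$1
    destdir=$2
    [ -f "$src" ] && [ -d "$destdir" ] || return 1
    dst="$destdir/$(basename "$src").bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dst" && printf '%s\n' "$dst"
}

# Usage with the hotfix from this thread:
#   backup_file /usr/local/etc/rc.d/crowdsec /root &&
#       fetch -o /usr/local/etc/rc.d/crowdsec https://github.com/crowdsecurity/plugins/releases/download/crowdsec-1.6.3-2-hotfix/crowdsec
```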

Quote from: mmetc on October 08, 2024, 09:13:08 AM
No it's ok. The fetch command overwrites a script without installing a new package version. Now if you click start/stop from the UI it should just work.
Thank you for the information.

I have noticed that the service can be stopped via the GUI (visible because the service status is shown as deactivated in the crowdsec overview).
However, the service is displayed as active in the dashboard and in the overview of services.

Quote from: anym001 on October 08, 2024, 10:09:23 AM
Thank you for the information.

I have noticed that the service can be stopped via the GUI. (Visible because service status is deactivated in the crowdsec overview)
However, the service is displayed as active in the dashboard and in the overview of services.

You have orphan crowdsec processes and possibly notification plugins.

Run "killall crowdsec" and check whether there are processes named "notification-*".

Quote from: mmetc on October 08, 2024, 02:41:17 PM
You have orphan crowdsec processes and possibly notification plugins.

"killall crowdsec" and check if there are processes that go by the name "notification-*"
I have executed the command "killall crowdsec".
PIDs 12834 and 13158 are processes named "notification-*".

How can I stop these processes?
Why does this problem occur?

Quote from: anym001 on October 08, 2024, 02:50:35 PM
I have executed the command "killall crowdsec".
PIDs 12834 and 13158 are processes named "notification-*".

How can I stop these processes?

Run "kill 12834" and keep the most recent one.

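With more than two leftover processes, picking the ones to kill by hand gets tedious. A minimal sketch of "everything except the most recent" follows. The helper name all_but is made up, and the usage lines assume that pgrep's -n option selects the newest match and that the plugin process names start with "notification-".

```shell
# Print every PID read from stdin except the one given as the first argument.
# Refuses to run without a PID to keep, so that nothing is listed by accident.
# all_but is a hypothetical helper name.
all_but() {
    [ -n "$1" ] || return 1
    grep -Fvx -- "$1"
}

# Usage on a live system (lists the older PIDs; review before killing them):
#   newest=$(pgrep -n '^notification-')
#   pgrep '^notification-' | all_but "$newest"
```
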
Quote
Why does this problem occur?

tl;dr: my fault. Longer version: each FreeBSD package does service management in a slightly different way: start, stop, restart on error but not too often, reload configuration, coordinate process groups... There is no unified way to express the application's needs, unlike Linux with its (admittedly not universally popular) systemd. This means more scripts are needed to handle corner cases, which leaves more room for errors.
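
As an illustration of the kind of corner case such scripts have to handle, here is a minimal sketch of a stop routine that escalates when the INT signal is ignored. This is not the actual crowdsec rc.d script; the helper name stop_with_escalation and the default timeout of 10 seconds are assumptions.

```shell
# Ask PID to stop with SIGINT, wait up to TIMEOUT seconds, then send SIGKILL.
# stop_with_escalation is a hypothetical helper, not the real rc.d script.
stop_with_escalation() {
    pid=$1
    timeout=${2:-10}
    # If the signal cannot be delivered, the process is already gone
    # (or is not ours to signal), so there is nothing left to do.
    kill -INT "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGINT
        sleep 1
        timeout=$((timeout - 1))
    done
    # Still running after the grace period: send the "stronger" signal.
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Usage:
#   stop_with_escalation 40599 10
```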