Inactive memory rising rapidly every morning around 3:30am

goobs · February 24, 2025, 10:13:37 AM

I have noticed over the last few weeks, since 24.7 and now 25.1.1 that around 3:18 to 3:38 am every morning the Inactive Memory rises by around 20 %.

Free memory is at 78% after reboot then the next morning it drop so 52%, then the next day 40%, then 30% at which point OOM kicks in killing off Caddy Plugin or Unbound or both.

I have tried disabling IDS and removing most of my blocklists from Crowdsec but the behaviour is the same. With Caddy and IDS off I get a few more days before OOM starts killing things.

I have no cronjobs around this time and have been through every log file available from the GUI but cannot figure out why or what is going on at that time.

The dashboard memory widget happily says I am using 1.2/7GB RAM (I am using a 1GB MFS).

Can anyone point me in the right direction to track down why this could be happening please ?

meyergru · February 24, 2025, 10:26:37 AM

Are you sure about the "no cronjobs" statement? There are four sources for cron jobs in OpnSense:

1. /etc/cron.d/*
2. /etc/crontab
3. /var/cron/tabs/nobody
4. /var/cron/tabs/root

Only the last of which you can see in the web UI.

goobs · February 24, 2025, 12:52:25 PM

Thanks Meyergru I checked them (attached) and nothing jumps out at me ?

Just to add I tried one by one restarting services from the dashboard and this did claw back a 10% increase in free mem but only temporarily.
A reboot of the firewall and free mem went back up to near 80%.

goobs · February 25, 2025, 12:54:05 PM

Inactive memory still on the rise.

What could be using it up and other than rebooting the firewall, how do I get it back ?

Dashboard still says 1.2/7GB in use.

I searched for others with similar memory issues - I am not with Zen Internet and not using IPv6 so that's not it.

goobs · February 25, 2025, 01:08:29 PM

Is it normal to have 43 php-cgi processes running ?

meyergru · February 25, 2025, 01:14:24 PM

I have 43 running as well. But I counted using "ps auxwww | fgrep php-cgi | wc", where you probably only see the first page of "top". If there are more processes, then there could be stalled ones that hog memory. The inactive memory is the difference between "SIZE" and "RES" columns. So it is either many processes building up and never stopping (i.e. hung tasks) or some process(es) that eat up memory with time.

goobs · February 25, 2025, 01:38:41 PM

Yes 43 for me - counted them with ps -faxd | grep php-cgi

The only thing left I can think of doing is disabling or uninstalling CrowdSec, Caddy and Unbound DNS.

I also have a ProofPoint Emerging Threats alias blocklist that updates every 12 hours which I could stop.

I need OpenVPN running to access the firewall from work so can't lose that too!

It would not leave much left for OPNsense firewall to do and I'd lose most of the functionality.

I installed the firewall over three years ago on a HP T730 8GB and have added a Intel I350 2 port NIC and it's been fantastic till the memory issues lately.

meyergru · February 25, 2025, 01:42:40 PM

As I suggested, you should first try to isolate if there are hung tasks (# of processes is rising) or if there are specific processes that build up in size.

ikkeT · February 25, 2025, 09:52:04 PM

I noticed this as well today, as I've monitored a bit memory usage trying to figure out why opnsense runs out of mem every two weeks.

I also notice there are 43 cgi-bins. And problem occurs around this time:

root@OPNsense:~ # grep 3.\*configctl /var/cron/tabs/root
1 3 1 * * (/usr/local/sbin/configctl -d filter schedule bogons) > /dev/null

I wonder what does it do?

See mem graph:

ikkeT · February 25, 2025, 10:06:03 PM

My guess is it just reads lot of files, thus leaving them into memory buffers for quick access until memorybis needed for something else. Hence the jump. But why >40 php-cgi, is that normal?

Normally before the box dies something starts leaking mem and system goes down in half an hour.

meyergru · February 25, 2025, 10:36:34 PM

1. AFAICT, there is a config file for lighttpd that starts 20 CGI workers, which seems normal, but could be less:

Code Select

#### fastcgi module
## read fastcgi.txt for more info
fastcgi.server = ( ".php" =>
  ( "localhost" =>
    (
      "socket" => "/tmp/php-fastcgi.socket",
      "max-procs" => 2,
      "bin-environment" => (
        "PHP_FCGI_CHILDREN" => "20",
        "PHP_FCGI_MAX_REQUESTS" => "100"
      ),
      "bin-path" => "/usr/local/bin/php-cgi"
    )
  )
)

Each of these workers will restart after having serviced 100 requests.

Also, I found that there seem to be ~20 of these workers that have been started when the firewall was last rebooted. Maybe the max-procs starts two master processes.

Whatever, these CGI workers are most likely not the culprit, as they take up only a few KBytes each.

2. The call to "/usr/local/sbin/configctl -d filter schedule bogons" is obviously to fetch the bogon list and update the firewall alias for that. When I called that directly, I saw no apparent jump in memory usage.

FWIW, I see no such behavior here, but YMMV depending on what plugins / tools you use. So it is up to you to look for processes with a big difference in SIZE and RES numbers (or for many similar processes that make up the large numbers).

goobs · February 26, 2025, 01:15:14 PM

I think I might be getting somewhere.

I tried the boguns update mentioned earlier which made no difference to inactive memory so I decided to run from SSH the 'periodic daily' cron task.

I watched as from a freshly booted system the inactive memory climb from 80M to 1200M within a few seconds and stay there.

The prompt took a good 2 minutes to come back then said 'eval: mail: not found'

I cannot see anywhere in the gui to configure mail. I think older releases had it in System/Settings/Notifications but that is not present.

I'm assuming the mail error is the cause of the inactive memory issue here ?

Can anyone point me in the right direction please ?

meyergru · February 26, 2025, 01:53:18 PM

There is lots of jobs that are done within period daily, namely any script that is in /etc/periodic/daily/. There is a job for ZFS scrubbing, for example. This may eat up space on a freshly booted system, but not on the second run - or does it in your case? Also, that is ARC cache, not inactive memory.

The "mail" error is due to the fact that "periodic" output would be mailed to root if the "mail" executable was installed.

cookiemonster · February 26, 2025, 02:00:35 PM

Is this periodic daily actually enabled though? If yes, good candidate to drill into. Then ca you post the contents, to try and see where the mail evaluation is made in the code.
@meyergru I remember looking at these jobs in the past, chaising a different ghost. I do not know if these are actually enabled. For instance 800.scrub-zfs . To my knowledge there is no auto zfs pool scrub out of the box, needs a cron job created. I could be well off the mark but I remember doing this reasoning and moving on.
Indeed my ghost was found somewhere else.

meyergru · February 26, 2025, 02:15:46 PM

Yes, "periodic daily" is enabled in /etc/crontab. It is being run at 3:01am, however, it can only be the culprit for such things if memory use jumps once per day, but there are no processes that stay around afterwards.

As I said, @goobs should look for processes whose memory footprint rises.

Inactive memory rising rapidly every morning around 3:30am

goobs

February 24, 2025, 10:13:37 AM

meyergru

February 24, 2025, 10:26:37 AM #1

goobs

February 24, 2025, 12:52:25 PM #2 Last Edit: February 24, 2025, 12:55:07 PM by goobs

goobs

February 25, 2025, 12:54:05 PM #3

goobs

February 25, 2025, 01:08:29 PM #4

meyergru

February 25, 2025, 01:14:24 PM #5

goobs

February 25, 2025, 01:38:41 PM #6

meyergru

February 25, 2025, 01:42:40 PM #7

ikkeT

February 25, 2025, 09:52:04 PM #8

ikkeT

February 25, 2025, 10:06:03 PM #9

meyergru

February 25, 2025, 10:36:34 PM #10

goobs

February 26, 2025, 01:15:14 PM #11 Last Edit: February 26, 2025, 01:20:24 PM by goobs

meyergru

February 26, 2025, 01:53:18 PM #12

cookiemonster

February 26, 2025, 02:00:35 PM #13

meyergru

February 26, 2025, 02:15:46 PM #14