Hang up after "failed to reclaim memory" message

Started by G, August 12, 2024, 10:41:14 AM

Good morning,
looking for some assistance with this weird issue. My device was working perfectly until a few updates ago (24.1?). Now it has several problems:

  • The web interface doesn't start properly; I need to reload all services on each boot.
  • Unbound stops working and also needs a restart.
  • The whole device freezes, including the terminal. The last message is: <3>pid 274 (python3.11), jid 0, uid 0, was killed: failed to reclaim memory
I have another instance in the same network working OK. I removed ZeroTier, which seems to extend the time between failures. The failures happen very close to the IDS rule updates, but not every day. Looking at the monitoring, free memory drops to nothing from the usual 2 GB free out of 4 GB.
Any ideas?

The out-of-memory reaper kills processes hogging a lot of memory if the system cannot acquire more memory to keep running.

You need to check which processes are hogging the memory and then look into why that is.


Cheers,
Franco
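
One way to act on the advice above is to rank processes by resident memory. The following is a minimal sketch, not an official tool: it parses text in the shape produced by `ps -axo pid,rss,comm` (RSS in KiB). The function name `top_memory_processes` is made up for illustration, and the sample figures are invented; only the pid 274 / python3.11 pairing comes from the kernel message quoted in the first post.

```python
def top_memory_processes(ps_output, limit=5):
    """Return (rss_kib, pid, command) tuples for the largest processes."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        # Ignore anything that is not "<pid> <rss> <command>".
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        rows.append((int(parts[1]), int(parts[0]), parts[2].strip()))
    rows.sort(reverse=True)  # largest resident set first
    return rows[:limit]


# Invented sample data, for illustration only.
SAMPLE = """\
  PID    RSS COMMAND
  274 812340 python3.11
 1201  95220 unbound
 1533  40112 lighttpd
"""

for rss, pid, command in top_memory_processes(SAMPLE, limit=2):
    print(f"{pid:>6} {rss / 1024:8.1f} MiB  {command}")
```

On a live system the sample string could be replaced with the captured output of `ps -axo pid,rss,comm`, sampled periodically around the time of the IDS rule updates to see which process grows.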

Thanks, I'll keep an eye on the processes. Any recommendations about the PHP issue?

Do you mean Python or PHP? Your first post doesn't mention PHP.


Cheers,
Franco

Sorry if I wasn't clear. When I said the web interface doesn't work on reboot, it looks like a PHP issue to me. When I reload all services via the terminal, it gets stuck on "starting php_fpm", but the web interface loads fine after that.



A small fix went into 24.7.1 for a problem that seems to occur now that we sped up the boot sequence a little in 24.7 by no longer triggering global sysctls all the time during IPv6 initialization. The IPv6 link-local address stays in tentative mode for so long that it is still tentative when the GUI starts. That prevents the GUI from starting because lighttpd dies with a hard error (contrary to openssh, which just ignores us for the right reasons). In theory, once IPv6 connectivity is up, a dynamic address acquire hook should fix this, but it never does for whatever reason.

All of this, to this day, because people believe binding the GUI to explicit interfaces (addresses really) is a safer approach. You lose that bet when the address associated is not there or changed or stuck in some IPv6 blue-moon flag such as "tentative".


Cheers,
Franco
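
To check whether an address is still in the tentative state described above, the `ifconfig` output can be scanned for the `tentative` flag on `inet6` lines. This is a rough sketch under assumptions: the helper name `tentative_inet6` is hypothetical, the sample text is invented and uses documentation addresses, and the parser only expects the usual FreeBSD layout of an unindented interface line followed by indented address lines.

```python
def tentative_inet6(ifconfig_output):
    """Return (interface, address) pairs for inet6 addresses flagged tentative."""
    found = []
    iface = None
    for line in ifconfig_output.splitlines():
        if line and not line[0].isspace():
            # Unindented line: start of a new interface block, e.g. "em0: flags=..."
            iface = line.split(":", 1)[0]
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet6" and "tentative" in fields[2:]:
            found.append((iface, fields[1]))
    return found


# Invented sample output, for illustration only.
SAMPLE = """\
em0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    inet 192.0.2.1 netmask 0xffffff00 broadcast 192.0.2.255
    inet6 fe80::1%em0 prefixlen 64 tentative scopeid 0x1
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    inet6 ::1 prefixlen 128
"""

for iface, addr in tentative_inet6(SAMPLE):
    print(f"{iface}: {addr} is still tentative")
```

If the GUI is bound to an address that shows up in this list at the moment the web server starts, that would match the failure described in the post above.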

We really need one of those UI magic things like on Android phones where you tap 5 times on a section of a menu to enable "Developer mode" or some such.
That way, for OPN, the GUI would be enabled on all interfaces by default and non-editable unless the secret 5 clicks or something unlocks editing with plenty of warnings.
Of course people will continue using the new method to align the foot to the gun, but it would not be as easy as it is now.

I added even more locking to the web GUI setup for 24.7.2: https://github.com/opnsense/core/commit/74d2a5aa0d

Lighttpd said they have no business locking concurrent starts. I disagree with the way they trash their PID file prematurely, but it is what it is.


Cheers,
Franco
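
For readers unfamiliar with the idea of locking concurrent starts, here is a generic sketch using an exclusive advisory file lock. It is not the code from the linked commit and makes no claim about how OPNsense or lighttpd implement it; the helper `start_lock` and the lock file name `webgui.lock` are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def start_lock(path):
    """Yield True if this caller holds the start lock, False if someone else does."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Non-blocking: refuse immediately instead of queueing a second start.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


# Hypothetical lock file in a scratch directory.
lock_path = os.path.join(tempfile.mkdtemp(), "webgui.lock")

with start_lock(lock_path) as first:
    # A second start attempt while the first is in progress should be refused.
    with start_lock(lock_path) as second:
        print("first holds lock:", first, "| second holds lock:", second)
```

The point of the design is that whoever loses the race backs off before touching shared state such as a PID file, so a half-finished start cannot be clobbered by a concurrent one.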