Wired memory ramps up until OOM Killer kicks in every 7 days. Reboot. Repeat.

Started by arkanoid, May 15, 2022, 12:25:59 PM

As the title says, I have to reboot my firewall every ~7 days because the OOM Killer kicks in and kills every process, leaving only the kernel running (SSH and the web interface are killed too).

OPNsense 22.1.2_1-amd64
FreeBSD 13.0-STABLE
OpenSSL 1.1.1m 14 Dec 2021

Please find attached the output of these commands, executed at an uptime of 30h:

df -h

vmstat -z | tail +3 \
  | awk -F '[:,] *' '
      BEGIN { total = 0; cache = 0; used = 0 }
      { u = $2 * $4; c = $2 * $5; t = u + c
        cache += c; used += u; total += t
        name = $1; gsub(" ", "_", name)
        print t, name, u, c }
      END { print total, "TOTAL", used, cache }' \
  | sort -n \
  | perl -a -p -e 'while (($j, $_) = each(@F)) { 1 while s/^(-?\d+)(\d{3})/$1,$2/; print $_, " " } print "\n"' \
  | column -t

vmstat -o | sort -nr | head -n 3000

vmstat -m | sort -rk3
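For reference, the `vmstat -z` pipeline above sums, per UMA zone, item_size × used and item_size × free. A toy reproduction on two made-up zone lines (sample numbers, not real output):

```shell
# Toy demo (made-up numbers) of the per-zone accounting done by the
# vmstat -z pipeline: bytes held by allocated items plus bytes parked
# in each zone's free cache, summed into a grand total.
printf '%s\n' \
  'mbuf: 256, 0, 1000, 24, 0, 0, 0, 0' \
  'mbuf_cluster: 2048, 0, 500, 12, 0, 0, 0, 0' |
awk -F '[:,] *' '{
    used  = $2 * $4              # item size * items in use
    cache = $2 * $5              # item size * items on the free list
    total += used + cache
    printf "%-14s used=%d cache=%d\n", $1, used, cache
  } END { printf "TOTAL %d\n", total }'
```

With these sample lines the last line printed is `TOTAL 1310720` (256×1024 + 2048×512 bytes).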


more attachments in the next posts




Some screenshots over time of

top -o size


The first one is just after boot.




So let's start with something directly relevant... you have /var MFS enabled, yes?


Cheers,
Franco

I already posted the result of `df -h`.

As a second confirmation, please find attached a screenshot of the web GUI.

Here's the output of `mount`:

/dev/gpt/rootfs on / (ufs, local, noatime, soft-updates)
devfs on /dev (devfs)
devfs on /var/dhcpd/dev (devfs)
devfs on /var/unbound/dev (devfs)
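As a side note, the `mount` output itself answers the MFS question: when the /var RAM disk option is enabled, OPNsense mounts a tmpfs on /var. A small sketch, with the check factored out so it can be exercised on sample strings (the tmpfs line below is hypothetical):

```shell
# Sketch: decide from mount(8) output whether /var is a memory filesystem.
check_var_mfs() {
  # $1: mount output; a "tmpfs on /var" line means the RAM disk is active
  if printf '%s\n' "$1" | grep -q '^tmpfs on /var '; then
    echo "MFS enabled"
  else
    echo "MFS disabled"
  fi
}
check_var_mfs "$(mount)"
```

Against the output quoted above (ufs root, devfs mounts only) this prints "MFS disabled".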

Here's the memory load after 53h of uptime:

last pid: 63460;  load averages:  0.28,  0.39,  0.44  up 2+04:54:04    11:49:44
53 processes:  1 running, 52 sleeping
CPU:  1.1% user,  0.0% nice, 16.3% system,  0.0% interrupt, 82.6% idle
Mem: 39M Active, 989M Inact, 458M Wired, 249M Buf, 2465M Free

Sorry, I didn't look at the attached files; I only searched the page with the browser's find.

At least it's not /var/log overflowing main memory if /var MFS is off.

What's your configuration? Which heavy tasks do you have enabled? OpenVPN, IPsec, Intrusion Detection (with or without IPS?), Unbound blocklists, Web Proxy, additional plugins...


Cheers,
Franco

The firewall is used just as a WireGuard VPN concentrator, so:

OpenVPN: No

IPsec: No

Intrusion Detection (with or without IPS?): No and no

Unbound blocklists: No

Web Proxy running: No

Additional plugins:

os-dyndns (installed)   1.27_3   179KiB   OPNsense   Dynamic DNS Support   
os-iperf (installed)   1.0_1   24.6KiB   OPNsense   Connection speed tester   
os-vmware (installed)   1.5_1   610B   OPNsense   VMware tools   
os-wireguard (installed)   1.10   47.1KiB   OPNsense   WireGuard VPN service   
os-zabbix-agent (installed)   1.11   50.1KiB   OPNsense   Zabbix monitoring agent

Only os-wireguard is truly essential; in order from most to least important:
wireguard
zabbix-agent
dyndns
iperf3
vmware


top after ~54h:
last pid: 35841;  load averages:  0.31,  0.47,  0.41  up 2+06:21:38  13:17:18
54 processes:  1 running, 53 sleeping
CPU:  0.0% user,  0.0% nice, 15.8% system,  0.0% interrupt, 84.2% idle
Mem: 80M Active, 998M Inact, 466M Wired, 256M Buf, 2406M Free


Yes, wireguard-kmod 0.0.20211105 is installed, but I find it hard to believe it is the cause of the problem.

I've been facing this problem since early 2021.

The latest change I made to the system was switching from the userland wireguard-go implementation to the kernel driver, at the beginning of March.

While this greatly improved CPU usage, it didn't solve the problem: I had OOM events before and I still have them.
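As a sanity check on which implementation is actually in use: the kernel driver shows up as if_wg.ko in `kldstat`, while the userland implementation runs as a wireguard-go process. A sketch, with the detection factored out so it also works on sample strings:

```shell
# Sketch: distinguish kernel WireGuard (if_wg.ko) from userland wireguard-go.
detect_wg() {
  # $1: kldstat output  $2: process listing
  if printf '%s\n' "$1" | grep -q 'if_wg'; then
    echo "kernel"
  elif printf '%s\n' "$2" | grep -q 'wireguard-go'; then
    echo "userland"
  else
    echo "none"
  fi
}
detect_wg "$(kldstat 2>/dev/null)" "$(ps ax 2>/dev/null)"
```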

Please find attached a zabbix graph before and after switching the wireguard implementation.

G.C.


The steady rise continues:
last pid: 42872;  load averages:  0.59,  0.46,  0.45 up 2+07:23:49  14:19:29
54 processes:  1 running, 53 sleeping
CPU:  0.2% user,  0.0% nice, 10.1% system,  0.0% interrupt, 89.7% idle
Mem: 49M Active, 992M Inact, 476M Wired, 263M Buf, 2432M Free
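Since the number being tracked here is Wired, it may be easier to log it from cron than to screenshot top(1). A sketch using two sysctls that exist on FreeBSD 13 (v_wire_count is in pages, so multiply by the page size; the fallbacks only let the snippet run on other systems):

```shell
# Sketch: sample the wired-page counter and convert it to MiB, roughly
# matching the "Wired" column of top(1).
wired_mb() {
  # $1: wired page count  $2: page size in bytes
  echo $(( $1 * $2 / 1048576 ))
}
pages=$(sysctl -n vm.stats.vm.v_wire_count 2>/dev/null)
pgsz=$(sysctl -n hw.pagesize 2>/dev/null)
echo "$(date '+%F %T') wired_mb=$(wired_mb "${pages:-0}" "${pgsz:-4096}")"
```

For example, 117248 wired pages at a 4096-byte page size comes out to 458 MiB, the Wired value in the 53h snapshot above.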