Menu

Show posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Show posts Menu

Messages - clvrmnky

#1
I can confirm the exact same behaviour on my own SOHO edge box. I've disabled logging for now and it looks like I won't have to reboot every 7 days or so.

But, I can confirm a perfect ramp of free mem use with no processes really accounting for any of that, and resetting logs had an immediately effect.
#2
Yes, I did have to adjust blocklists early in my investigation. Specifically, one blocklist in particular was causing unbound to stall on startup and eat up lots of free memory. However, once unbound was stable it did not grow anymore. In fact some processes, including the 4 unbound threads, had a smaller memory footprint after 7 days.

It would still be swapping, though. Turns out it is the logger, which was growing unbounded (no pun intended) until the entire system was impacted. I've turned off the logger for now.
#3
As per https://forum.opnsense.org/index.php?topic=27387.msg132879#msg132879 it is the logger that is eating up all the memory.

7 days of uptime requires a restart, or resetting all the logs.

New behaviour since 22.1
#4
Yes. Well beyond 90% and swapping, eventually, as I said in my report. This isn't a case where there was a static bump. It starts at ~18% as always and then continually grows until it has to be manually rebooted.

It is now at 61% since I posted my original report. 4-5% per day or so is typical. It never levels off.

There is a leak in 22.1. I don't know where, but it's there.
#5
See https://forum.opnsense.org/index.php?topic=26867.15 but I did not want to bump that one as there is chatter there related to other issues. But someone observed unbound behaving badly in some cases over there as well.

Since upgrading to 22.1 my simple SOHO OPNsense box has been eating memory slowly and constantly until it starts to swap and requires a reboot. The symptoms are always the same:


  • Starts at some smallish 20%18% (based on the widget in the Lobby)
  • Memory use grows constantly over time with no interaction on my part
  • Eventually memory use spikes to high 90% and starts swapping
  • After about a week I give up and reboot

Memory use is currently at 57% with an uptime of  "4 days, 10:11". Previously it never went over 25%.

Here is the output from "top -o res -a 25":

last pid: 79521;  load averages:  0.42,  0.39,  0.23  up 4+10:04:01    13:03:26
48 processes:  1 running, 47 sleeping
CPU:  0.7% user,  0.0% nice,  0.4% system,  0.2% interrupt, 98.7% idle
Mem: 45M Active, 855M Inact, 755M Wired, 276M Buf, 680M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  451 root          1  52    0   103M    61M accept   0   3:07   0.00% /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.8)
25622 unbound       4  20    0    91M    45M kqread   3   5:31   0.00% /usr/local/sbin/unbound -c /var/unbound/unbound.conf
53754 root          1  20    0    47M    34M accept   2   0:02   1.07% /usr/local/bin/php-cgi
54074 root          1  52    0    47M    34M accept   0   0:03   0.20% /usr/local/bin/php-cgi
50340 root          1  52    0    43M    34M accept   0   0:02   0.00% /usr/local/bin/php-cgi
11311 root          1  52    0    43M    34M accept   0   0:03   0.00% /usr/local/bin/php-cgi
83438 root          1  52    0    43M    33M accept   3   0:02   0.20% /usr/local/bin/php-cgi
47627 root          1  52    0    43M    33M accept   0   0:00   0.00% /usr/local/bin/php-cgi
73866 root          1  20    0    43M    29M wait     2   0:00   0.00% /usr/local/bin/php-cgi
73559 root          1  20    0    43M    29M wait     1   0:00   0.00% /usr/local/bin/php-cgi
88041 www           1  52    0    43M    21M accept   3   0:00   0.00% php-fpm: pool www (php-fpm)
88012 www           1  52    0    43M    21M accept   3   0:00   0.00% php-fpm: pool www (php-fpm)
87894 root          1  52    0    43M    21M accept   3   0:00   0.00% php-fpm: pool webgui (php-fpm)
88002 root          1  52    0    43M    21M accept   1   0:00   0.00% php-fpm: pool webgui (php-fpm)
87876 root          1  20    0    43M    21M kqread   3   0:11   0.00% php-fpm: master process (/usr/local/etc/php-fpm.conf) (php-fpm)
  447 root          1  52    0    30M    19M wait     3   0:02   0.00% /usr/local/bin/python3 /usr/local/opnsense/service/configd.py (python3.8)
26205 root          1  20    0    25M    14M select   0   1:15   0.00% /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain clvrmnky.org (python3.8)
13474 dhcpd         1  20    0    25M    13M select   3   0:02   0.00% /usr/local/sbin/dhcpd -user dhcpd -group dhcpd -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid re0
23790 root          1  20    0    23M    11M select   1   0:44   0.00% /usr/local/bin/python3 /usr/local/opnsense/scripts/syslog/lockout_handler (python3.8)
23095 root          1  20    0    23M    11M select   1   0:49   0.00% /usr/local/bin/python3 /usr/local/opnsense/service/configd_ctl.py -e -t 0.5 system event config_changed (python3.8)
23014 root          4  20    0    42M    11M kqread   3  12:16   0.00% /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf -p /var/run/syslog-ng.pid
91128 root          2  20    0    18M  8496K nanslp   1   0:09   0.00% /usr/local/bin/monit -c /usr/local/etc/monitrc
45708 help          1  25    0    18M  7828K select   3   0:00   0.00% sudo su
73312 root          1  20    0    17M  7656K kqread   1   0:18   0.00% /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf
22982 root          1  52    0    19M  7368K wait     2   0:00   0.00% /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf -p /var/run/syslog-ng.pid


There are 4 unbound threads, and they just keep growing. I use both blocklists (one small one) and DNS-over-TLS, but this is not new since 22.1. Most other options are default. The CPU time for unbound doesn't look right, either, but I don't recall what that might have been in the past.

All those processes with large RES values will continue to grow, as well. It isn't swapping yet, but it will be by the weekend.

However, even with all that, if I restart unbound memory use doesn't change. In fact, it'll often get worse. I don't know who is holding onto all this memory, but this feels like a classic memory leak somewhere.

The "health" widget shows free memory dropping at a constant rate, but it is unclear where it is going. Perhaps to "inactive".

Prior to 22.1 this exact same hardware and configuration has uptime measured in months. I only took it down to accept scheduled updates requiring a reboot. I literally "set it and forget it" and now I have to visit the admin page few days to know when to take the time to bounce it.
#6
22.1 Legacy Series / Re: Memory getting saturated
February 18, 2022, 04:53:48 AM
Are you using blocklists in unbound? It looks like some blocklists are blowing out memory on each unbound process/thread. Even starting/stopping unbound is painful and error-prone with some blocklists.

And NetFlow , if you have it enabled, also seems to be a memory hog now. Hard to disable it, though. I had to "Clear All" all the settings and reboot, which took a long time because between NetFlow, unbound, and the unbound control processes something was stuck.

I removed one of the blocklists and restarted unbound and I'm back to 40% in the dashboard. unbound processes are HUGE now, but they are smaller with smaller blocklists. And the processes that start/stop them don't spike the CPU.

There is another thread where unbound and blocklists is causing CPU spikes and +20C more heat in some smaller boxes.
#7
22.1 Legacy Series / Re: Memory getting saturated
February 17, 2022, 01:24:56 AM
There isn't any real reason I enabled Netdata. It's not like I need that sort of visibility into things. I might have been using it to inspect IPv6 traffic a bit deeper.

I ended up disabling it right after posting my reply, and didn't see any change in memory use -- maybe a slight drop.

Hours later it's back up to 65-70%92% All unbound, python3, PHP and syslog-ng.

Something different is going on with 22.1.
#8
22.1 Legacy Series / Re: Memory getting saturated
February 16, 2022, 03:01:12 PM
I've noticed a lot of memory being used and not being freed. So far it isn't climbing, but my uptime has only been a few days with 22.1.

What are the top size and res in System: Diagnostics: Activity for you? I see many unbound and netdata threads chewing up lots of memory. Top X threads:

20667   unbound   20   0   1066M   882M   kqread   2   0:00   0.03%   /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
   20667   unbound   20   0   1066M   882M   kqread   0   1:39   0.01%   /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
   20667   unbound   20   0   1066M   882M   kqread   3   0:00   0.00%   /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
   20667   unbound   20   0   1066M   882M   kqread   2   0:00   0.00%   /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
   3127   netdata   40   19   130M   57M   nanslp   2   0:07   0.23%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{PLUGIN[freebsd]}
   3127   netdata   40   19   130M   57M   piperd   3   0:01   0.03%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{PLUGINSD[apps]}
   3127   netdata   40   19   130M   57M   nanslp   2   0:01   0.03%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{PLUGIN[idlejitt}
   3127   netdata   40   19   130M   57M   piperd   1   0:01   0.02%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{PLUGINSD[python}
   3127   netdata   40   19   130M   57M   piperd   0   0:01   0.02%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{PLUGINSD[charts}
   3127   netdata   40   19   130M   57M   nanslp   1   0:00   0.01%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{GLOBAL_STATS}
   3127   netdata   40   19   130M   57M   nanslp   3   0:00   0.01%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{STATSD}
   3127   netdata   40   19   130M   57M   kqread   1   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{DBENGINE}
   3127   netdata   40   19   130M   57M   select   2   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{WEB_SERVER[stat}
   3127   netdata   40   19   130M   57M   select   1   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{WEB_SERVER[stat}
   3127   netdata   40   19   130M   57M   select   1   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{WEB_SERVER[stat}
   3127   netdata   40   19   130M   57M   select   2   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{STATSD_COLLECTO}
   3127   netdata   40   19   130M   57M   select   0   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{WEB_SERVER[stat}
   3127   netdata   52   19   130M   57M   pause   0   0:01   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{netdata}
   3127   netdata   40   19   130M   57M   nanslp   2   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{HEALTH}
   3127   netdata   40   19   130M   57M   nanslp   2   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{PLUGINSD}
   3127   netdata   40   19   130M   57M   nanslp   2   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{SERVICE}
   3127   netdata   40   19   130M   57M   uwait   3   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{netdata}
   3127   netdata   40   19   130M   57M   kqread   0   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{DAEMON_SPAWN}
   3127   netdata   40   19   130M   57M   uwait   0   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{netdata}
   3127   netdata   40   19   130M   57M   uwait   0   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{netdata}
   3127   netdata   40   19   130M   57M   uwait   2   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{netdata}
   3127   netdata   52   19   130M   57M   kqread   2   0:00   0.00%   /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{DAEMON_COMMAND}
   19798   root   52   0   62M   42M   accept   1   0:02   0.00%   /usr/local/bin/php-cgi
   451   root   52   0   72M   39M   accept   0   0:01   0.00%   /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.8){python3.8}
   451   root   52   0   72M   39M   wait   3   0:00   0.00%   /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.8){python3.8}
   43400   netdata   40   19   63M   33M   select   0   0:06   0.19%   /usr/local/bin/python3.8 /usr/local/libexec/netdata/plugins.d/python.d.plugin 1{python3.8}
   43400   netdata   40   19   63M   33M   select   3   0:05   0.02%   /usr/local/bin/python3.8 /usr/local/libexec/netdata/plugins.d/python.d.plugin 1{python3.8}
   1941   root   20   0   48M   32M   select   1   0:52   0.01%   /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.8)
   77770   root   20   0   53M   31M   select   0   0:03   0.00%   /usr/local/bin/php-cgi
   43499   root   21   0   49M   29M   accept   1   0:02   0.00%   /usr/local/bin/php-cgi
   16758   root   52   0   47M   27M   accept   1   0:02   0.00%   /usr/local/bin/php-cgi
   44900   root   20   0   47M   26M   accept   1   0:02   0.00%   /usr/local/bin/php-cgi
   43818   root   52   0   47M   25M   accept   3   0:02   0.00%   /usr/local/bin/php-cgi