High memory usage

Started by dcol, September 21, 2019, 01:07:08 AM

September 27, 2019, 08:36:56 PM #15 Last Edit: September 27, 2019, 08:40:22 PM by dcol
Crashed on me twice this morning. This is a full crash. Even the console was dead.

Just before it crashed, I got a Monit report that memory usage was over 90%.

When it restarts, all the logs are cleared, so I cannot tell what is causing this.

This issue started right after I updated from 19.7.2 to version 19.7.4_1.
Memory usage suddenly spikes to high levels before it crashes; normally it stays at around 10%. I know this from the Monit email reports.

Suricata was disabled.

HELP...........Please............

Try with a new installation.

greeting k0ns0l3

"The quieter you become, the more you are able to hear...."

- OS:Debian GNU/Linux sid
- IPU662 System

Ordered a new mini firewall computer. I will limp along until then. Had to restart the firewall twice already today.

Is there a way to see the logs from before the restart? They all refresh and clear out on a restart.
I still don't know what is causing this; all I can say is that it started right after the last update.

September 29, 2019, 07:59:40 PM #18 Last Edit: September 29, 2019, 08:01:17 PM by dcol
What did work was doubling the memory. Now it doesn't crash. Monit reported up to 18 GB of memory in use; a couple of minutes later, usage dropped back to 2 GB.

But the big question is, why is there so much memory being used all of a sudden after the upgrade?

What can I look at to see what is using this memory? Anyone know?

Hi,

You could look at using plain old top :)
Unfortunately, BSD has some quirks when it comes to reporting memory usage, specifically swap, so take the swap values with a bag of salt. At least that's the conclusion I have come to from looking around the internet.
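One way to keep evidence across a crash is to snapshot top's output to disk on a schedule. A sketch (the log path and the one-minute interval are my assumptions, not anything OPNsense ships):

```
# Illustrative /etc/crontab entry: record the biggest processes once a
# minute so the data survives a crash. On FreeBSD, -b puts top in batch
# (non-interactive) mode, and the trailing 15 limits output to 15 processes.
* * * * * root (date; top -b -o res 15) >> /var/log/mem-snapshots.log 2>&1
```

After the next crash, the tail of that file should show what was ballooning just beforehand.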

I'm having similar memory issues: for no reason I can find, the system will spike and consume a significant amount of memory (virtual + physical). Looking at the usage, Squid and Suricata seem to be the culprits, but the stats don't add up. I have also seen some BSD articles suggesting it could be kernel memory usage, though I haven't found any smoking guns. What I have seen is that my system will report close to 2 GB of swap in use, yet any tool I use to break down the swap usage only accounts for about 10 MB; as I understand the docs, the rest must be kernel-related processes that have been paged out.

This should give you a list of processes sorted by total size (res, swap, libraries, etc.):
top -S -w -o size
This sorts by res (physical memory):
top -S -w -o res
And this sorts by swap:
top -S -w -o swap
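To sanity-check the "swap doesn't add up" observation, the per-process SWAP column can be totalled and compared with the system figure. A rough sketch (the field position matches the top listings later in this thread; the K/M suffix handling is deliberately simplistic):

```shell
#!/bin/sh
# Sum the SWAP column (field 8: PID USERNAME THR PRI NICE SIZE RES SWAP)
# from top's batch output. Only K- and M-suffixed values are handled;
# header lines don't match either pattern and are skipped.
sum_swap() {
    awk '
        $8 ~ /^[0-9]+K$/ { kb += substr($8, 1, length($8) - 1) }
        $8 ~ /^[0-9]+M$/ { kb += substr($8, 1, length($8) - 1) * 1024 }
        END { printf "%dK\n", kb }
    '
}

# Intended use on the firewall (assumes FreeBSD top with batch mode):
#   top -b -o swap | sum_swap
```

If that total is far below what the Swap: summary line reports, the difference is not attributable to the listed processes.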

For now I have limited my Squid memory allocation, and I restart Squid/Suricata every so often; otherwise I reboot the box :(
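For anyone scripting that periodic restart, a minimal sketch of the decision logic (the 90% threshold, the sysctl-based measurement, and the service command are all my assumptions; OPNsense's own service control may differ):

```shell
#!/bin/sh
# Restart the IDS once memory use crosses a threshold, instead of
# waiting for the box to fall over. The 90% figure is arbitrary.
THRESHOLD=90

# Pure arithmetic helper: is used/total at or above the threshold?
over_threshold() {
    used=$1
    total=$2
    [ $(( used * 100 / total )) -ge "$THRESHOLD" ]
}

# On FreeBSD the inputs could come from sysctl page counters, e.g.:
#   total=$(sysctl -n vm.stats.vm.v_page_count)
#   free=$(sysctl -n vm.stats.vm.v_free_count)
#   used=$(( total - free ))
# and the restart might be (command name is an assumption):
#   over_threshold "$used" "$total" && service suricata restart
```

Run from cron every few minutes, this at least turns a hard crash into a service blip.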

R

For me it looks like this (APU2, OPNsense 19.7.4_1, amd64/LibreSSL):

top -S -w -o size

last pid: 80838;  load averages:  0.92,  1.01,  0.77                                                       up 1+19:21:10  15:51:51
66 processes:  2 running, 63 sleeping, 1 waiting
CPU: 12.2% user,  0.0% nice,  0.0% system,  7.2% interrupt, 80.6% idle
Mem: 177M Active, 2094M Inact, 537M Wired, 262M Buf, 1108M Free
Swap: 10G Total, 10G Free

  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME    WCPU COMMAND
58406 root          6  20    0  1479M   415M     0K nanslp  3 138:02  49.46% suricata
70449 root          1  20    0  1038M  5804K     0K select  3   0:28   0.03% ntpd
68356 root          1  20    0  1037M  7156K     0K select  1   0:00   0.06% sshd
61852 root          1  20    0  1037M  5988K     0K select  2   0:00   0.00% sshd
9170 root          2  20    0  1035M  2588K     0K piperd  2   0:00   0.00% sshlockout_pf
96415 root          2  20    0  1035M  2580K     0K uwait   3   0:00   0.00% sshlockout_pf
75399 root          1  20    0  1034M  3916K     0K CPU0    0   0:00   0.21% top
51301 root          1  20    0  1034M  4212K     0K pause   1   0:00   0.00% csh
39989 root          1  52    0  1034M  3000K     0K wait    2   1:08   0.00% sh
29447 root          1  52    0  1034M  2960K     0K wait    3   0:00   0.00% sh
4481 root          1  20    0  1033M  2932K     0K bpf     2   0:36   0.00% filterlog
94040 _dhcp         1  20    0  1033M  3000K     0K select  1   0:00   0.00% dhclient
69056 root          1  52    0  1033M  2892K     0K select  3   0:00   0.00% dhclient
66633 root          1  20    0  1033M  2932K     0K select  0   0:15   0.00% syslogd
80767 root          1  40    0  1033M  2892K     0K nanslp  0   0:07   0.00% cron
63678 root          1  52    0  1033M  2528K     0K ttyin   1   0:00   0.00% getty
58013 root          1  52    0  1033M  2480K     0K piperd  1   0:00   0.00% daemon
  295 root          1  20    0  1033M  2496K     0K select  2   1:02   0.03% powerd
60081 nobody        1  20    0  1033M  2420K     0K sbwait  2   0:02   0.00% samplicate
80781 root          1  52    0  1033M  2392K     0K nanslp  2   0:00   0.00% sleep
76119 clamav        2  20    0   791M   739M     0K select  2  20:27   0.00% clamd


top -S -w -o res

  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME    WCPU COMMAND
76119 clamav        2  20    0   791M   739M     0K select  2  20:27   0.00% clamd
58406 root          6  20    0  1479M   415M     0K nanslp  1 138:36   2.11% suricata
49032 root          1  52    0 37488K 28708K     0K accept  0   0:01   0.00% php-cgi
34009 root          1  20    0 37360K 28520K     0K accept  2   0:01   0.00% php-cgi
75766 root          1  52    0 39608K 28232K     0K accept  3   0:15   0.00% python3.7
1610 root          1  22    0 37232K 28040K     0K accept  1   0:00   0.00% php-cgi
76069 root          1  23    0 35312K 26652K     0K accept  3   0:02   0.00% php-cgi
20663 root          1  22    0 35440K 26496K     0K accept  1   0:01   0.00% php-cgi
70495 root          1  52    0 37104K 24488K     0K accept  2   0:00   0.00% php-cgi
1013 root          1  20    0 26108K 21516K     0K select  0 636:38   0.02% python3.7
13837 root          1  52    0 27828K 21360K     0K wait    0   0:04   0.00% python3.7
41778 root          1  52    0 34184K 18444K     0K wait    2   0:00   0.00% php-cgi
40821 root          1  52    0 34184K 18436K     0K wait    2   0:00   0.00% php-cgi
29666 clamav        1  20    0 16072K 11108K     0K pause   3   0:52   0.00% freshclam
30040 root          2  20    0 21456K 10552K     0K kqread  2   0:47   0.00% syslog-ng
2281 dhcpd         1  20    0 16584K  9156K     0K select  2   0:01   0.00% dhcpd
28922 root          1  20    0 12296K  8556K     0K kqread  1   0:17   0.01% lighttpd
28086 root          1  52    0 12856K  7492K     0K wait    0   0:00   0.00% syslog-ng


top -S -w -o swap

last pid: 40337;  load averages:  0.46,  0.88,  0.75                                                       up 1+19:23:43  15:54:24
66 processes:  2 running, 63 sleeping, 1 waiting
CPU:  0.7% user,  0.0% nice,  0.3% system,  0.7% interrupt, 98.2% idle
Mem: 176M Active, 2094M Inact, 537M Wired, 262M Buf, 1109M Free
Swap: 10G Total, 10G Free

  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME    WCPU COMMAND
   11 root          4 155 ki31     0K    64K     0K RUN     0 154.8H 386.09% idle
   12 root         35 -52    -     0K   560K     0K WAIT    0 122:27   3.40% intr
58406 root          6  20    0  1479M   415M     0K nanslp  3 138:37   3.28% suricata
40337 root          1  20    0  1034M  3624K     0K CPU0    0   0:00   0.23% top
   16 root          1 -16    -     0K    16K     0K pftm    0   1:22   0.10% pf purge
64407 root          1  16    -     0K    16K     0K syncer  3   4:33   0.09% syncer
68356 root          1  20    0  1037M  7156K     0K select  2   0:00   0.06% sshd
  295 root          1  20    0  1033M  2496K     0K select  1   1:02   0.04% powerd
1013 root          1  20    0 24060K 20996K     0K select  2 636:47   0.02% python3.7
   17 root          1 -16    -     0K    16K     0K -       2   0:53   0.02% rand_harvestq
70449 root          1  20    0  1038M  5804K     0K select  2   0:28   0.02% ntpd
28922 root          1  20    0 12296K  8556K     0K kqread  3   0:17   0.01% lighttpd
4481 root          1  20    0  1033M  2932K     0K bpf     0   0:36   0.01% filterlog
39847 root          2 -16    -     0K    32K     0K psleep  2   0:09   0.01% bufdaemon
4808 root          3 -16    -     0K    48K     0K psleep  2   0:15   0.00% pagedaemon
    0 root         34 -16    -     0K   544K     0K swapin  0   0:01   0.00% kernel
71960 root          1 -16    -     0K    16K     0K vlruwt  0   0:03   0.00% vnlru
47929 root          1  20    -     0K    16K     0K -       2   0:03   0.00% bufspacedaemon
76119 clamav        2  20    0   791M   739M     0K select  2  20:27   0.00% clamd



greeting k0ns0l3

Thanks guys. I will run top in the console the next time it spikes.

But what is interesting is that ever since I added memory, I no longer have crashes.

September 30, 2019, 11:16:53 PM #22 Last Edit: September 30, 2019, 11:20:01 PM by dcol
Suricata is the culprit. It gets to a point where it takes all the memory; then the Suricata service shuts down and kills the internet.
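Since Monit is already watching the box, it could act on the spike instead of only emailing about it. A hypothetical Monit stanza (the pidfile path, limit, and cycle count are my assumptions; adjust to your install):

```
# Illustrative Monit check: restart Suricata before it exhausts memory,
# rather than only reporting that it did.
check process suricata with pidfile /var/run/suricata.pid
    if memory usage > 50% for 3 cycles then restart
```

That is a workaround, not a fix, but it keeps the internet up while the leak is chased down.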

Something between version 19.7.2 and the current version is causing this because I have not changed the Suricata rules in ages and this started the day I updated.

Is there something different with Suricata?

I did just change the Pattern Matcher from Hyperscan to default and saw the memory drop from 18% to 7% after a reboot. What am I losing by not using Hyperscan?

I've been reading earlier posts about some rulesets that were crashing Suricata. I think this is the case again.
I am not using any of the abuse.ch rulesets.

I am only using the following rulesets
ET open/ciarmy
ET open/compromised
ET open/drop
ET open/dshield
ET open/emerging-icmp
ET open/emerging-icmp_info
and my own custom ruleset, which is very simple, only allowing about 20 ports on one of the WAN IPs.

Ok, I give up. Suricata has a problem. The only way I can run stable is to disable Suricata.

Suricata must have some sort of memory leak introduced by a recent update. I have been running OPNsense for over a year with the same rulesets without any issues like this. Now the memory just eventually blows up. This has to be a bug in the Suricata package.

Afraid it's the same conclusion I have come to.

I've pared down some of the lists Suricata uses, but its memory usage seems to vary radically, with little correlation to what traffic is causing it.

One thing we could do is try to identify the leak, but having poked about the net, that looks a bit involved. So for now I irritate the living @#$@# out of myself, because I hate it when people suggest this: restart the service periodically and hope someone comes up with a fix soon. :)

I also notice that a lot has changed from version 19.1 to 19.7; just one click in the menu takes a few seconds longer  :( Let's wait for a patch  ;)

greeting k0ns0l3

Thank goodness others are seeing this too. I do not have the luxury of constantly rebooting, as I have over a hundred business users with websites and email to answer to. I will just not use Suricata until a patch is released to fix this.