My memory consumption is going up and down daily, from 20% to 90%. (see attachment)
What's the best way to troubleshoot which process is causing this?
Are you using ZFS? If so, the ZFS ARC tries to make use of available memory, and that's normal. It's usually only a problem if processes need memory freed up faster than the ARC releases it.
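If you want to see how much the ARC is actually holding, the standard FreeBSD sysctls should show it (the max-size tunable name differs slightly between releases, so treat these as examples):

sysctl kstat.zfs.misc.arcstats.size
sysctl vfs.zfs.arc_max    # on newer FreeBSD/OpenZFS this is vfs.zfs.arc.max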
That seems like an awful lot of variance to be zfs, but it's hard to tell.
What all do you have running? What plugins are installed?
I have the following plugins:
os-acme-client
os-adguardhome-maxit
os-ddclient
os-homeassistant-maxit
os-theme-rebellion
os-vnstat
os-wireguard
Intrusion detection is enabled also.
I'm not sure what ZFS is in this context. Is it a plugin? To me, ZFS is a file system.
ZFS is a volume manager and filesystem in one. It uses memory dynamically and will take as much as it can; that's expected and normal.
That said, it's probably not the cause of this spikiness, as cj wrote.
All I can think of is to work out the pattern, pick a time to observe, and run top or htop. They only show live data, though, not trends.
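If you want something you can look back at later, a crude workaround is to append timestamped snapshots to a file every few minutes, roughly like this (interval and log path are just an example):

# one-shot batch output of the top 15 memory consumers, every 5 minutes
while true; do date; top -b -o res 15; sleep 300; done >> /root/memtrend.log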
Currently the dashboard shows memory at 84%, with over 6 GB consumed:
84 % ( 6888/8131 MB ) { ARC size 5670 MB }
but top tells me "5839M Free". Unless I'm mistaken?
root@sense:/var/log # top
last pid: 958; load averages: 2.79, 2.46, 2.30 up 1+07:46:53 04:34:24
64 processes: 2 running, 62 sleeping
CPU: 21.6% user, 0.0% nice, 19.9% system, 1.1% interrupt, 57.5% idle
Mem: 144M Active, 651M Inact, 4724K Laundry, 1254M Wired, 2056K Buf, 5839M Free
ARC: 639M Total, 29M MFU, 540M MRU, 7511K Anon, 3027K Header, 59M Other
513M Compressed, 563M Uncompressed, 1.10:1 Ratio
Swap: 8418M Total, 8418M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
69587 root 12 20 0 869M 213M uwait 3 8:51 4.62% AdGuardHome
31921 root 8 22 0 182M 106M kqread 2 0:03 0.00% python3.9
56389 root 7 20 0 176M 114M nanslp 3 33:54 2.85% suricata
30363 unbound 4 20 0 111M 41M kqread 1 0:01 0.00% unbound
256 root 1 52 0 109M 58M accept 3 17:15 0.89% python3.9
8197 root 1 20 0 58M 32M accept 3 0:04 1.51% php-cgi
28719 root 1 52 0 58M 31M accept 0 0:04 0.00% php-cgi
93364 root 1 52 0 58M 30M accept 1 0:02 0.00% php-cgi
89757 root 1 52 0 58M 29M accept 3 0:00 0.50% php-cgi
91927 root 1 52 0 58M 29M accept 0 0:00 0.00% php-cgi
4396 root 6 21 0 54M 14M kqread 0 18.2H 56.32% syslog-ng
17222 root 1 20 0 54M 24M wait 2 0:01 0.00% php-cgi
75764 root 1 52 0 52M 31M accept 2 0:06 0.00% php-cgi
17186 root 1 20 0 48M 24M wait 1 0:00 0.00% php-cgi
65908 root 1 20 0 48M 30M select 1 555:06 0.01% python3.9
68422 root 1 20 0 39M 28M nanslp 0 0:10 0.00% perl
7733 dhcpd 1 20 0 25M 11M select 0 0:00 0.00% dhcpd
252 root 1 52 0 24M 13M wait 2 0:03 0.00% python3.9
82663 root 1 20 0 23M 12M select 1 0:03 0.03% python3.9
82649 root 1 20 0 23M 12M select 1 0:03 0.03% python3.9
25582 dhcpd 1 20 0 22M 9312K select 1 0:07 0.00% dhcpd
4259 root 1 52 0 21M 8032K wait 1 0:00 0.00% syslog-ng
89560 root 1 20 0 21M 6804K select 3 0:27 0.03% ntpd
17086 root 1 20 0 21M 9908K kqread 2 6:58 0.07% lighttpd
64380 root 2 20 0 18M 6432K nanslp 2 0:05 0.00% monit
17051 cyprien 1 20 0 18M 7872K select 1 0:00 0.02% sshd
14639 root 1 21 0 18M 7568K select 2 0:00 0.00% sshd
16122 root 1 20 0 18M 6748K select 3 0:00 0.00% sshd
53597 vnstat 1 20 0 15M 5168K nanslp 3 0:12 0.00% vnstatd
66413 root 1 25 0 14M 4020K piperd 2 3:33 0.30% bash
90011 root 1 20 0 14M 3996K CPU2 2 0:00 0.14% top
94261 root 1 20 0 14M 4048K kqread 3 0:01 0.00% lighttpd
54965 root 1 20 0 13M 3852K pause 0 0:00 0.00% csh
97710 root 1 24 0 13M 2944K wait 1 0:36 0.01% sh
17058 cyprien 1 20 0 13M 3428K wait 0 0:00 0.00% sh
51767 root 1 52 0 13M 3196K wait 3 0:00 0.00% sh
25912 root 1 20 0 13M 3056K wait 3 0:01 0.00% sh
45202 cyprien 1 24 0 13M 2948K wait 3 0:00 0.00% su
91245 _dhcp 1 20 0 13M 2792K select 1 0:06 0.02% dhclient
85398 root 1 4 0 13M 2744K select 3 0:00 0.00% dhclient
85031 root 1 20 0 13M 2664K select 0 0:00 0.01% dhclient
33799 root 1 87 0 13M 3220K CPU3 3 310:17 52.26% filterlog
91867 root 1 20 0 13M 2536K kqread 0 0:37 0.03% rtsold
92454 root 1 20 0 13M 2596K select 0 0:26 0.02% rtsold
92282 root 1 26 0 13M 2492K select 3 0:00 0.00% rtsold
92284 root 1 52 0 13M 2488K select 3 0:00 0.00% rtsold
92311 root 1 23 0 13M 2480K select 2 0:00 0.00% rtsold
99757 root 1 30 0 13M 2580K nanslp 2 0:02 0.00% cron
13623 root 1 52 0 12M 2500K select 0 0:00 0.00% dhcp6c
6935 root 1 52 0 12M 2312K ttyin 2 0:00 0.00% getty
31883 root 1 52 0 12M 2264K piperd 0 0:00 0.00% daemon
63237 root 1 52 0 12M 2264K piperd 1 0:00 0.00% daemon
69359 root 1 20 0 12M 2260K piperd 2 0:00 0.00% daemon
25740 root 1 20 0 12M 2260K piperd 3 0:00 0.00% daemon
97600 root 1 20 0 12M 2256K piperd 3 0:01 0.00% daemon
66976 _flowd 1 20 0 12M 2668K select 1 0:47 0.00% flowd
66951 root 1 20 0 12M 2420K sbwait 3 0:00 0.00% flowd
38535 root 1 20 0 12M 2276K select 2 0:26 0.03% powerd
20831 root 1 20 0 12M 2444K select 3 0:13 0.00% radvd
66622 root 1 20 0 12M 2284K sbwait 1 0:33 0.05% route
63345 nobody 1 20 0 12M 2172K sbwait 3 0:17 0.00% samplicate
95617 root 1 20 0 12M 2140K nanslp 0 0:00 0.00% sleep
958 root 1 24 0 12M 2140K nanslp 2 0:00 0.00% sleep
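For what it's worth, the raw counters that top and the dashboard read from can be queried directly; these are standard FreeBSD sysctls, so they should apply here (free bytes = free page count × page size):

sysctl vm.stats.vm.v_free_count vm.stats.vm.v_page_size
sysctl kstat.zfs.misc.arcstats.size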
I'm running the same plugins plus a few more and use around the same 6 GB of RAM, with no issues for me.
Maybe it's a cosmetic issue and the dashboard reports the wrong memory usage.
Right now top says I have 2958 MB free, whereas the dashboard says 3742 MB free.
But anyway, both are varying quite a bit...
Odd that you're seeing differences. Top and my dashboard both show the same for free and arc size. Also, my memory usage is relatively steady.
Can you try temporarily turning off different things to see if it steadies out? I'd try IDS/IPS first as that's probably the least intrusive to your setup to disable.
Ha. This morning I disabled logging on 4 firewall rules (deny rules for outgoing DNS queries) and suddenly it stopped. I checked the config history; that's the only change I made at the time.
Still don't get how these 4 rules can consume so much memory...
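Looking back at my top output, filterlog and syslog-ng were each eating about half a core, so I suspect the logging pipeline rather than the rules themselves. A rough way to see how fast filter log lines pile up (the path is an assumption based on recent OPNsense defaults, adjust if yours differs):

f=$(ls -t /var/log/filter/filter_*.log | head -1)   # newest filter log
a=$(wc -l < "$f"); sleep 10; b=$(wc -l < "$f")
echo "$(( (b - a) / 10 )) filter log lines per second"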
Some things get rather upset when you block DNS queries and therefore ramp up the amount of queries. That could be what's happening.
It also could be something else in your firewall config and it temporarily went away because you restarted the firewall.
Quote from: CJ on August 24, 2023, 01:21:58 PM
Some things get rather upset when you block DNS queries and therefore ramp up the amount of queries. That could be what's happening.
It also could be something else in your firewall config and it temporarily went away because you restarted the firewall.
Especially Roku devices.
Quote from: CJ on August 24, 2023, 01:21:58 PM
Some things get rather upset when you block DNS queries and therefore ramp up the amount of queries. That could be what's happening.
Yep. Home Assistant was sending 1-2 Mbps (!) of DNS-over-TLS attempts to Cloudflare. Changing the firewall rules from "reject" to "block" caused HA to stop basically DoS'ing OPNsense.
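For anyone who wants to confirm the same thing, a capture on the LAN interface shows it clearly; DNS over TLS is just TCP to port 853 (the interface name and Cloudflare addresses here are examples, adjust for your setup):

tcpdump -ni igb1 'tcp port 853 and (host 1.1.1.1 or host 1.0.0.1)'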
Quote from: 9axqe on August 25, 2023, 09:54:50 PM
Quote from: CJ on August 24, 2023, 01:21:58 PM
Some things get rather upset when you block DNS queries and therefore ramp up the amount of queries. That could be what's happening.
Yep. Home Assistant was sending 1-2 Mbps (!) of DNS-over-TLS attempts to Cloudflare. Changing the firewall rules from "reject" to "block" caused HA to stop basically DoS'ing OPNsense.
Interesting. I had changed mine from Block to Reject some time ago but I don't recall anything particularly different occurring. Although I haven't gotten around to trying out HA yet. I'll have to keep an eye out.
To be fair, Home Assistant didn't entirely stop trying after I switched to "block", but at least it stopped generating Mbps of traffic. It seems the "reject" got it stuck in a loop.
I still don't know why it was doing this. I have the cloudflared plugin and thought that was the reason, but disabling the plugin didn't stop these lookups from happening. Now that I think about it, I probably should have restarted HA on top of disabling the plugin...
Quote from: 9axqe on August 27, 2023, 04:22:52 PM
To be fair, Home Assistant didn't entirely stop trying after I switched to "block", but at least it stopped generating Mbps of traffic. It seems the "reject" got it stuck in a loop.
I still don't know why it was doing this. I have the cloudflared plugin and thought that was the reason, but disabling the plugin didn't stop these lookups from happening. Now that I think about it, I probably should have restarted HA on top of disabling the plugin...
I would wager that HA is still in the same loop. It's just that Reject provides immediate feedback that the query failed while Block has to wait for the timeout to occur.
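You can see that difference from a client behind the rule, assuming dig is installed there: behind a Reject rule the query fails immediately with an unreachable error, while behind a Block rule it just hangs until the timeout expires.

# fails instantly behind Reject, hangs ~3 seconds behind Block
dig @1.1.1.1 example.com +tries=1 +time=3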