I have been testing OPNsense in a virtualised environment (KVM) with modest resources (10 GB disk, 4 GB memory, 4 cores). The logic being that if it behaves well here, it'll work just fine with more resources.
When investigating how OPNsense handles writing logs to disk (https://forum.opnsense.org/index.php?topic=23649.msg112820#msg112820), I enabled the '/var RAM disk' option in 'System: Settings: Miscellaneous'. With conservative logging options set in 'System: Settings: Logging' ('Disable circular logs': checked, 'Preserve logs (Days)': 5) and only basic services running on the system (pf, dhcp, unbound DNS), this worked just fine.
The problem came with enabling more services. Specifically NetFlow (required by 'Reporting: Insight'), ntopng (installed as a plugin) and redis (a required dependency of ntopng). Everything would work fine for a while, but certain aspects of the system would become unresponsive after some time had passed. Most noticeably, unbound refused to respond to queries. The services widget on the dashboard would show unbound and flowd_aggregate as stopped. Manually restarting them would work for a few minutes, but eventually they would stop again. The dashboard also showed high memory usage and disk usage on `/var`; not surprising given the additional services running on the system.
The only place that provided a hint as to what was happening was the OPNsense console: the system was running out of memory and killing processes. Unfortunately I didn't think to save the exact error messages at the time. (Does OPNsense regularly save `dmesg` output to a file?)
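(If stock FreeBSD behaviour applies here, kernel messages should also land in `/var/log/messages` via syslogd, so something along these lines might have recovered the kill notices after the fact:

grep -i "out of swap" /var/log/messages

Though with `/var` on tmpfs, those logs wouldn't survive a reboot anyway.)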
NetFlow, ntopng and redis obviously require memory to operate. The compounding factor is those services persisting a significant amount of data to memory-backed storage. In the case of NetFlow, it's actually a spiral of doom: the `/var/log/flowd.log*` files produced by NetFlow are rotated by `flowd_aggregate` (https://forum.opnsense.org/index.php?topic=3594.0), which gets killed due to low-memory conditions.
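For anyone retracing this, the pressure is easy to observe while it builds, since everything under a tmpfs-backed `/var` counts against RAM:

ls -lh /var/log/flowd.log*
df -h /var

(`df` reports the tmpfs mount's size and usage; with no `size` option set, the reported size is effectively total memory plus swap.)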
Looking at how the '/var RAM disk' option works, it simply triggers creation of a tmpfs mount with minimal options (https://github.com/opnsense/core/blob/stable/21.7/src/etc/rc.subr.d/var#L135), i.e.
mount -t tmpfs tmpfs /var
For reference, enabling '/tmp RAM disk' does something similar (https://github.com/opnsense/core/blob/stable/21.7/src/etc/rc.subr.d/tmp#L45):
mount -t tmpfs -o mode=01777 tmpfs /tmp
When no `size` parameter is specified, `tmpfs` defaults to using all of the available memory (https://www.freebsd.org/cgi/man.cgi?query=tmpfs&sektion=&n=1).
Quote
size     Specifies the total file system size in bytes, unless
         suffixed with one of k, m, g, t or p, which denote
         kilobyte, megabyte, gigabyte, terabyte and petabyte
         respectively. If zero (the default) or a value larger
         than SIZE_MAX - PAGE_SIZE is given, the available
         amount of memory (including main memory and swap
         space) will be used.
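For illustration (the 512m figure is mine, arbitrarily chosen, not anything OPNsense configures), capping the RAM disk would just mean adding the `size` option to the existing invocation:

mount -t tmpfs -o size=512m tmpfs /var

Writes beyond the cap would then fail with "No space left on device" instead of competing with running processes for memory.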
Understandably, having this particular combination of memory- and disk-hungry applications running on a system with constrained resources isn't a great idea. However, I imagine there are legitimate cases where a tmpfs-backed `/var` is desirable, and it would be preferable to exhaust logging space before the system runs out of memory and starts killing off processes.
When using the '/var RAM disk' option in 'System: Settings: Miscellaneous', can tmpfs be capped at a size less than total system memory? Preferably user-defined, expressed either as a percentage of system memory or as an explicit value.
Looking at https://github.com/opnsense/core/issues/243 and the initial memory disk to tmpfs migration... this upper bound was fatal for people trying to update, since pkg will flush packages into /var, which can be a RAM disk... I'm not against an upper bound feature if contributed, but I'm sure it will break things for users again if they choose to use it.
Cheers,
Franco
PS: Diving into https://github.com/opnsense/core/issues/2856 it was noted that you could always use tmpfs manually in /etc/fstab and avoid the configuration setting. That certainly works for /tmp, although /var in /etc/fstab is, er, suboptimal. All in all, /var on memory is a gift that keeps on giving. Maybe you want to try /var/log in /etc/fstab and see how that goes.
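For reference, an /etc/fstab entry along those lines could look like this (size picked arbitrarily):

tmpfs   /var/log   tmpfs   rw,mode=0755,size=512m   0   0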
Thanks for the additional information, Franco.
Looking at the commit where you migrated from mdmfs to tmpfs (https://github.com/opnsense/core/commit/ed2bea152bbb4712a71f905378922967d43f4dea), I can see how the previous limit of 60 MB for `/var` might cause issues. It's certainly far too small.
I guess searching GitHub for related issues should be a requisite step before posting on the forums in future. The discussion in issue #2856 is very similar to the ground I've covered so far. I've read through the points you made in the issue and agree with your assessment of the situation:
- adding another UI element requires translation which, although not overly difficult, does complicate matters
- reverting back to a small, hard size limit for `/var` and `/tmp` should be avoided at all costs
Thankfully I'm in a position where using tmpfs for `/var` isn't absolutely necessary. And if it were, I'm currently able to throw more RAM at the problem. In fact, I may do just that in the short term to see how it handles NetFlow and ntopng logging.
When you say you're not against a contributed upper bound feature, how do you see it being implemented? Some internal conditional logic that sets tmpfs to some value smaller than total RAM, but not smaller than a minimum threshold? Or a full-blown user-selectable size specified in the UI (with some safeguards in place)?
> Or a full-blown user-selectable size specified in the UI (with some safeguards in place)?
Basically this. There aren't any useful safeguards however. We just need to keep the default as is. Updates can be 1GB or more now depending on what plugins are used.
Cheers,
Franco
Understandably, the primary concern with imposing limits on a tmpfs-backed `/var` is running out of disk space when performing updates. Looking at my system, it appears as though the directories `pkg` uses are already symlinked to non-tmpfs locations. Specifically the cache (PKG_CACHEDIR)
root@OPNsense:~ # ls -l /var/cache/pkg
lrwxr-xr-x 1 root wheel 19 Jun 30 23:55 /var/cache/pkg -> /root/var/cache/pkg
and the database (PKG_DBDIR)
root@OPNsense:~ # ls -l /var/db/pkg
lrwxr-xr-x 1 root wheel 16 Jun 30 23:55 /var/db/pkg -> /root/var/db/pkg
Are all aspects of OPNsense updates performed via `pkg`? If so, updates via `pkg` should be unaffected by running out of space on a tmpfs-backed `/var`. (Then running out of space on the primary disk is the concern, and you've got a whole bag of other problems.) If there's some other update mechanism employed, could the relevant cache/database locations be changed to somewhere on the primary disk?
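(For completeness: if those symlinks weren't in place, my understanding is the same effect could be achieved by pointing `pkg` at persistent storage in /usr/local/etc/pkg.conf; the paths below simply mirror what my system already uses.)

PKG_CACHEDIR = "/root/var/cache/pkg";
PKG_DBDIR = "/root/var/db/pkg";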
I do expect A LOT has changed since then, especially since the /var RAM disk was too small for older embedded devices to carry out updates, as it turned out later. Kind of ironic, I know. ;)
https://github.com/opnsense/core/blob/58dfb05dcae7eaa787feffb4381dd80b10c306d8/src/etc/rc.subr.d/var#L98-L103
I would say it really doesn't matter which size is set by the user as long as we leave the default as is. If the user wants a size restriction of 1 MB, he or she shall have it.
Cheers,
Franco