[SOLVED] Clarity on Amount of Data Written to Disk - 200GB/Week?

Started by h3zwe, April 04, 2024, 12:20:08 AM

I recently checked my SMART data after a fresh install, and noticed that 'data written' was at ~200GB/week. This was with 'Reporting: Unbound DNS', 'Reporting: NetFlow', and 'System: Settings: Logging' enabled.

After disabling Netflow and Unbound logging, and moving tmp/log to RAM, I managed to decrease 'data written' to ~2GB/day. While this is a significant decrease vs what it was previously, I still feel like that is relatively high given everything should be in RAM now.

Could someone provide some insight here? I searched online, but aside from multiple users reporting similarly high disk writes, no further detail was available. It would be great to understand why it's still 2GB/day even though tmp/log are in RAM, especially given that available storage space isn't decreasing by anywhere near that amount.

A quick look at iotop did not show any obvious culprits.
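In case it helps anyone reproduce the measurement, these are the raw counters I've been reading (the device name `/dev/nvme0` and pool name `zroot` are from my setup, adjust for yours):

```
# NVMe lifetime writes from SMART ("Data Units Written" = units of 512,000 bytes):
smartctl -a /dev/nvme0 | grep -i 'data units written'

# Live pool-level write bandwidth, refreshed every 5 seconds:
zpool iostat -v zroot 5
```

Sampling the SMART counter at the same time each day gives a reasonably accurate GB/day figure.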

Indeed, when I switched from pfSense to OPNsense I was surprised by the huge amount of disk writes that OPNsense makes.  My gateway was averaging ~3.5GB writes/day, which I found to be rather excessive.  I did a bit of digging and with a few small changes I reduced the daily writes significantly (it's averaging 50MB/day now with no loss in functionality or stability).

There are a number of contributors to writes, one of the largest being the RRD data for the Reporting + Health dashboard in the OPNsense control panel.  This is actually straightforward to address: simply add an entry in your fstab mounting "/var/db/rrd" as a tmpfs volume (I use a 64MB volume size; a reboot will be necessary to enable this).  Then go to System + Settings + Miscellaneous in the control panel and, in the "Periodic Backups" section, change "Periodic RRD Backup" to "Power off" for maximum write savings (or pick whatever backup interval you prefer).
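For reference, a sketch of the fstab entry I mean (FreeBSD tmpfs syntax; the 64MB size and mode are my own choices, adjust to taste):

```
# Device  Mountpoint    FStype  Options               Dump  Pass
tmpfs     /var/db/rrd   tmpfs   rw,mode=755,size=64m  0     0
```

After adding the line and rebooting, `mount | grep rrd` should show the tmpfs in place.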

Given that OPNsense supports this backup functionality, it'd be nice if they just supported this as the default when logging to RAM is enabled, but I haven't bothered to file a suggestion for this.

There are other, more minor changes you can make, such as disabling the continuous FreeBSD entropy saving (a one-line config change), stopping OPNsense from continually rewriting /etc/hosts and /etc/resolv.conf when they haven't changed, moving the DHCP lease database to tmpfs and enabling periodic backups for that as well, and so on.
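To be concrete about the entropy one: on stock FreeBSD (which OPNsense is based on), the recurring writes come from a cron entry like this, run every 11 minutes; commenting it out (or lengthening the interval) is the one-line change I mean:

```
# /etc/crontab (stock FreeBSD entry)
*/11  *  *  *  *  operator  /usr/libexec/save-entropy
```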

If it is helpful, you can find some good background regarding this subject here:  https://github.com/opnsense/core/issues/6596

Quote from: 5kft on April 07, 2024, 04:57:59 PM
Indeed, when I switched from pfSense to OPNsense I was surprised by the huge amount of disk writes that OPNsense makes.  My gateway was averaging ~3.5GB writes/day, which I found to be rather excessive.  I did a bit of digging and with a few small changes I reduced the daily writes significantly (it's averaging 50MB/day now with no loss in functionality or stability). (...)

And I thought I was doing well with my 2GB/day! I'll have to look into this more to see what else I can reasonably disable. I'd prefer to stick to OOTB settings vs 'hacks', but might have to go down that path by the looks of it...

Quote from: 5kft on April 07, 2024, 04:57:59 PM
(...) There are a number of contributors to writes, one of the largest being the RRD data for the Reporting + Health dashboard in the OPNsense control panel.  This is actually straightforward to address: simply add an entry in your fstab mounting "/var/db/rrd" as a tmpfs volume (I use a 64MB volume size; a reboot will be necessary to enable this).  Then go to System + Settings + Miscellaneous in the control panel and, in the "Periodic Backups" section, change "Periodic RRD Backup" to "Power off" for maximum write savings (or pick whatever backup interval you prefer). (...)

I actually have 'Periodic RRD Backup' set to 'Disabled'. I believe the system did this automatically when I turned off 'Round-Robin-Database' in 'Reporting: Settings'.

I just noticed that I might have discovered a bug related to this setting too, as my 'Health' dashboard is showing a blank page. Seems to be related to https://github.com/opnsense/core/issues/3141.

The browser console shows:

systemhealth:1462 Uncaught TypeError: Cannot read properties of undefined (reading '0')
    at systemhealth:1462:67
    at Object.complete (opnsense.js?v=4567372b83d8bd1e:298:21)
    at c (jquery-3.5.1.min.js?v=4567372b83d8bd1e:2:28294)
    at Object.fireWith (jquery-3.5.1.min.js?v=4567372b83d8bd1e:2:29039)
    at l (jquery-3.5.1.min.js?v=4567372b83d8bd1e:2:79928)
    at XMLHttpRequest.<anonymous> (jquery-3.5.1.min.js?v=4567372b83d8bd1e:2:82254)

If you have the RAM, you can also enable a ramdisk for a lot of this stuff, but I forget the exact steps.

First of all, for your question to make sense you need to say whether you mean UFS or ZFS...


Cheers,
Franco

Quote from: franco on April 15, 2024, 09:32:06 AM
First of all, for your question to make sense you need to say whether you mean UFS or ZFS...


Cheers,
Franco

ZFS :)

Ok, so ZFS likes to flush metadata to the disk even if not a single file has been touched. The metadata is huge, about 20GB per day in some cases; that's 140GB per week, not too far from your reported 200GB.

We changed the standard sync interval from 30 seconds to 5 minutes late in 23.7.x, but ZFS can still exhibit this excess write behaviour for the other reasons mentioned here.

If you are not on the latest 23.7.x or 24.1, give that a try; otherwise let us know the version so we can dig a bit deeper. RRD and Netflow are constant offenders for sure, but log files can be as well (you can put /var/log on MFS if that is a problem, trading persistent logs for disk life).


Cheers,
Franco

Thanks, appreciate the additional context.

With RRD/Netflow disabled and a RAM disk for tmp/logs, I'm at about 2GB/day, which, given my NVMe's 500TBW rating, I'm happy with.
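For anyone curious how that works out, the back-of-envelope endurance math (using the decimal GB/TB that vendors rate TBW in):

```shell
# Rough endurance estimate: 500 TBW rating vs ~2 GB written per day
tbw_gb=$((500 * 1000))          # 500 TB expressed in GB (decimal)
gb_per_day=2
days=$((tbw_gb / gb_per_day))   # days until the rated writes are exhausted
years=$((days / 365))
echo "${days} days (~${years} years)"   # prints: 250000 days (~684 years)
```

In other words, at 2GB/day the drive's rated endurance is a complete non-issue; even the original 200GB/week (~29GB/day) would still have lasted decades on paper.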

Quote from: franco on April 15, 2024, 09:47:48 AM
Ok, so ZFS likes to flush metadata to the disk even if no single file has been touched. The metadata is huge, about 20GB per day in some cases, which is 140GB per week not too far from your reported 200GB.

We switched the standard sync to be 5 minutes instead of 30 seconds late in 23.7.x, but ZFS can still exhibit this excess write behaviour due to other reasons mentioned here.


Which ZFS parameter was this? I would like to fix this for Proxmox too, but cannot find any likely parameter that equates to 300 seconds.
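In case anyone else goes looking: assuming the change was to the transaction group sync interval, the Linux-side (Proxmox) equivalent appears to be the `zfs_txg_timeout` module parameter, though the 30-second figure quoted above doesn't obviously match its stock 5-second default, so this is a guess:

```
# Current TXG sync interval in seconds (OpenZFS default is 5):
cat /sys/module/zfs/parameters/zfs_txg_timeout

# Raise it to 300s at runtime:
echo 300 > /sys/module/zfs/parameters/zfs_txg_timeout

# Persist across reboots via /etc/modprobe.d/zfs.conf:
#   options zfs zfs_txg_timeout=300
```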
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+