OPNsense installation on SSD with ZFS, considering SSD wear

Started by PencilHCV, March 29, 2024, 08:00:03 AM

Previous topic - Next topic
Many if not all recommend using ZFS when installing OPNsense.
But what about wearing out the SSD when installing on a consumer SSD (using only one drive)? Or should you use ext4 instead, in order not to wear out the SSD?

Best regards!
HCV

There is no EXT4 in FreeBSD. Why would ZFS wear out an SSD faster than UFS? What is the TBW of your SSD?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Sorry, UFS, not ext4.
Why am I wondering? Because at the beginning of this week I checked the SSD with the SMART plugin in my OPNsense and it showed attribute 231 SSD_Life_Left at a value of 19; today it shows 18.

Here is the entire SMART log:
smartctl 7.4 2023-08-01 r5530 [FreeBSD 13.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37240G
Serial Number:    50026B7685B6D4DF
LU WWN Device Id: 5 0026b7 685b6d4df
Firmware Version: S3H01103
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar 29 09:14:04 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x03)   Offline data collection activity
               is in progress.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (  41)   The self-test routine was interrupted
               by the host with a hard or soft reset.
Total time to complete Offline
data collection:       (  120) seconds.
Offline data collection
capabilities:           (0x11) SMART execute Offline immediate.
               No Auto Offline data collection support.
               Suspend Offline collection upon new
               command.
               No Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               No Selective Self-test supported.
SMART capabilities:            (0x0002)   Does not save SMART data before
               entering power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       7312
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       1
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       1
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       1
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       0
170 Bad_Blk_Ct_Lat/Erl      0x0000   100   100   010    Old_age   Offline      -       0/0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       0
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       9
194 Temperature_Celsius     0x0022   026   045   000    Old_age   Always       -       26 (Min/Max 15/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       1
231 SSD_Life_Left           0x0000   018   018   000    Old_age   Offline      -       18
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       130122
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       15040
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       51
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       575
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       619
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       274920

SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 1

ATA Error Count: 0
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error -4 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 00 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d0 01 00 4f c2 40 08      00:00:00.000  SMART READ DATA
  b0 d1 01 01 4f c2 40 08      00:00:00.000  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 da 00 00 4f c2 40 08      00:00:00.000  SMART RETURN STATUS
  b0 d5 01 00 4f c2 40 08      00:00:00.000  SMART READ LOG
  b0 d5 01 01 4f c2 40 08      00:00:00.000  SMART READ LOG

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%      7312         -

Selective Self-tests/Logging not supported

The above only provides legacy SMART information - try 'smartctl -x' for more

That device has a rated endurance of only 80 TBW (terabytes written), which is not much. You have written about 15 TB so far. I'd monitor the Lifetime_Writes_GiB SMART attribute over a couple of days. That gives you a good estimate of the daily write load, so you can estimate when 80 TB will be reached. Even then, SSDs typically don't stop working instantly, but the drive has reached its specified life expectancy at that point.
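A minimal sketch of that estimate in shell. The day-two reading and the /dev/ada0 device name are hypothetical placeholders; substitute your own readings taken 24 hours apart (and the rating treats 80 TBW roughly as TiB for simplicity):

```shell
#!/bin/sh
# Estimate daily write load and remaining drive life from two readings of
# SMART attribute 241 (Lifetime_Writes_GiB), taken 24 hours apart.
# One way to fetch a reading (device name is an assumption):
#   smartctl -A /dev/ada0 | awk '$1 == 241 { print $10 }'

TBW_GIB=$((80 * 1024))   # rated endurance: 80 TBW, roughly in GiB
DAY1=15040               # reading from the log above (GiB)
DAY2=15090               # hypothetical reading 24 h later (GiB)

DAILY=$((DAY2 - DAY1))                    # GiB written per day
DAYS_LEFT=$(((TBW_GIB - DAY2) / DAILY))   # days until rated endurance
echo "daily writes: ${DAILY} GiB"
echo "days until rated TBW: ${DAYS_LEFT}"
```

With these placeholder numbers the drive would hit its rated endurance in well under four years, which is why it pays to find and reduce the write source.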

This is not related to ZFS or UFS. Running systems write logs ...  ;)
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

How big are system logs though? They shouldn't amount to a lot of data per month written.

Check the Lifetime_Writes_GiB value on two consecutive days and you know.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Thank you Patrick and Greg for your time in trying to help me.

Greg, where can I see how big the system log is?

Patrick, I will do what you recommend and check the Lifetime_Writes_GiB value.

From my observation, it is not the system logs, but the netflow aggregation that does the largest amount of disk writes, unless you have very verbose settings enabled.

You can either disable netflow for all of your interfaces or, if you have enough memory, put all of the logs on a RAM disk. This is done on the "System: Settings: Miscellaneous" page. Keep in mind that upon reboot, the logs will be gone.
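One way to see which tree the writes are landing in is to measure directory sizes twice, a day apart, and compare. A sketch, demonstrated on a throwaway directory so it runs anywhere; on a real box you would point DIR at paths such as /var/netflow or /var/log (assuming a default OPNsense layout):

```shell
#!/bin/sh
# Sketch: measure how much data a directory tree holds. Run it on two
# consecutive days and diff the numbers to see which tree is growing.
DIR=$(mktemp -d)   # throwaway stand-in for /var/netflow or /var/log
dd if=/dev/zero of="$DIR/sample.db" bs=1024 count=2048 2>/dev/null  # 2 MiB dummy file
KIB=$(du -sk "$DIR" | awk '{print $1}')   # tree size in KiB
echo "${DIR}: ${KIB} KiB"
rm -rf "$DIR"
```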
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

Isn't netflow disabled by default?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

I am unsure about that; however, if you play around with the reporting section and accidentally enable it, it becomes a problem. That happened to me when I bought my DEC750: within a year, the wear-out of my 256 GByte SSD was at 25%.

In the OP's case, there were 15 TBytes written in 300 days, which is 50 GBytes worth of data written each day. Unless he has a really busy connection or very verbose log settings, I deem this a bit much.
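That figure follows directly from the SMART attributes quoted above (7312 power-on hours, 15040 GiB written):

```shell
#!/bin/sh
# Cross-check the ~50 GiB/day figure against the SMART attributes.
POWER_ON_HOURS=7312        # attribute 9  (Power_On_Hours)
LIFETIME_WRITES_GIB=15040  # attribute 241 (Lifetime_Writes_GiB)

DAYS=$((POWER_ON_HOURS / 24))
echo "uptime: ${DAYS} days"
echo "average daily writes: $((LIFETIME_WRITES_GIB / DAYS)) GiB"
```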
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

NetFlow is disabled:
Listening interfaces = nothing selected
WAN interfaces = nothing selected

In Reporting Database Options:
Round-Robin Database is "Enabled" (enables the RRD graphing backend)