OK, I was looking around the web and came across an article for pfSense that basically says that using an SSD in a firewall setup might not be a good idea. The author's basis is that the firewall (OPNsense/pfSense) is writing data just about every second (depending on usage, size of network, etc.) and that SSD drives are not a good choice for high amounts of writes because they will fail around 10,000. Any thoughts or opinions? I won't post the link here since it's actually on a pfSense forum (unless I am allowed to do this), but the article was interesting and it actually did make some sense. It was written 10 years ago, but the basis still has its roots today.
Greg
The relevant number for the write endurance of an SSD is the "TBW" or "Terabytes Written". For a typical SSD as one might use in an embedded device, like the Transcend mSATA SSD 370S in e.g. 128 Gbyte size, this number is 360 TB.
Datasheet here:
https://www.transcend-info.com/Products/No-632
So while the concern is fundamentally valid, the number of 10,000 you got from the pfSense forum is just several orders of magnitude too low.
You can get the amount of writes an SSD has done with
smartctl -a /dev/ada0
if ada0 is your device name.
The numbers to look for, in the case of one of my devices that has been operational for more than a year, are:
Remaining_Lifetime_Perc: 100
TLC_Writes_32MiB: 50568
Which means the SSD has done about 1.5 TiB (roughly 1.7 TB) of writes in units of 32 MiB "cells", and the expected remaining lifetime is at 100%, which means it has not yet reallocated any cells from the reserved area to replace failed ones. And it essentially has no clue about the remaining lifetime, because that depends on future writes. The number will start to go down once the device shows some wear.
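The conversion from the raw counter to a total can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the 32 MiB unit size is taken from the attribute name TLC_Writes_32MiB. The result depends on whether you divide by 1024 (TiB) or by 1000 (TB), so the sketch reports both.

```python
MIB = 1024 ** 2  # bytes in one mebibyte


def counter_to_bytes(raw_count: int, unit_bytes: int = 32 * MIB) -> int:
    """Turn a raw SMART write counter into bytes (unit size is an assumption)."""
    return raw_count * unit_bytes


def as_tib(n_bytes: int) -> float:
    """Binary terabytes: 1 TiB = 1024**4 bytes."""
    return n_bytes / 1024 ** 4


def as_tb(n_bytes: int) -> float:
    """Decimal terabytes, the unit vendors use for TBW: 1 TB = 1000**4 bytes."""
    return n_bytes / 1000 ** 4


written = counter_to_bytes(50568)
print(f"{as_tib(written):.2f} TiB / {as_tb(written):.2f} TB")
# prints: 1.54 TiB / 1.70 TB
```

Since the datasheet TBW is given in decimal terabytes, the TB figure is the one to compare against the 360 rating.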
HTH,
Patrick
Thanks Patrick, very useful indeed, but I didn't understand how to interpret this:
Remaining_Lifetime_Perc: 97
TLC_Writes_32MiB: 432542
How do you calculate the TiB of writes and the expected remaining lifetime?
Tia.
32 MiB x 432542 = 13,841,344 MiB, which is about 13.2 TiB (14.5 TB).
Expected remaining lifetime: can only be calculated by monitoring Remaining_Lifetime_Perc over time. Currently your SSD has used 3% of the reserve cells to replace failed ones. Monitor how long it takes to go from 3% to 4% to 5%, then estimate when you will reach 90% ...
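A rough way to do that estimate is to fit a straight line through the readings you have collected and see where it crosses a chosen threshold. The sketch below is an assumption-laden illustration: the function name is invented, the sample readings are hypothetical, and it assumes wear stays linear, which depends entirely on future writes.

```python
def project_day_at_threshold(readings, threshold_pct=0.0):
    """Least-squares line through (day, remaining_percent) readings.

    Returns the day on which the fitted line reaches threshold_pct,
    or None if the readings show no downward trend yet.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = sum(day for day, _ in readings) / n
    mean_y = sum(pct for _, pct in readings) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in readings)
    if sxx == 0:
        raise ValueError("readings must be taken on different days")
    sxy = sum((day - mean_x) * (pct - mean_y) for day, pct in readings)
    slope = sxy / sxx  # percent per day, negative while the drive wears
    if slope >= 0:
        return None
    intercept = mean_y - slope * mean_x
    return (threshold_pct - intercept) / slope


# Hypothetical readings: 97% today, 96% after 180 days, 95% after 360 days.
samples = [(0, 97), (180, 96), (360, 95)]
days = project_day_at_threshold(samples)
print(f"fitted line reaches 0% after about {days / 365.25:.0f} years")
```

With only one or two percentage points of change the fit is very coarse, so the projection is worth repeating as more readings come in.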
Oh I see, thanks: I have the exact same drive - Transcend 370S 128GB - and if I understood correctly, this drive will 'die' when it reaches 360 terabytes of data written?
It's been online 24/7 since May 2020 and I am now at about 13.2 TiB - so, I understand there is no precise formula to calculate how long it will last...
Thanks.
It's guaranteed to last at least 360 TB. How fast it fails afterwards and in which way precisely again depends ...
German magazine c't had done a "let's write some SSDs to death" test and found that most last way longer than the guaranteed TBW.
Plus most of the time that warranty is combined with a time period, so e.g. for a particular Samsung drive it's 600 TBW or 5 years, whichever comes first.
Quote from: pmhausen on October 25, 2021, 10:45:59 AM
32 MiB x 432542 = 13,841,344 MiB, which is about 13.2 TiB (14.5 TB).
Expected remaining lifetime: can only be calculated by monitoring Remaining_Lifetime_Perc over time. Currently your SSD has used 3% of the reserve cells to replace failed ones. Monitor how long it takes to go from 3% to 4% to 5%, then estimate when you will reach 90% ...
I could be wrong, but I don't think this percentage flags failed cells. It just calculates the number of erase cycles remaining before EOL. This is explained on Crucial's website, but maybe other vendors calculate this differently (Attribute 202...)
I've been monitoring mine, and after 2.5 years it is showing 73% remaining. I just switched to tmpfs for both tmp and var to alleviate some of the wear. Hopefully that helps, but either way it's not a worry as I'll probably have different hardware in 10 years. BTW, the SMART data for power-on hours is way off for some reason (below).
https://www.crucial.com/support/articles-faq-ssd/ssds-and-smart-data
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail Always - 0
5 Reallocate_NAND_Blk_Cnt 0x0032 100 100 010 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 12278
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 13
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Ave_Block-Erase_Count 0x0032 073 073 000 Old_age Always - 419
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 4
180 Unused_Reserve_NAND_Blk 0x0033 000 000 000 Pre-fail Always - 26
183 SATA_Interfac_Downshift 0x0032 100 100 000 Old_age Always - 0
184 Error_Correction_Count 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 051 030 000 Old_age Always - 49 (Min/Max 0/70)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_ECC_Cnt 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
202 Percent_Lifetime_Remain 0x0030 073 073 001 Old_age Offline - 27
206 Write_Error_Rate 0x000e 100 100 000 Old_age Always - 0
210 Success_RAIN_Recov_Cnt 0x0032 100 100 000 Old_age Always - 0
246 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 60992914549
247 Host_Program_Page_Count 0x0032 100 100 000 Old_age Always - 1027826061
248 FTL_Program_Page_Count 0x0032 100 100 000 Old_age Always - 674187293
TL;DR You probably won't manage to kill a modern SSD by writing to it in a more or less normal scenario. Certainly not by logging.
I agree with the consensus that the SSD lifespan is not a concern for most firewall use cases. Here are the stats on my cheapo 120GB SATA SSD that has been running OPNsense non-stop for 2.3 years.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 000 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 20426
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 161
148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
167 Write_Protect_Mode 0x0000 100 100 000 Old_age Offline - 0
168 SATA_Phy_Error_Count 0x0012 100 100 000 Old_age Always - 0
169 Bad_Block_Rate 0x0000 100 100 000 Old_age Offline - 5
170 Bad_Blk_Ct_Erl/Lat 0x0000 100 100 010 Old_age Offline - 0/13
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 MaxAvgErase_Ct 0x0000 100 100 000 Old_age Offline - 149 (Average 118)
181 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
182 Erase_Fail_Count 0x0000 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Unsafe_Shutdown_Count 0x0012 100 100 000 Old_age Always - 59
194 Temperature_Celsius 0x0022 073 069 000 Old_age Always - 27 (Min/Max 22/31)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
199 SATA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
218 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 1
231 SSD_Life_Left 0x0000 012 012 000 Old_age Offline - 88
233 Flash_Writes_GiB 0x0032 100 100 000 Old_age Always - 7601
241 Lifetime_Writes_GiB 0x0032 100 100 000 Old_age Always - 12530
242 Lifetime_Reads_GiB 0x0032 100 100 000 Old_age Always - 122
244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 118
245 Max_Erase_Count 0x0000 100 100 000 Old_age Offline - 149
246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 1301864
At 88% life remaining, I'm using roughly 5% of the SSD life every year. At this rate I'd have another 16 years remaining. And this is on a very cheap Kingston 120GB SATA SSD. A higher capacity and higher end SSD would be able to balance writes more effectively and would likely have an even greater lifespan for this use case. Plus, the SSD is faster, silent, and uses less power than a traditional spinning disk.
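The back-of-the-envelope estimate above can be reproduced from a single SMART reading. This is a sketch under stated assumptions: the function name is made up, the drive is assumed to have started at 100% when Power_On_Hours was zero, wear is assumed to be linear, and the power-on counter is assumed to be accurate (which is not always the case, as noted earlier in the thread).

```python
HOURS_PER_YEAR = 24 * 365.25


def years_left(power_on_hours: float, life_remaining_pct: float):
    """Linear projection from one reading; returns None if no wear is visible yet."""
    used_pct = 100.0 - life_remaining_pct
    if used_pct <= 0:
        return None
    years_run = power_on_hours / HOURS_PER_YEAR
    wear_per_year = used_pct / years_run  # percent of life consumed per year
    return life_remaining_pct / wear_per_year


# Values from the table above: Power_On_Hours 20426, 88% life remaining.
print(f"{years_left(20426, 88):.1f} years")
```

With these inputs the result lands at around 17 years, in the same ballpark as the figure quoted in the post.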
Quote:
At 88% life remaining, I'm using roughly 5% of the SSD life every year. At this rate I'd have another 16 years remaining.
Actually you have 12% remaining lol.
Quote from: gpb on November 14, 2021, 06:49:29 PM
Quote:
At 88% life remaining, I'm using roughly 5% of the SSD life every year. At this rate I'd have another 16 years remaining.
Actually you have 12% remaining lol.
:o Are you sure about that? I've watched it slowly tick down from the high 90s to where it is now, in the high 80s, after 2+ years.
Quote from: opnfwb on November 15, 2021, 01:58:20 AM
Quote from: gpb on November 14, 2021, 06:49:29 PM
Quote:
At 88% life remaining, I'm using roughly 5% of the SSD life every year. At this rate I'd have another 16 years remaining.
Actually you have 12% remaining lol.
:o Are you sure about that? I've watched it slowly tick down from the high 90s to where it is now, in the high 80s, after 2+ years.
Well, it would be reversed from how mine reads. It seems different brands have different formats, and no, I'm not sure. If you've been watching it tick down you're fine. Cheers! ;)
I think the issue here is probably smartctl not reporting the value title in the same way as the manufacturer. I was curious enough about this that I quickly pulled the drive and ran the manufacturer's diag tool on it. In my case, this is a Kingston SSD.
Smartctl reports the value as 'SSD_Life_Left' whereas Kingston actually lists it as "SSD Wear Indicator" and shows the wear at 12% with a remaining estimated life of 88%.
The swapped ID titles in smartctl don't make this any easier to decipher. However, it looks like the drive has a long life ahead of it (fingers crossed ;) ).
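To make the mix-up easier to spot, it can help to pull the normalized VALUE column and the RAW_VALUE column out of a smartctl attribute line and look at them side by side. The parser below is a minimal sketch: the names are invented for this example, and it assumes the ten-column layout shown in the tables in this thread.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int
    thresh: int
    raw: str     # RAW_VALUE column, kept as text because it may carry extras


def parse_attr_line(line: str) -> SmartAttr:
    """Split one attribute row into its fields (assumes 10 whitespace-separated columns)."""
    parts = line.split(None, 9)  # at most 10 fields; the raw value keeps its spaces
    if len(parts) < 10:
        raise ValueError(f"unexpected attribute line: {line!r}")
    return SmartAttr(
        attr_id=int(parts[0]),
        name=parts[1],
        value=int(parts[3]),
        worst=int(parts[4]),
        thresh=int(parts[5]),
        raw=parts[9].strip(),
    )


crucial = parse_attr_line(
    "202 Percent_Lifetime_Remain 0x0030 073 073 001 Old_age Offline - 27")
kingston = parse_attr_line(
    "231 SSD_Life_Left 0x0000 012 012 000 Old_age Offline - 88")
for attr in (crucial, kingston):
    print(f"{attr.attr_id} {attr.name}: VALUE={attr.value} RAW={attr.raw}")
```

On the Crucial drive the normalized VALUE (73) matches the remaining life reported in this thread, while on the Kingston drive it is the RAW value (88) that matches the manufacturer's tool, so which column means "remaining" has to be checked per vendor.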
Excellent! Sorry for the scare. :)