I previously had a ZFS install of OPNsense (mirrored, 1x 2.5" SATA and 1x NVMe M.2). I had the SMART plugin installed and enabled to run short self-tests twice a week. One day the widget reported that the NVMe drive had failed, and I received failure notifications every second, spamming the log.
The following is the health information:
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x04
Temperature: 35 Celsius
Available Spare: 100%
Available Spare Threshold: 1%
Percentage Used: 104%
Data Units Read: 2,314,455 [1.18 TB]
Data Units Written: 52,182,266 [26.7 TB]
Host Read Commands: 36,689,077
Host Write Commands: 486,151,530
Controller Busy Time: 7,726
Power Cycles: 18
Power On Hours: 1,310
Unsafe Shutdowns: 15
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 35 Celsius
Temperature Sensor 2: 35 Celsius
When I researched the issue, I came across the following.
https://forum.proxmox.com/threads/how-to-get-rid-of-smart-reliability-notifications.130103/
Quote:
Dear Client,
'Critical Warning: 0x04' is caused by "Percentage Used" being above 100%. In its own right, this only indicates that the drive is now out of warranty by the manufacturer. However, as long as 'Available Spare' is greater than 'Available Spare Threshold', you can safely ignore this.
Unfortunately, tools like smartctl will report the disk as failed, so you might need some custom filters for your monitoring.
This topic has been investigated and analyzed with our vendors for a very long time. Unfortunately, it is not possible to disable this warning for our use case. If you insist on it nonetheless, we can offer to replace the SSD for you as a gesture of goodwill.
Thank you very much for your understanding.
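To make the quoted rule concrete, here is a minimal Python sketch that decodes the Critical Warning byte and applies the "spare above threshold" check from the quote. The bit meanings follow my reading of the NVMe SMART/Health log definition, where 0x04 is bit 2 (reliability degraded). The function names and the filter itself are hypothetical and are not part of smartctl or the OPNsense plugin.

```python
# Bit meanings of the NVMe "Critical Warning" byte (SMART/Health log 0x02),
# as I understand the NVMe specification. Treat the wording as approximate.
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature outside threshold",
    2: "NVM subsystem reliability degraded",
    3: "media in read-only mode",
    4: "volatile memory backup device failed",
    5: "persistent memory region read-only or unreliable",
}


def decode_critical_warning(value):
    """Return the names of all bits set in the Critical Warning byte."""
    return [name for bit, name in CRITICAL_WARNING_BITS.items() if value & (1 << bit)]


def only_wear_warning(critical_warning, spare_pct, spare_threshold_pct, percentage_used):
    """Hypothetical filter based on the quoted advice: the warning is treated
    as wear-only when bit 2 is the sole bit set, Percentage Used exceeds 100,
    and Available Spare is still above its threshold."""
    return (
        critical_warning == 0x04
        and percentage_used > 100
        and spare_pct > spare_threshold_pct
    )


# Values taken from the SMART output in this post.
print(decode_critical_warning(0x04))
print(only_wear_warning(0x04, spare_pct=100, spare_threshold_pct=1, percentage_used=104))
```

A check like this only classifies the warning. It does not change what smartctl or the widget reports.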
I have since replaced the NVMe with a new one and did a fresh installation of OPNsense on UFS. I would like to reuse my old NVMe again with OPNsense later, because there's nothing wrong with it.
My question:
I would like to keep using SMART, but I would like to a) stop the SMART notifications that arrive every second and fill up the log, and b) if possible, keep SMART from reporting the drive as failed to begin with.
Has anyone experienced a comparable situation, and if so, how did you address it?
Thank you
You have it on a decent fast UPS? ZFS is usually the choice filesystem for this fw application.
Maybe start here https://man.freebsd.org/cgi/man.cgi?smartd.conf%285%29
Quote from: z0rk on August 20, 2025, 03:54:14 AM
[...]
Has anyone experienced a comparable situation and how to address it?
No, my SSDs all have low P/E cycles - I didn't find any with spare usage. A few have >90000 Power On Hours (10 years). Even so, I'm paranoid about endurance, so I would consider any SSD with high spare usage unreliable for any application other than temporary storage.
1310 POH with 26TBW is a heck of a write rate.
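For reference, that rate can be worked out from the SMART output in the first post. This is a minimal Python sketch, assuming the usual NVMe convention that one data unit is 1,000 blocks of 512 bytes. The variable names are mine.

```python
# Values copied from the SMART/Health output in the first post.
DATA_UNITS_WRITTEN = 52_182_266
POWER_ON_HOURS = 1_310

# Assumption: one NVMe data unit = 1,000 x 512 bytes = 512,000 bytes.
BYTES_PER_DATA_UNIT = 1000 * 512

total_bytes = DATA_UNITS_WRITTEN * BYTES_PER_DATA_UNIT
total_tb = total_bytes / 1e12                      # decimal terabytes
gb_per_hour = total_bytes / POWER_ON_HOURS / 1e9   # decimal gigabytes per hour
gb_per_day = gb_per_hour * 24

print(f"Total written: {total_tb:.1f} TB")
print(f"Average rate:  {gb_per_hour:.1f} GB/hour, {gb_per_day:.0f} GB/day")
```

The total agrees with the 26.7 TB that smartctl prints next to Data Units Written, and the average works out to roughly 20 GB per powered-on hour.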
Quote from: BrandyWine on August 20, 2025, 06:41:23 AM
You have it on a decent fast UPS? ZFS is usually the choice filesystem for this fw application.
Maybe start here https://man.freebsd.org/cgi/man.cgi?smartd.conf%285%29
This was the first ZFS installation I ever did. I never had any performance issues in the past.
Quote from: pfry on August 20, 2025, 05:45:17 PM
Quote from: z0rk on August 20, 2025, 03:54:14 AM
[...]
Has anyone experienced a comparable situation and how to address it?
No, my SSDs all have low P/E cycles - I didn't find any with spare usage. A few have >90000 Power On Hours (10 years). Even so, I'm paranoid about endurance, so I would consider any SSD with high spare usage unreliable for any application other than temporary storage.
1310 POH with 26TBW is a heck of a write rate.
I had Zenarmor installed. RAM usage was consistently around 14GB. I will need to read up on it some more before I reinstall it to avoid excessive writes and memory usage.
Quote from: z0rk on August 20, 2025, 07:57:58 PM
This was the first ZFS installation I ever did. I never had any performance issues in the past.
But in post #1 you said you installed on UFS.
Quote from: BrandyWine on August 20, 2025, 09:07:38 PM
But in post #1 you said you installed on UFS.
My first ZFS install, then the NVMe failed, then I did a fresh UFS install on a new NVMe. I've been using OPNsense for a few years now and always used UFS without any performance issues, but about a month ago I wanted to try out ZFS and added some more RAM and did a fresh install. Sorry for the confusion.
ZFS rules. The snapshot/rollback function alone. Accessible from the UI.
You don't want a different FS in 2025. Apart from special use cases like VMs when the underlying hypervisor already uses ZFS.
Quote from: Patrick M. Hausen on August 20, 2025, 09:25:08 PM
ZFS rules. The snapshot/rollback function alone. Accessible from the UI.
Damn, why didn't I know this. System > Snapshots
Alright, I will switch back.
Any thoughts on my original question, though? I don't mind SMART giving me a warning whenever the short test runs, but why was my log being spammed like that? Or is it maybe being triggered by something else?
I would appreciate any insight you may have.
Thanks
No idea about the log spamming, but I monitor all of my SSDs with Scrutiny and I plan in advance to replace every drive that exceeds its guaranteed TBW rate.
https://forum.opnsense.org/index.php?topic=48101.msg242617#msg242617