Hi community,
I own a DEC750 with an NVMe drive, running OPNsense 25.1.10. Recently I got a failed SMART message:
smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       TS256GMTE652T2
Serial Number:                      H433990185
Firmware Version:                   52B9T7OA
PCI Vendor/Subsystem ID:            0x1d79
IEEE OUI Identifier:                0x000000
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Utilization:            255,796,785,152 [255 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Jul 15 09:47:04 2025 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- NVM subsystem reliability has been degraded

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x04
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    159%
Data Units Read:                    15,175,817 [7.77 TB]
Data Units Written:                 868,173,472 [444 TB]
Host Read Commands:                 166,826,964
Host Write Commands:                6,380,384,852
Controller Busy Time:               74,813
Power Cycles:                       22
Power On Hours:                     22,786
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    234
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   13638
Thermal Temp. 1 Total Time:         111289

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
Num  Test_Description  Status                      Power_on_Hours  Failing_LBA  NSID  Seg  SCT  Code
 0   Extended          Completed: failed segments           22597            -     -    2    -     -
 1   Extended          Completed: failed segments           22556            -     -    2    -     -
 2   Short             Completed: failed segments           22554            -     -    2    -     -
 3   Short             Completed: failed segments           22549            -     -    2    -     -
 4   Short             Completed: failed segments           17155            -     -    2    -     -
 5   Short             Completed: failed segments           12464            -     -    2    -     -
I haven't opened the box yet, so my questions are:
Can the NVMe drive be changed?
If yes, what type should I buy?
Is there an install-from-scratch procedure?
Thank you
You should contact Deciso about the hardware-related questions.
Yes, there is an install-from-scratch procedure once you have changed the drive. Export your current configuration and keep it somewhere safe while you still can.
System > Configuration > Backups > Download
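If the GUI should become unreachable before you get to that, pulling /conf/config.xml over SSH works as well. A minimal sketch, assuming SSH access is enabled and 192.168.1.1 is your firewall's LAN address:

# Copy the live configuration off the firewall (run from another machine)
scp root@192.168.1.1:/conf/config.xml ./config-backup.xml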
Yes, it can be changed. You can use any standard M.2 NVMe disk of appropriate size.
Alas, I found that with early OPNsense versions, the logging and disk flush intervals were suboptimal, leading to early decay of the built-in disks.
Amazing, I will do this.
Weird that it degraded in less than 2 years.
I will probably disable any cache in the future.
Thank you
Considering you will most probably use ZFS, you should opt for one of the better models with TLC or MLC flash, not QLC. They should have a high TBW rating and, preferably, a DRAM cache (not just an SLC cache). It does not have to be the quickest type, though; PCIe 3.0 will do.
I would go for a Transcend MTE220S, WD SN700, KIOXIA EXCERIA G2 or better; you can also use a larger capacity in order to reach a higher overall TBW.
Then, disable anything that writes to the disk excessively (like Netflow and detailed logging). If you install from scratch, the write interval should already be optimized, see: https://github.com/opnsense/core/commit/d766ae211c . You can check what your system uses via: "sysctl -a | fgrep vfs.zfs.txg.timeout"
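For reference, a rough sketch of checking and temporarily raising that interval from a shell; 30 seconds is only an example value, and the persistent way on OPNsense is a tunable under System > Settings > Tunables:

# Show the current ZFS transaction group flush interval (in seconds)
sysctl vfs.zfs.txg.timeout

# Raise it for the running system only; pick a value you are comfortable losing on power failure
sysctl vfs.zfs.txg.timeout=30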
Personally I've found a budget NVMe SSD with an HMB cache to work just fine for OPNsense. I bought a Team MP33 512GB (solely for the higher endurance rating) and it's been working fine for 3 years so far, with only 16% of its endurance used according to SMART data. Granted, I set it up with UFS and not ZFS, but I imagine it won't be much different than a ZFS setup.
Just remember, if you try to "right size" the SSD to something like 128 GB (or even smaller), you're going to run into laughably small endurance ratings, since they're typically rated based on drive writes per day. 512 GB drives aren't very expensive while also offering good endurance ratings, such as the 600 TBW of the one I bought.
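If you want to see where your own drive stands, a quick sketch from the shell (the device node is an assumption; it may be /dev/nvme0 or similar on your box):

# Print the NVMe health log and pick out the wear and write counters
smartctl -a /dev/nvme0 | grep -E "Percentage Used|Data Units Written"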
Quote from: Stormscape on July 16, 2025, 08:37:07 AM...
not ZFS, but I imagine it won't be much different than a ZFS setup.
It will. ZFS is a copy-on-write file system. You are right about the size; I already pointed to that, and larger capacities are not that much more expensive either.
But as stated, flowd and excessive logging will massively impact write volume on either UFS or ZFS.
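If you want to see the actual write load, a rough sketch (assuming the default ZFS pool name zroot; on UFS, plain iostat gives a similar picture, and the flowd path below is the usual OPNsense location):

# Watch pool write throughput in 10-second intervals
zpool iostat -v zroot 10

# See whether Netflow's flowd collector is accumulating data
ls -lh /var/log/flowd.log*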
Are there instructions on how to replace the drive and install the new OPNsense? Possibly a YouTube video?
Open box, locate M.2 drive, remove single screw holding it, replace M.2 drive, refit screw, close box.
As for installation: https://docs.opnsense.org/manual/install.html
Important step, do this first:
Go to System --> Configuration --> Backups in the GUI and download your config
If you install the same version of the OS, you should be able to just do a fresh install, log in, and upload the config. A reboot is probably nice but I don't think it is required.
I did this when I went from the open version to the business version and it was fine.
In theory this may work if you install an older or newer version, but I like to try to keep them as close to the same as possible.
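To keep them matched, a quick sketch of what I would check on the fresh install before restoring the config (opnsense-version prints the running release; opnsense-update without arguments brings the system up to the latest one):

# Show the installed OPNsense version on the fresh system
opnsense-version

# Update the fresh install before importing the old configuration
opnsense-update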