OPNsense Forum

English Forums => 24.7, 24.10 Legacy Series => Topic started by: Greelan on August 13, 2024, 01:49:07 PM

Title: Disk read errors
Post by: Greelan on August 13, 2024, 01:49:07 PM
Getting the following repeatedly in the log after the update to 24.7:

2024-08-13T21:45:01 Notice kernel (nda0:nvme0:0:0:1): Error 5, Retries exhausted
2024-08-13T21:45:01 Notice kernel (nda0:nvme0:0:0:1): CAM status: Unknown (0x420)
2024-08-13T21:45:01 Notice kernel (nda0:nvme0:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=11e0c7d0 0 27 0 0 0
2024-08-13T21:45:01 Notice kernel nvme0: UNRECOVERED READ ERROR (02/81) crd:0 m:0 dnr:0 p:1 sqid:2 cid:118 cdw0:0
2024-08-13T21:45:01 Notice kernel nvme0: READ sqid:2 cid:118 nsid:1 lba:299943888 len:40


Would welcome suggestions for troubleshooting.

The install is on ZFS.
Title: Re: Disk read errors
Post by: meyergru on August 13, 2024, 02:18:19 PM
That is a very specific message about a read error at LBA 299943888, so you could calculate the offset and do a direct read from /dev/nda0 at that location to see whether the error really occurs there, and then read at another offset to verify that it is actually a hardware problem. Try something like:


dd if=/dev/nda0 of=/dev/null bs=512 skip=299943888 count=40


Look at smartctl -a /dev/nvme0 to find the block size; I don't know if it is 512 bytes or 4096.
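
For example, if the namespace turned out to be formatted with 4096-byte LBAs, the same read would simply use that block size, and a second read at some other offset acts as a control (a sketch only - skip and count are the values from the kernel log, the control offset is made up):

dd if=/dev/nda0 of=/dev/null bs=4096 skip=299943888 count=40   # only if the LBA size is 4096
dd if=/dev/nda0 of=/dev/null bs=512 skip=299000000 count=40    # control read at an unaffected offset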

With ZFS this can surface at any time because of copy-on-write (COW): sooner or later you will hit any bad spot if there is one. So this does not have to be caused by the 24.7 upgrade.

You can also try a "smartctl --test=long /dev/nvme0".
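
Once the long test has finished, its result appears in the "Self-test Log (NVMe Log 0x06)" section of the full output:

smartctl -a /dev/nvme0    # the Self-test Log section at the end shows progress and the result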
Title: Re: Disk read errors
Post by: Greelan on August 14, 2024, 12:36:20 PM
Thanks

dd if=/dev/nda0 of=/dev/null bs=512 skip=299943888 count=40
dd: /dev/nda0: Input/output error
32+0 records in
32+0 records out
16384 bytes transferred in 0.007795 secs (2101909 bytes/sec)


smartctl -a /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.1-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       UMIS LENSE40256GMSP34MESTB3A
Serial Number:                      SS0L25152X3RC0AF114X
Firmware Version:                   2.3.7182
PCI Vendor/Subsystem ID:            0x1cc4
IEEE OUI Identifier:                0x044a50
Total NVM Capacity:                 256,060,514,304 [256 GB]
Unallocated NVM Capacity:           0
Controller ID:                      6059
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Utilization:            0
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            504a04 c500000000
Local Time is:                      Wed Aug 14 20:27:21 2024 AEST
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0016):     Wr_Unc DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     84 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +     6.50W    6.50W       -    0  0  0  0        0       0
1 +     4.60W    4.60W       -    1  1  1  1        5       5
2 +     3.90W    3.90W       -    2  2  2  2        5       5
3 -     1.50W    1.50W       -    3  3  3  3     4000    4000
4 -   0.0050W    0.50W       -    4  4  4  4    20000   30000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    98%
Available Spare Threshold:          3%
Percentage Used:                    100%
Data Units Read:                    5,950,084 [3.04 TB]
Data Units Written:                 985,528,704 [504 TB]
Host Read Commands:                 82,219,540
Host Write Commands:                10,135,637,294
Controller Busy Time:               492,801
Power Cycles:                       53
Power On Hours:                     32,993
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    2,809
Error Information Log Entries:      3,018
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               43 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0       3018     2  0x005f  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  1       3017     3  0x0068  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  2       3016     2  0x0063  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  3       3015     4  0x0073  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  4       3014     2  0x006d  0x0281 0xe800            0     1     -  Unknown Command Specific Status 0x40
  5       3013     1  0x0061  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  6       3012     4  0x007b  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  7       3011     3  0x006f  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  8       3010     3  0x006c  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  9       3009     1  0x006b  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
10       3008     2  0x007b  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
11       3007     2  0x0079  0x0281 0x7801            0     1     -  Unknown Command Specific Status 0x40
12       3006     2  0x007d  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
13       3005     2  0x0079  0x0281 0x7801            0     1     -  Unknown Command Specific Status 0x40
14       3004     2  0x0072  0x0281  0x7d1            0     1     -  Unknown Command Specific Status 0x40
15       3003     1  0x006b  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
... (48 entries not read)

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
0   Extended          Completed: failed segments            32993        61616     1   7   -    -


Suggestive of a hardware issue?
Title: Re: Disk read errors
Post by: Patrick M. Hausen on August 14, 2024, 12:37:22 PM
Failing_LBA
61616


Er ... yes?  :)
Title: Re: Disk read errors
Post by: Greelan on August 14, 2024, 12:46:39 PM
Except I also see posts like this one (https://superuser.com/questions/1823257/what-conclusion-can-be-drawn-from-smartctl-self-tests-no-longer-failing-on-ssd), which suggests that the failures might not be what they seem to be.
Title: Re: Disk read errors
Post by: doktornotor on August 14, 2024, 01:01:43 PM
Quote from: Greelan on August 14, 2024, 12:46:39 PM
Except I also see posts like this one (https://superuser.com/questions/1823257/what-conclusion-can-be-drawn-from-smartctl-self-tests-no-longer-failing-on-ssd), which suggests that the failures might not be what they seem to be.

Not sure what you mean? That discussion still concerns bad blocks being recovered and remapped after many, many retries. If you want to deal with randomly failing drives, sure, you can keep doing so...
Title: Re: Disk read errors
Post by: Greelan on August 14, 2024, 01:15:42 PM
More a matter of hoping... Lol

Oh, well, off to get a new one.
Title: Re: Disk read errors
Post by: doktornotor on August 14, 2024, 01:22:44 PM
I always discard these drives. The thing is, if something critical, such as the kernel, happens to land in the bad spot(s), you have an unbootable box. Not something you normally want to deal with.
Title: Re: Disk read errors
Post by: Greelan on August 14, 2024, 01:24:00 PM
Agreed. Fortunately, a reinstall of OPNsense is pretty straightforward with a configuration backup.
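
The whole configuration lives in /conf/config.xml, so grabbing a copy beforehand is a one-liner (the address and filename below are just placeholders; the GUI backup under System: Configuration: Backups works just as well):

scp root@192.168.1.1:/conf/config.xml ./opnsense-config-backup.xml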
Title: Re: Disk read errors
Post by: meyergru on August 14, 2024, 01:27:37 PM
This SSD has had it (https://www.youtube.com/watch?v=4vuW6tQ0218&t=127s) - and it told you so. Just look at "Percentage Used": it is at 100%. That is no wonder, given the 504 TBytes that have been written to it over its ~3.8 years of power-on time (32,993 hours).

UMIS is a Lenovo in-house brand; these consumer-grade SSDs are built into laptops and are not meant for write-intensive loads like the one you put on this disk (about 165 times as much written as read). It has already used 2% of its available spare, so it is practically guaranteed that bad blocks have been occurring during writes for a while. No surprise, given that the disk's capacity has been overwritten close to 2,000 times.
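
For reference: smartctl counts "Data Units" in chunks of 512,000 bytes (thousands of 512-byte blocks), so the total is easy to verify by hand - bc is part of the FreeBSD base system:

echo "985528704 * 512000 / 1000000000000" | bc    # = 504, i.e. the ~504 TB smartctl reports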

The reason for posts like the one you pointed to is a different one: Samsung's non-Pro models (and the Pro models as well) once suffered from a firmware problem. SSDs use flash memory, which stores data as charge in cells, and those cells lose charge over time, especially when they are not rewritten for a long time.

I saw that with a Samsung 980 Pro, which got very slow after about a year. It held game installations, which were written only once. Over time the cells had degraded to the point that ECC errors occurred and had to be corrected, so read performance was severely degraded. Samsung fixed that with a firmware upgrade that periodically "refreshes" cells that have not been written in a while - which further adds to cell wear even without anybody actually writing data.

Kingston's KC3000 seems to suffer from similar issues.

Even with prosumer-grade disks and sizes under 1 TByte (i.e. without much spare area), you can expect only ~4-5 years of lifetime unless you log to RAM or reduce log levels. The RRD database also puts a heavy write load on the disk.
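
If you want a feel for how much a box actually writes before choosing a replacement, watching the pool for a while gives a rough idea (zroot is the default pool name of an OPNsense ZFS install - adjust if yours differs):

zpool iostat -v zroot 10    # write bandwidth per vdev, sampled every 10 seconds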

That is to say: the bigger, the better. QLC is a no-go; better to use TLC (industrial grade) than MLC.

Title: Re: Disk read errors
Post by: Greelan on August 14, 2024, 01:35:48 PM
Appreciate the insights. Interesting regarding the level of writes, since this box has only ever been used for OPN. I thought routers/firewalls weren't that disk intensive? I don't have OPN configured to do excessive firewall logging.

Any suggestions for a replacement?
Title: Re: Disk read errors
Post by: doktornotor on August 14, 2024, 01:41:59 PM
Quote from: Greelan on August 14, 2024, 01:35:48 PMI thought routers/firewalls weren't that disk intensive? I don't have OPN configured to do excessive firewall logging.

If you enable Insight (Netflow), it will eat your drive for lunch if it's a small one. RRD was also already mentioned, plus the ZFS filesystem is CoW - so that does not treat SSDs too gently either.
Title: Re: Disk read errors
Post by: meyergru on August 14, 2024, 03:22:45 PM
Yes, but there is a difference between RRD and Netflow: RRD is stored in /var/db/rrd, which is always placed on disk, whereas Netflow data lives in /var/log/flowd*, which can be kept in RAM if you enable that under System: Settings: Miscellaneous.
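
A quick way to see how much of each is currently sitting on disk, using nothing more than du on the paths mentioned above:

du -sh /var/db/rrd /var/log/flowd* 2>/dev/null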

I always do that, because I usually do not care about /var/log after a reboot.

ZFS sure does a lot of writes. However, COW or not, a rewrite is the same as a write for flash-based memory: a new (or remapped) block is used either way, so under the hood flash behaves like COW anyway from a wear perspective.


As for a recommendation, I would probably try a Lexar NM790 or NM800 Pro with 1 TByte. They have >= 1 PByte of rated lifetime writes, and the 2 TByte models are not much more expensive. There are also versions with an integrated heatsink.
Title: Re: Disk read errors
Post by: Greelan on August 24, 2024, 02:46:13 AM
Returning to say I've replaced the disk and it was a very smooth process. Huge kudos to Franco, Ad and Jos and all other contributors for making it so!
Title: Re: Disk read errors
Post by: franco on August 24, 2024, 04:02:15 AM
Happy to hear. Just curious: how old was this ZFS install?


Cheers,
Franco
Title: Re: Disk read errors
Post by: Greelan on August 24, 2024, 05:07:47 AM
Almost exactly 3 years. It was a conversion from a UFS install, so the disk/system itself is around 4 years old.
Title: Re: Disk read errors
Post by: franco on August 24, 2024, 05:32:26 AM
Ok, this could coincide with

community/23.7/23.7.12:o system: change ZFS transaction group defaults to avoid excessive disk wear

We had to apply this change because ZFS's metadata writes were wearing out disks far too much, even when absolutely no data was written during the sync interval. You could say that ZFS is an always-write file system. Because if you always write, the actual data will wear the drive, not the metadata itself. ;)
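
For reference, the transaction group sync interval is governed by a standard OpenZFS sysctl, which you can check (and override via a tunable) on any box:

sysctl vfs.zfs.txg.timeout    # seconds between forced transaction group commits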

In your case it has probably been wearing out the disk before this change was put in place. That's at least 2 years' worth of increased wear.


Cheers,
Franco
Title: Re: Disk read errors
Post by: meyergru on August 24, 2024, 12:46:44 PM
Interesting. That change got past me. I had excessive wear on the SSD in my DEC750 (bought in 2022) after only somewhat more than a year.

The disk is now at 56% wear ("Percentage Used"), but it is currently increasing only very slowly.
Title: Re: Disk read errors
Post by: Patrick M. Hausen on August 24, 2024, 01:18:44 PM
The interesting value for nominal/guaranteed endurance can be viewed with smartctl -a or smartctl -x. For an NVMe drive it's "Percentage Used:", while for a SATA drive it's "Percentage Used Endurance Indicator".
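
For a quick check on the command line the relevant line can be grepped out directly (device names here are only examples - adjust them to your system):

smartctl -a /dev/nvme0 | grep "Percentage Used"      # NVMe
smartctl -x /dev/ada0 | grep -i "percentage used"    # SATA, from the device statistics log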

In this particular case from one of your first posts:

Percentage Used:                    100%

So the disk is worn out according to specs and apparently in reality, too.

I monitor the wear indicators for my NAS systems in Grafana like in the attached screen shot.

Kind regards,
Patrick
Title: Re: Disk read errors
Post by: Greelan on August 24, 2024, 01:46:55 PM
Quote from: Patrick M. Hausen on August 24, 2024, 01:18:44 PM
The interesting value for nominal/guaranteed endurance can be viewed with smartctl -a or smartctl -x. For an NVMe drive it's "Percentage Used:", while for a SATA drive it's "Percentage Used Endurance Indicator".

In this particular case from one of your first posts:

Percentage Used:                    100%

So the disk is worn out according to specs and apparently in reality, too.

I monitor the wear indicators for my NAS systems in Grafana like in the attached screen shot.

Kind regards,
Patrick
Yeah, we already established that, and that's why the disk has been replaced. The last post before yours wasn't from me xD
Title: Re: Disk read errors
Post by: Greelan on October 29, 2024, 12:29:45 PM
Quote from: franco on August 24, 2024, 05:32:26 AM
Ok, this could coincide with

community/23.7/23.7.12:o system: change ZFS transaction group defaults to avoid excessive disk wear

We had to apply this change because ZFS's metadata writes were wearing out disks far too much, even when absolutely no data was written during the sync interval. You could say that ZFS is an always-write file system. Because if you always write, the actual data will wear the drive, not the metadata itself. ;)

In your case it has probably been wearing out the disk before this change was put in place. That's at least 2 years' worth of increased wear.


Cheers,
Franco

Franco, was this change applied to existing systems, not just new installations?

Because two months into my new disk, I already have 23 TB of writes ...
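
That figure comes straight from SMART, so it is easy to keep an eye on (assuming the new drive also shows up as /dev/nvme0):

smartctl -a /dev/nvme0 | grep -E "Data Units Written|Percentage Used"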
Title: Re: Disk read errors
Post by: franco on November 08, 2024, 08:07:18 AM
It's in effect for all systems beginning with 23.7.12 unless the sysctl is overridden. Note that this lowers the writes but does not eliminate them. IMO it is a ZFS design flaw to flush metadata for an unchanged file system; it's probably keeping track of itself more than of the actual data, but it is what it is.


Cheers,
Franco