Messages - Greelan

#1
I noticed the same thing yesterday as well. Not sure when it started. Don't have time to troubleshoot atm but will have to investigate. Not sure whether it's a change at Mullvad's end or with OPNsense.
#2
Quote from: franco on March 03, 2025, 08:12:54 AM
mimgmail repo? ;)

OPNsense repo actually (os-postfix plugin). But running the update again after installing 25.1.2 bumped the Postfix version and fixed the issue (as has been previously posted).
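
To confirm which Postfix version the plugin ended up pulling in, a quick check from the shell (a sketch assuming the stock FreeBSD pkg tooling):

# Show the installed Postfix package and its version
pkg info -x postfix
# Or ask the daemon itself
postconf mail_version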
#3
24.7, 24.10 Legacy Series / Re: Disk read errors
October 29, 2024, 12:29:45 PM
Quote from: franco on August 24, 2024, 05:32:26 AM
Ok, this could coincide with

community/23.7/23.7.12:
o system: change ZFS transaction group defaults to avoid excessive disk wear

We did have to apply this change because ZFS was wearing out disks with its metadata writes too much even when absolutely no data was written in the sync interval. You could say that ZFS is an always-write file system. Because if you always write, the actual data written will wear the drive, not the metadata itself. ;)

In your case it has probably been wearing out the disk before this was put in place. That's at least 2 years worth of increased wear.


Cheers,
Franco

Franco, was this change applied to existing systems, not just new installations?

Because two months into my new disk, I already have 23 TB of writes ...
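
For reference, the underlying knob here is the ZFS transaction group commit interval. To check what a given box is actually running (a sketch; the exact value OPNsense ships isn't quoted in this thread, though the change presumably raises it above the FreeBSD default):

# Transaction group commit interval in seconds (stock FreeBSD default: 5)
sysctl vfs.zfs.txg.timeout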
#4
24.7, 24.10 Legacy Series / Re: Disk read errors
August 24, 2024, 01:46:55 PM
Quote from: Patrick M. Hausen on August 24, 2024, 01:18:44 PM
The interesting value for nominal/guaranteed endurance can be viewed with smartctl -a or smartctl -x. For an NVME drive it's "Percentage Used:" while for a SATA drive it's "Percentage Used Endurance Indicator".

In this particular case from one of your first posts:

Percentage Used:                    100%

So the disk is worn out according to specs and apparently in reality, too.

I monitor the wear indicators for my NAS systems in Grafana like in the attached screen shot.

Kind regards,
Patrick
Yeah, we already established that, and that's why the disk has been replaced. The last post before yours wasn't from me xD
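
For anyone wanting to script Patrick's check, a minimal sketch (the device paths /dev/nvme0 and /dev/ada0 are examples; adjust to your hardware):

# NVMe drives report wear as "Percentage Used"
smartctl -a /dev/nvme0 | grep 'Percentage Used'
# SATA drives report the equivalent "Percentage Used Endurance Indicator"
smartctl -x /dev/ada0 | grep 'Percentage Used Endurance Indicator'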
#5
24.7, 24.10 Legacy Series / Re: Disk read errors
August 24, 2024, 05:07:47 AM
Almost exactly 3 years. It was a conversion from a UFS install, so the disk/system is around 4 years old.
#6
24.7, 24.10 Legacy Series / Re: Disk read errors
August 24, 2024, 02:46:13 AM
Returning to say I've replaced the disk and it was a very smooth process. Huge kudos to Franco, Ad and Jos and all other contributors for making it so!
#7
24.7, 24.10 Legacy Series / Re: Disk read errors
August 14, 2024, 01:35:48 PM
Appreciate the insights. Interesting regarding the level of writes, since this box has only ever been used for OPN. I thought routers/firewalls weren't that disk intensive? I don't have OPN configured to do excessive firewall logging.

Any suggestions for a replacement?
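
To gauge how write-heavy the box actually is before buying, one option is to watch live disk throughput (a sketch; nda0 is the device name from later in this thread):

# Extended disk statistics, refreshed every 5 seconds
iostat -x -w 5 nda0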
#8
24.7, 24.10 Legacy Series / Re: Disk read errors
August 14, 2024, 01:24:00 PM
Agreed. Fortunately a reinstall of OPNsense is pretty straightforward with a configuration backup.
#9
24.7, 24.10 Legacy Series / Re: Disk read errors
August 14, 2024, 01:15:42 PM
More a matter of hoping... Lol

Oh, well, off to get a new one.
#10
24.7, 24.10 Legacy Series / Re: Disk read errors
August 14, 2024, 12:46:39 PM
Except I also see posts like this, which suggest that the failures might not be what they seem.
#11
24.7, 24.10 Legacy Series / Re: Disk read errors
August 14, 2024, 12:36:20 PM
Thanks

dd if=/dev/nda0 of=/dev/null bs=512 skip=299943888 count=40
dd: /dev/nda0: Input/output error
32+0 records in
32+0 records out
16384 bytes transferred in 0.007795 secs (2101909 bytes/sec)
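
To pin down exactly which of the 40 sectors fails, one approach is to re-read the suspect range one 512-byte block at a time (a sketch built on the same dd invocation as above):

# Probe each suspect sector individually; print the LBA of any that error out
for i in $(seq 0 39); do
  dd if=/dev/nda0 of=/dev/null bs=512 skip=$((299943888 + i)) count=1 2>/dev/null \
    || echo "read error at LBA $((299943888 + i))"
done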


smartctl -a /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.1-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       UMIS LENSE40256GMSP34MESTB3A
Serial Number:                      SS0L25152X3RC0AF114X
Firmware Version:                   2.3.7182
PCI Vendor/Subsystem ID:            0x1cc4
IEEE OUI Identifier:                0x044a50
Total NVM Capacity:                 256,060,514,304 [256 GB]
Unallocated NVM Capacity:           0
Controller ID:                      6059
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Utilization:            0
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            504a04 c500000000
Local Time is:                      Wed Aug 14 20:27:21 2024 AEST
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0016):     Wr_Unc DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     84 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +     6.50W    6.50W       -    0  0  0  0        0       0
1 +     4.60W    4.60W       -    1  1  1  1        5       5
2 +     3.90W    3.90W       -    2  2  2  2        5       5
3 -     1.50W    1.50W       -    3  3  3  3     4000    4000
4 -   0.0050W    0.50W       -    4  4  4  4    20000   30000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    98%
Available Spare Threshold:          3%
Percentage Used:                    100%
Data Units Read:                    5,950,084 [3.04 TB]
Data Units Written:                 985,528,704 [504 TB]
Host Read Commands:                 82,219,540
Host Write Commands:                10,135,637,294
Controller Busy Time:               492,801
Power Cycles:                       53
Power On Hours:                     32,993
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    2,809
Error Information Log Entries:      3,018
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               43 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0       3018     2  0x005f  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  1       3017     3  0x0068  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  2       3016     2  0x0063  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  3       3015     4  0x0073  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  4       3014     2  0x006d  0x0281 0xe800            0     1     -  Unknown Command Specific Status 0x40
  5       3013     1  0x0061  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  6       3012     4  0x007b  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  7       3011     3  0x006f  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  8       3010     3  0x006c  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
  9       3009     1  0x006b  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
10       3008     2  0x007b  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
11       3007     2  0x0079  0x0281 0x7801            0     1     -  Unknown Command Specific Status 0x40
12       3006     2  0x007d  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
13       3005     2  0x0079  0x0281 0x7801            0     1     -  Unknown Command Specific Status 0x40
14       3004     2  0x0072  0x0281  0x7d1            0     1     -  Unknown Command Specific Status 0x40
15       3003     1  0x006b  0x0281  0x000            0     1     -  Unknown Command Specific Status 0x40
... (48 entries not read)

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
0   Extended          Completed: failed segments            32993        61616     1   7   -    -


Suggestive of a hardware issue?
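
As a sanity check on the write total above: NVMe counts a "data unit" as 1000 512-byte blocks (512,000 bytes), so the reported figure works out as expected:

# 985,528,704 data units x 512,000 bytes each, expressed in TB
echo $((985528704 * 512000 / 1000000000000))   # prints 504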
#12
24.7, 24.10 Legacy Series / Disk read errors
August 13, 2024, 01:49:07 PM
Getting the following repeatedly in the log after the update to 24.7:

2024-08-13T21:45:01 Notice kernel (nda0:nvme0:0:0:1): Error 5, Retries exhausted
2024-08-13T21:45:01 Notice kernel (nda0:nvme0:0:0:1): CAM status: Unknown (0x420)
2024-08-13T21:45:01 Notice kernel (nda0:nvme0:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=11e0c7d0 0 27 0 0 0
2024-08-13T21:45:01 Notice kernel nvme0: UNRECOVERED READ ERROR (02/81) crd:0 m:0 dnr:0 p:1 sqid:2 cid:118 cdw0:0
2024-08-13T21:45:01 Notice kernel nvme0: READ sqid:2 cid:118 nsid:1 lba:299943888 len:40


Would welcome suggestions for troubleshooting.

The install is on ZFS.
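
Since the install is on ZFS, one low-risk first step (a suggestion, not something from the thread; the pool name zroot is an assumption) is to scrub the pool and let ZFS report per-device read and checksum errors:

# Scrub the pool, then inspect the error counters
zpool scrub zroot
zpool status -v zroot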
#14
Quote from: Demusman on April 16, 2023, 01:18:31 AM
By all means, go ahead and point out what is inaccurate in either of the first two posts.
Like in the past, you have missed my point - maybe it's deliberate?
#15
No, BondiBlueBalls is "100% accurate". By all means make suggestions for improvement or highlight problems (preferably with ideas for solutions) - just don't be a dick about it.