NCQ_TRIM_BROKEN boot error with 22.1 - FreeBSD 13 kernel issue

Started by BuckRogers25, February 24, 2022, 12:48:50 PM

Hi all,
Sorry to make my first post on here a "HEEEELP!" post.  Love OPNsense, and I've had hardly any issues with upgrades in the past, but I have an annoying one here that I can't fix.  I know the cause, but nothing I've tried has worked.  Maybe someone a bit more familiar with FreeBSD kernel issues can steer me right; I'm at the bottom of the barrel now :(.

So after upgrading from 21.7.8 to 22.1, I'm getting this during boot:
"quirks-0x2(NCQ_TRIM_BROKEN)"

Having googled around, the issue was easy to spot: it's a wildcard pattern match error in the FreeBSD kernel.  Whilst the links below say a fix was rolled in as of FreeBSD 11.1, OPNsense 22.1 uses the FreeBSD 13 kernel, and it still seems to be an issue.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210686
https://svnweb.freebsd.org/base?view=revision&revision=304443

And here's the smartctl info on the drive I'm using, confirming that it's the drive model that's the issue:

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     Micron_M500_MTFDDAK120MAV
Serial Number:    1346095A39C0
LU WWN Device Id: 5 00a075 1095a39c0
Firmware Version: MU03
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Feb 24 09:39:36 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

There is a workaround in that thread, which is to add one of these to /boot/loader.conf:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210686#c9
kern.cam.ada.0.quirks="0x0" (no 4k block support)
(or)
kern.cam.ada.0.quirks="0x1" (with 4k block support)

I've tried this both by hand-editing loader.conf and by adding a tunable through the GUI (System->Settings->Tunables).  Sadly neither works, with either 0x0 or 0x1.  0x2 is the NCQ_TRIM_BROKEN quirk, and 0x3 is that plus 4k, but there seems no need to test those if nothing else works.  As soon as it hits that line during boot, it just stalls.  Occasionally it will move on a line or two and then halt.

For now I'm booting the 21.7.8 kernel as kernel.old instead, and as soon as I do, all is fine.  But it does mean I'm stuck in limbo until I either fix this or swap out the SSD and rebuild on another.  I may have to at this point, but I'd rather not.

There are two long-term fixes for this shown in that thread:
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=171939
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=173807

I'm no FreeBSD expert and chances are someone here can do in minutes what would take me hours :).  Any help or pointers would be much appreciated.  Thanks.

Hi,

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210686 was closed in 2019, and I don't think we are missing any fix, at least from that perspective.

Do you need to disable trim on the SSD to "fix" this for now?


Cheers,
Franco

Hi franco,

Cheers for the prompt reply.

That's what I assumed as well when I saw it had been closed, but it doesn't look like it's been verified.  I didn't actually check the current version of the source though...

Hmmm ok, maybe it never made it in.  Here's the current stable version.
https://svnweb.freebsd.org/base/stable/12/sys/cam/ata/ata_da.c?view=markup

I just compared it quickly to https://bz-attachments.freebsd.org/attachment.cgi?id=171939 in Notepad++, and it looks like that change never made it in.

        {
                /*
                 * Crucial M550 SSDs
                 * NCQ Trim doesn't work, but only on MU01 firmware
                 */
                { T_DIRECT, SIP_MEDIA_FIXED, "*", "Crucial CT*M550*", "MU01" },
                /*quirks*/ADA_Q_NCQ_TRIM_BROKEN
        },

vs

       * Crucial M550 SSDs
       * NCQ Trim doesn't work, but only on MU01 firmware
       */
-      { T_DIRECT, SIP_MEDIA_FIXED, "*", "Crucial CT*M550*", "MU01" },
-      /*quirks*/ADA_Q_NCQ_TRIM_BROKEN
+      { T_DIRECT, SIP_MEDIA_FIXED, "*", "Crucial CT???M550*",
+      "MU01" }, /*quirks*/ADA_Q_NCQ_TRIM_BROKEN
   },

Sorry, lazy copy and paste there.  I won't copy and paste them all here; they are all basically the same.

I also looked through https://bz-attachments.freebsd.org/attachment.cgi?id=173807 vs https://svnweb.freebsd.org/base/stable/12/sys/cam/cam.c?view=markup , but that's a lot harder for me to decipher.  Most of it looks right in terms of the pattern matching for ??? vs *, but it may be that without the changes to ata_da.c in there too, it still wouldn't match right?  Sorry, pure script kiddie here, not a coder, but I think I have that right.

I could disable trim, just a tweak in /etc/fstab right?  But if the above is right wouldn't it still error even if I did?

Cheers.

The question is whether it errors just because the quirk says so or because it's still potentially harmful.  For a patch that never made it in, the latter is less probable, I think.

Yes, /etc/fstab with "# notrim" appended to the disk/root mount in question.

In either case, the way to get rid of the message is to rattle the FreeBSD chain via a new bug report in their issue tracker.

We hope to only include patches that FreeBSD itself took into the tree and added to the respective stable version of 13 (for 22.1 anyway). Sometimes that doesn't work, but that is more on the networking side than storage where we have way less experience.


Cheers,
Franco

Hi Franco,

Yeah, I don't think it's a real issue if you will, just an annoying false positive :).  And yeah, a new bug report, or maybe just get the one I linked reopened with the info from this thread.  I wouldn't want to include an OPNsense-specific patch either; that's just more to run through regression and more hassle.  Best to get it handled through FreeBSD itself, then wait for the next stable kernel update and roll that in some way down the line.

I won't have any time to test further until possibly the middle of next week.  I'll have a go with the notrim option, see what happens, and report back.  I might add a daily cron for the trim cleanup bits in FreeBSD, just in case, but this is just my home setup and the disk can live without trim really, so it's not strictly necessary to have it on.  If it's still an issue after that I'll get that bug report reopened or start a new one.

Cheers again.

Thanks, looking forward to your feedback in a bit. :)


Cheers,
Franco