Print Page - Degraded zpool after failed PSU

Title: Degraded zpool after failed PSU
Post by: FullyBorked on May 22, 2025, 03:24:22 PM

In the second installment of severe weather borking my OPNsense box first installment can be found here (https://forum.opnsense.org/index.php?topic=47332.0) if interested.

I'm not sure if the power issue that took out my PSU also took out one of my SSDs in my ZFS mirror or if I broke this pool when I accidentally disconnected one of these drives during the PSU install. I say the second part because my first boot after the PSU install the system didn't boot, checked my connections and noticed on of my drives was disconnected. That leads me to think this drive already had an issue of some sort.

Regardless below is my current zpool status.

Code Select

zpool status
  pool: zroot
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            ada0p4  FAULTED      0     0     0  corrupted data
            ada0p4  ONLINE       0     0     0

errors: No known data errors

Is there a chance this is reparable with the current disks? Anything I can try before replacing it?
If I need to replace the disk (which looks likely) is the guide linked in that zpool status good enough to get me back online or is there an external guide that might help a zfs noob?

Edit: I also just noticed my device names are identical, shouldn't those be different?

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 22, 2025, 03:53:17 PM

You have two devices ada0p4 or is there a typo?

camcontrol devlist?
gpart show?

Please.

Title: Re: Degraded zpool after failed PSU
Post by: meyergru on May 22, 2025, 03:54:22 PM

Indeed, they should be different. Usually, you could just remove the defective disk from the mirror and then add a new device in.

You should try zpool status -L -P first to see what has happened there. It is probably a risk to remove ada0p4 from the pool, but I have never seen such a thing.

Is /dev/ada1p4 available? Perhaps you can add it first and it will automagically take over from a hot-spare status to replace the faulted device.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 22, 2025, 03:57:39 PM

Quote from: Patrick M. Hausen on May 22, 2025, 03:53:17 PMYou have two devices ada0p4 or is there a typo?

camcontrol devlist?
gpart show?

Please.

Code Select

camcontrol devlist
<SanDisk SSD PLUS 240GB UF4500RL>  at scbus2 target 0 lun 0 (pass0,ada0)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus6 target 0 lun 0 (ses0,pass1)

Code Select

gpart show
=>       40  468877232  ada0  GPT  (224G)
         40     532480     1  efi  (260M)
     532520       1024     2  freebsd-boot  (512K)
     533544        984        - free -  (492K)
     534528   16777216     3  freebsd-swap  (8.0G)
   17311744  451563520     4  freebsd-zfs  (215G)
  468875264       2008        - free -  (1.0M)

There for sure used to be an ada1 before all this. Not sure it's state, it's def connected, but maybe it's totally failed? But that ZFS config seems odd to have the same device twice.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 22, 2025, 04:01:19 PM

Quote from: meyergru on May 22, 2025, 03:54:22 PMIndeed, they should be different. Usually, you could just remove the defective disk from the mirror and then add a new device in.

You should try zpool status -L -P first to see what has happened there. It is probably a risk to remove ada0p4 from the pool, but I have never seen such a thing.

Is /dev/ada1p4 available? Perhaps you can add it first and it will automagically take over from a hot-spare status to replace the faulted device.

Code Select

zpool status -L -P
  pool: zroot
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

        NAME             STATE     READ WRITE CKSUM
        zroot            DEGRADED     0     0     0
          mirror-0       DEGRADED     0     0     0
            /dev/ada0p4  FAULTED      0     0     0  corrupted data
            /dev/ada0p4  ONLINE       0     0     0

errors: No known data errors

Output looks similar, such a bizarre thing. /dev/ada1p4 doesn't look to exist.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 22, 2025, 09:08:38 PM

Try

Code Select

zpool status -g
please.

The idea is to detach the faulted drive using the GUID, then run a scrub to make sure the remaining data is healthy, then power down the system at some convenient time, and watch if the former ada1 is coming back when powering on again. If it isn't you will probably need to replace it. We can help to to re-attach a new disk to the mirror and also copy the partitions necessary to boot from either disk.

Kind regards,
Patrick

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 22, 2025, 09:22:55 PM

Quote from: Patrick M. Hausen on May 22, 2025, 09:08:38 PMTry

Code Select Expand
zpool status -g
please.

The idea is to detach the faulted drive using the GUID, then run a scrub to make sure the remaining data is healthy, then power down the system at some convenient time, and watch if the former ada1 is coming back when powering on again. If it isn't you will probably need to replace it. We can help to to re-attach a new disk to the mirror and also copy the partitions necessary to boot from either disk.

Kind regards,
Patrick

Will do thanks, I just received my replacement PSU so I'll be scheduling some downtime to swap that in, I'll double check to make 100% sure that disk is properly connected and report back once I replace it.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 22, 2025, 09:40:44 PM

The

Code Select

zpool status -g
should as mentioned output a status display like so:

Code Select

	NAME                      STATE     READ WRITE CKSUM
	zroot                     ONLINE       0     0     0
	  16341520380093765778    ONLINE       0     0     0
	    15099387462321339363  ONLINE       0     0     0
	    8131296105030086590   ONLINE       0     0     0

Then you can try:

Code Select

zpool detach zroot <guid of broken one>
zpool scrub zroot

to get back to a consistent state as a first step.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 22, 2025, 10:28:46 PM

Quote from: Patrick M. Hausen on May 22, 2025, 09:40:44 PMThe

Code Select Expand
zpool status -g
should as mentioned output a status display like so:

Code Select Expand
NAME STATE READ WRITE CKSUM zroot ONLINE 0 0 0 16341520380093765778 ONLINE 0 0 0 15099387462321339363 ONLINE 0 0 0 8131296105030086590 ONLINE 0 0 0
Then you can try:

Code Select Expand
zpool detach zroot <guid of broken one> zpool scrub zroot
to get back to a consistent state as a first step.

I missed that part of your question. Here is my output. So i need to detach the one ending in ...8775? Do I do this before physically doing anything to the disks?

Code Select

zpool status -g
  pool: zroot
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

        NAME                      STATE     READ WRITE CKSUM
        zroot                     DEGRADED     0     0     0
          4730808242311169367     DEGRADED     0     0     0
            4142472898976008775   FAULTED      0     0     0  corrupted data
            15730135158837676855  ONLINE       0     0     0

errors: No known data errors

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 22, 2025, 10:35:13 PM

Quote from: FullyBorked on May 22, 2025, 10:28:46 PMSo i need to detach the one ending in ...8775? Do I do this before physically doing anything to the disks?

Yes, and I would.

No hard guarantees, though - sorry. Have a config backup just in case. I am advising to the best of my knowledge.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 22, 2025, 10:48:55 PM

Quote from: Patrick M. Hausen on May 22, 2025, 10:35:13 PM
Quote from: FullyBorked on May 22, 2025, 10:28:46 PMSo i need to detach the one ending in ...8775? Do I do this before physically doing anything to the disks?

Yes, and I would.

No hard guarantees, though - sorry. Have a config backup just in case. I am advising to the best of my knowledge.

Assume this is just making things cleaner before removal of the failed disk?

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 22, 2025, 10:50:22 PM

Quote from: FullyBorked on May 22, 2025, 10:48:55 PMAssume this is just making things cleaner before removal of the failed disk?

That is my intention, yes. The duplicate device name is ... weird. The GUIDs are ZFS' internal references so they should always be the "source of truth".

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 22, 2025, 10:57:19 PM

Quote from: Patrick M. Hausen on May 22, 2025, 10:50:22 PM
Quote from: FullyBorked on May 22, 2025, 10:48:55 PMAssume this is just making things cleaner before removal of the failed disk?

That is my intention, yes. The duplicate device name is ... weird. The GUIDs are ZFS' internal references so they should always be the "source of truth".

Is it even possible that I somehow created a mirror on the same disk instead of two? 100% my smart monitoring widget had a 0 and a 1 but that doesn't mean the zfs pool did.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 22, 2025, 11:09:47 PM

Nope. Definitely not. Unless your

Code Select

zpool status -g

output lists two identical GUIDs - which I have never never never seen. I'd consider that impossible, but I might be wrong. That would mean something about the pool's internal data structure is severely broken and I would do a config export and reinstall.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 12:08:39 AM

Finally got the PSU replaced with it's permanent replacement and the failed ssd removed and a fresh one installed.

Code Select

camcontrol devlist
<SanDisk SSD PLUS 240GB UF4500RL>  at scbus0 target 0 lun 0 (pass0,ada0)
<SanDisk SSD PLUS 240GB UF4500RL>  at scbus2 target 0 lun 0 (pass1,ada1)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus6 target 0 lun 0 (ses0,pass2)

The detach and scrub was successful, here is the pools current state.

Code Select

zpool status -g
  pool: zroot
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:05 with 0 errors on Fri May 23 17:33:06 2025
config:

        NAME                   STATE     READ WRITE CKSUM
        zroot                  ONLINE       0     0     0
          4730808242311169367  ONLINE       0     0     0

errors: No known data errors

Would someone mind helping me understand how to add the replacment disk into the mirror? Assuming the "replace" command won't work now since we removed the failed disk.

Title: Re: Degraded zpool after failed PSU
Post by: cookiemonster on May 24, 2025, 02:13:50 AM

Code Select

#zpool attach {pool name} {new disk} but from your current setup of one disk, its ID is 4730808242311169367, so it has to be the new one.
So you could do a

Code Select

$zpool status again to show the disks in /dev/adaXX (in your case) to identify the current one; then use the new one from dmesg.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 24, 2025, 10:56:21 AM

Please post a

Code Select

zpool status
gpart show

Kind regards,
Patrick

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 03:06:39 PM

Quote from: Patrick M. Hausen on May 24, 2025, 10:56:21 AMPlease post a
Code Select Expand
zpool status gpart show
Kind regards,
Patrick

Code Select

zpool status
  pool: zroot
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:05 with 0 errors on Fri May 23 17:33:06 2025
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          ada1p4    ONLINE       0     0     0

errors: No known data errors

Code Select

gpart show
=>       40  468877232  ada1  GPT  (224G)
         40     532480     1  efi  (260M)
     532520       1024     2  freebsd-boot  (512K)
     533544        984        - free -  (492K)
     534528   16777216     3  freebsd-swap  (8.0G)
   17311744  451563520     4  freebsd-zfs  (215G)
  468875264       2008        - free -  (1.0M)

Weird the device name changed again, now it's ada1p4 vs ada0p4 when it was broken. I'm so confused by that naming.

Title: Re: Degraded zpool after failed PSU
Post by: meyergru on May 24, 2025, 03:20:45 PM

The device names are just numbered as devices are being detected, this is why they are unreliable. It looks like ada0 is present, but the disk is not yet partitioned, hence this is why it does not come up with "gpart show".

Given that your old device is now ada1, there must be an ada0 device. You can make sure using "camcontrol devlist".

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 03:23:43 PM

Quote from: meyergru on May 24, 2025, 03:20:45 PMThe device names are just numbered as devices are being detected, this is why they are unreliable. It looks like ada0 is present, but the disk is not yet partitioned, henc is why it does not come up with "gpart show".

Given that your old device is now ada1, there must be an ada0 device. You can make sure using "camcontrol devlist".

Code Select

<SanDisk SSD PLUS 240GB UF4500RL>  at scbus0 target 0 lun 0 (pass0,ada0)
<SanDisk SSD PLUS 240GB UF4500RL>  at scbus2 target 0 lun 0 (pass1,ada1)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus6 target 0 lun 0 (ses0,pass2)

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 03:26:05 PM

So I just need to run this?

Code Select

zpool attach zroot ada1

Title: Re: Degraded zpool after failed PSU
Post by: meyergru on May 24, 2025, 03:34:04 PM

No. Because the disks are just enumerated, your old disk can well have been ada0 before, but is now detected after the new disk, thus this new disk takes the name ada0 now and the old disk now has ada1.

Therefore, you have to attach ada0, not ada1 now.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 03:40:45 PM

Quote from: meyergru on May 24, 2025, 03:34:04 PMNo. Because the disks are just enumerated, your old disk can well have been ada0 before, but is now detected after the new disk, thus this new disk takes the name ada0 now and the old disk now has ada1.

Therefore, you have to attach ada0, not ada1 now.

That's confusing as all heck. Alright let me see what it does, I guess it won't let me attach one that's already attached anyway.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 03:44:32 PM

Code Select

zpool attach zroot ada0
missing <new_device> specification
usage:
        attach [-fsw] [-o property=value] <pool> <device> <new-device>

I'm not sure what this is looking for, does it need this?

Code Select

zpool attach zroot ada1 ada0

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 03:57:43 PM

So this seemed to work:

Code Select

zpool attach zroot ada1p4 ada0

Shows resilvering now, but the naming looks weird:

Code Select

zpool status
  pool: zroot
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat May 24 09:55:56 2025
        137G / 137G scanned, 3.30G / 137G issued at 113M/s
        3.34G resilvered, 2.41% done, 00:20:13 to go
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1p4  ONLINE       0     0     0
            ada0    ONLINE       0     0     0  (resilvering)

errors: No known data errors

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 24, 2025, 04:07:18 PM

Nooooo!

That's exactly how not to do it!

Your system will not be able to boot, when ada1 fails.

Why can't you guys wait for me to react to the post with the info I asked for. Good grief!

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 04:13:36 PM

Quote from: Patrick M. Hausen on May 24, 2025, 04:07:18 PMNooooo!

That's exactly how not to do it!

Your system will not be able to boot, when ada1 fails.

Why can't you guys wait for me to react to the post with the info I asked for. Good grief!

Ugh, sorry, EVERY guide I've read showed that was the process to rebuild.

I"m disappointed in ZFS, thought it was going to be a huge value add. But I'm really thinking about taking all my stuff back to using a classic raid controller. ZFS documentation is poor and confusing and covered in trip wires and mines.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 24, 2025, 04:19:58 PM

Quote from: FullyBorked on May 24, 2025, 04:13:36 PMUgh, sorry, EVERY guide I've read showed that was the process to rebuild.

I"m disappointed in ZFS, thought it was going to be a huge value add. But I'm really thinking about taking all my stuff back to using a classic raid controller. ZFS documentation is poor and confusing and covered in trip wires and mines.

No FreeBSD specific guide recommends using ZFS on an entire disk without a partition table!

ZFS is the best thing since sliced bread, the most robust file system existing and you just need to follow proper procedures. This involves respecting the FreeBSD boot process and the specific partition setup necessary.

I already wrote out the whole procedure but somehow that post ist lost. Seems like the forum does not like working in two tabs in parallel.

Give me a couple of minutes, I'll write it again.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 24, 2025, 04:38:59 PM

Quote from: FullyBorked on May 24, 2025, 03:06:39 PM
Code Select Expand
zpool status pool: zroot state: ONLINE status: Some supported and requested features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(7) for details. scan: scrub repaired 0B in 00:08:05 with 0 errors on Fri May 23 17:33:06 2025 config: NAME STATE READ WRITE CKSUM zroot ONLINE 0 0 0 ada1p4 ONLINE 0 0 0 errors: No known data errors
Code Select Expand
gpart show => 40 468877232 ada1 GPT (224G) 40 532480 1 efi (260M) 532520 1024 2 freebsd-boot (512K) 533544 984 - free - (492K) 534528 16777216 3 freebsd-swap (8.0G) 17311744 451563520 4 freebsd-zfs (215G) 468875264 2008 - free - (1.0M)
Weird the device name changed again, now it's ada1p4 vs ada0p4 when it was broken. I'm so confused by that naming.

OK. This is the state that I wanted and with which I intended to guide you through the recovery process step by step. Now that your zpool is a bit messed up let's fix that first.

I assume

Code Select

zpool status

results in "ada1p4" and "ada0" as the mirror disks?

We need to remove that ada0:

Code Select

zpool detach zroot ada0

Now we take a breath and grab a coffee ... about those device names ...

FreeBSD enumerates the devices by some "hardware order" inherent in the drive, the PCIe bus, whatnot. Starting with 0.

So initially you had ada0 and ada1. Fine. Then ada0 failed. You removed it and rebooted. With only a single drive now present what was formerly ada1 is now ada0. It starts at 0. Always.

Then you inserted a factory new drive in the "first" (whatever that means) hardware position. After another boot that one is now "first" and becomes ada0 and what was initially ada1, then ada0, is now ada1 again.

FreeBSD just counts.

Now the boot process. For a PC system to be able to boot there needs to be a partition table and either - depending on the system - legacy ("BIOS") or EFI boot code in a matching partiton. When you install stock FreeBSD you can pick which to install. OPNsense installs both, just so not to bother the user with questions they cannot answer and always be able to boot, even if you replace your hardware and move your drive from e.g. a legacy system to an EFI system.

You can see that in your "gpart show" output. An EFI partition followed by a freebsd-boot (legacy) partition. Followed by swap and ZFS. ZFS must go into a partition of type freebsd-zfs, never to the whole disk.

You need the "boot thingies" on both disks, because you want to be able to boot of either of them in case one fails.

So now if that removal of ada0 succeeded first we create a partition table. The easiest way in case of identical drives is to copy it from the good one to the new one:

Code Select

gpart backup ada1 | gpart restore ada0

Should the "new" drive not be entirely new and the above command fail because gpart does check if there is a partition table present, already, you can add the "-F" flag to that "gpart restore" command. It's just a reasonable safety measure. But since your new drive never had a partition table it should go well without "-F".

You can then check with

Code Select

gpart show

that now both drives are partitioned the same.

Now that we have a ZFS partition to keep our zpool data we can attach that to the mirror:

Code Select

zpool attach zroot ada1p4 ada0p4

Didn't it appear odd to have "ada1p4" but just "ada0" without a partitin when you did it the first time?

Anyway the zpool should now be resilvering and be done in no time as you can check with

Code Select

zpool status

again.

Good? Next step, copy that boot code.

We copy both the EFI and the legacy partitions from ada1 to their respective counterparts on ada0:

Code Select

# copy EFI boot
dd if=/dev/ada1p1 of=/dev/ada0p1 bs=1m

# copy legacy boot
dd if=/dev/ada1p2 of=/dev/ada0p2 bs=1m

That's it. Grab a beer. You have a redundant bootable system again. If you want redundant swap, too, which I recommend, we can do that in another round after your system is healthy again.

Kind regards,
Patrick

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 04:51:51 PM

Quote from: Patrick M. Hausen on May 24, 2025, 04:38:59 PM
Quote from: FullyBorked on May 24, 2025, 03:06:39 PM
Code Select Expand
zpool status pool: zroot state: ONLINE status: Some supported and requested features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(7) for details. scan: scrub repaired 0B in 00:08:05 with 0 errors on Fri May 23 17:33:06 2025 config: NAME STATE READ WRITE CKSUM zroot ONLINE 0 0 0 ada1p4 ONLINE 0 0 0 errors: No known data errors
Code Select Expand
gpart show => 40 468877232 ada1 GPT (224G) 40 532480 1 efi (260M) 532520 1024 2 freebsd-boot (512K) 533544 984 - free - (492K) 534528 16777216 3 freebsd-swap (8.0G) 17311744 451563520 4 freebsd-zfs (215G) 468875264 2008 - free - (1.0M)
Weird the device name changed again, now it's ada1p4 vs ada0p4 when it was broken. I'm so confused by that naming.

OK. This is the state that I wanted and with which I intended to guide you through the recovery process step by step. Now that your zpool is a bit messed up let's fix that first.

I assume
Code Select Expand
zpool statusresults in "ada1p4" and "ada0" as the mirror disks?

We need to remove that ada0:
Code Select Expand
zpool detach zroot ada0

Now we take a breath and grab a coffee ... about those device names ...

FreeBSD enumerates the devices by some "hardware order" inherent in the drive, the PCIe bus, whatnot. Starting with 0.

So initially you had ada0 and ada1. Fine. Then ada0 failed. You removed it and rebooted. With only a single drive now present what was formerly ada1 is now ada0. It starts at 0. Always.

Then you inserted a factory new drive in the "first" (whatever that means) hardware position. After another boot that one is now "first" and becomes ada0 and what was initially ada1, then ada0, is now ada1 again.

FreeBSD just counts.

Now the boot process. For a PC system to be able to boot there needs to be a partition table and either - depending on the system - legacy ("BIOS") or EFI boot code in a matching partiton. When you install stock FreeBSD you can pick which to install. OPNsense installs both, just so not to bother the user with questions they cannot answer and always be able to boot, even if you replace your hardware and move your drive from e.g. a legacy system to am EFI system.

You can see that in your "gpart show" output. An EFI partition followed by a freebsd-boot (legacy) partition. Followed by swap and ZFS. ZFS must go into a partition of type freebsd-zfs, never to the whole disk.

You need the "boot thingies" on both disks, because you want to be able to boot of either of them in case one fails.

So now if that removal of ada0 succeeded first we create a partition table. The easiest way in case of identical drives is to copy it from the good one to the new one:
Code Select Expand
gpart backup ada1 | gpart restore ada0
Should the "new" drive not be entirely new and the above command fail because gpart does check if there is a partition table present, already, you can add the "-F" flag to that "gpart restore" command. It's just a reasonable safety measure. But since your new drive never had a partition table it should go well without "-F".

You can then check with
Code Select Expand
gpart showthat now both drives are partitioned the same.

Now that we have a ZFS partition to keep or zpool data we can attach that to the mirror:
Code Select Expand
zpool attach zroot ada1p4 ada0p4
Didn't it appear odd to have "ada1p4" but just "ada0" without a partitin when you did it the first time?

Anyway the zpool should now be resilvering and be done in no time as you can check with
Code Select Expand
zpool statusagain.

Good? Next step, copy that boot code.

We copy both the EFI and the legacy partitions from ada1 to their respective counterparts on ada0:
Code Select Expand
# copy EFI boot dd if=/dev/da1p1 of=/dev/ada0p1 bs=1m # copy legacy boot dd if=/dev/da1p2 of=/dev/ada0p2 bs=1m

That's it. Grab a beer. You have a redundant bootable system again. If you want redundant swap, too, which I recommend, we can do that in another round after your system is healthy again.

Kind regards,
Patrick

ok, resilvering again, hopefully correctly this time.

Code Select

zpool status
  pool: zroot
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat May 24 10:50:06 2025
        137G / 137G scanned, 2.20G / 137G issued at 119M/s
        2.24G resilvered, 1.61% done, 00:19:21 to go
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1p4  ONLINE       0     0     0
            ada0p4  ONLINE       0     0     0  (resilvering)

errors: No known data errors

Once that finish (assume I should wait till resilver has finished) I'll copy the boot stuff.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 24, 2025, 04:57:44 PM

As you like. Of course resilvering and copying in parallel will slow both processss down, but the boot partitions are tiny, so no harm will come from that.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 05:03:07 PM

Is the source wrong here dd if=/dev/da1p1 of=/dev/ada0p1 bs=1m? I get "dd: /dev/da1p1: No such file or directory". Assuming source should be /dev/ada1p1 but don't want to assume again.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 24, 2025, 05:09:45 PM

ada1p1, yes. Sorry.

Same for the second command. I fixed it in my original post.

I suggest you just delete that single full quote of my instructions from your first reply, then the thread will be both correct and readable.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 05:14:20 PM

Quote from: Patrick M. Hausen on May 24, 2025, 05:09:45 PMada1p1, yes. Sorry.

ok, copies are done. Waiting on resilver to finish, estimated to be about an hour.

Appreciate the detailed guidance. If you have a way to accept it, I'd be happy to buy you a beer or coffee/tea depending on your elixir of choice for your trouble.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 24, 2025, 05:23:43 PM

All good. I'd be interested into those guides you mentioned. If there is misleading documentation out there, we ought to do something about that.

EDIT: I just found this one:

https://docs.freebsd.org/en/books/handbook/zfs/

Ouch! Ouch, ouch, ouch! I'll address this in the next ZFS production users call on Wednesday.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 05:32:41 PM

Quote from: Patrick M. Hausen on May 24, 2025, 05:23:43 PMAll good. I'd be interested into those guides you mentioned. If there is misleading documentation out there, we ought to do something about that.

https://sotechdesign.com.au/how-to-add-a-drive-to-a-zfs-mirror/

https://docs.oracle.com/cd/E53394_01/html/E54801/gayrd.html

https://askubuntu.com/questions/1301828/extend-existing-single-disk-zfs-with-a-mirror-without-formating-the-existing-hdd

https://www.devroom.io/2024/03/07/zfs-upgrade-single-disk-to-mirror/

I can go on and on, the issue is I didn't look up a freebsd specific guide. Figured ZFS was ZFS and a mirror is a mirror. Ultimately I didn't know what I didn't know, so I guess my search was flawed. But everything seemed to mostly agree, so figured that was the right path, and in my head a mirror is well... a mirror. I think in raid controller, on an old school RAID card I'd add in my disk, add it to the mirror, let it resilver/sync, go about my life and never think of it again.

Edit: I understand why this wouldn't have worked now, with a RAID controller, the OS and UEFI bootloader just points at the card instead of disks directly unlike ZFS. I know know that boot info would have to exist on both disks in this instance. IF it was just for storage I assume all that wouldn't be needed and the above guides would have been accurate.

Title: Re: Degraded zpool after failed PSU
Post by: Patrick M. Hausen on May 24, 2025, 05:40:45 PM

Oracle ZFS docs apply to Oracle hardware (former Sun) only, neither Linux nor FreeBSD.

FreeBSD uses GPT partitions. As do Ubuntu and Debian, if you follow the guide by zfsbootmenu.org. If you follow the OpenZFS guide for Ubuntu, it's /dev/disk/by-id.

FreeNAS and TrueNAS use GUUIDs. It's complicated.

Now the fact that the FreeBSD handbook is so outdated it is blatantly wrong - does not match in any way what the FreeBSD installer will (correctly) do - needs to be addressed. I'll poke some people.

Title: Re: Degraded zpool after failed PSU
Post by: FullyBorked on May 24, 2025, 05:43:26 PM

Quote from: Patrick M. Hausen on May 24, 2025, 05:40:45 PMOracle ZFS docs apply to Oracle hardware (former Sun) only, neither Linux nor FreeBSD.

FreeBSD uses GPT partitions. As do Ubuntu and Debian, if you follow the guide by zfsbootmenu.org. If you follow the OpenZFS guide for Ubuntu, it's /dev/disk/by-id.

FreeNAS and TrueNAS use GUUIDs. It's complicated.

Now the fact that the FreeBSD handbook is so outdated it is blatantly wrong - does not match in any way what the FreeBSD installer will (correctly) do - needs to be addressed. I'll poke some people.

Confusing is the big key word here, I'm a slight noob with FreeBSD and a full on noob with ZFS I've just been lost this whole time.

Regardless I appreciate the help and guidance here.

OPNsense Forum

English Forums => 25.1, 25.4 Production Series => Topic started by: FullyBorked on May 22, 2025, 03:24:22 PM