How to automount a second zpool ?

Started by ajm, February 12, 2022, 03:14:11 PM


Quote from: franco on February 13, 2022, 12:35:46 PM
PS: relevant bit might be "zpool" script https://github.com/opnsense/src/commit/74e2b24f2c369
Thanks franco.
This is an interesting one.
@ajm On my OPN I only have the single root pool. It's a very small APU device; I've not added another disk because I don't like the idea of running striped storage, i.e. no redundancy, on it, it being a firewall with limited CPU and memory resources.
I don't have a way to test for you unfortunately.
But I do have a server based on FreeBSD 12 with multiple pools.
One thing I notice, though I can't tell if it's part of the problem: my understanding is that on FreeBSD the zpool import is done by scanning GEOM devices. In both my OPN and non-OPN 'geom -t' listings, every provider that is part of a ZFS pool carries a "zfs::vdev        ZFS::VDEV" attribute, and it seemed to be missing from your 'geom -t' listing for ada1, where tank is.
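For anyone who wants to check the same thing, two quick read-only ways to confirm a provider really carries a ZFS label; the device names are just the ones from this thread:

# geom -t | grep -B 2 "ZFS::VDEV"
# zdb -l /dev/ada1p1

'zdb -l' dumps the on-disk vdev labels, so it shows whether the partition actually holds pool metadata even when the geom attribute listing looks odd.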
My OPN pool:

@OPNsense:~ % geom -t
Geom                 Class      Provider
ada0                 DISK       ada0
  ada0               PART       ada0p1
    ada0p1           LABEL      gpt/efiboot0
      gpt/efiboot0   DEV       
    ada0p1           LABEL      msdosfs/EFISYS
      msdosfs/EFISYS DEV       
    ada0p1           DEV       
  ada0               PART       ada0p2
    ada0p2           LABEL      gpt/gptboot0
      gpt/gptboot0   DEV       
    ada0p2           DEV       
  ada0               PART       ada0p3
    ada0p3           DEV       
    swap             SWAP     
  ada0               PART       ada0p4
    ada0p4           DEV       
    zfs::vdev        ZFS::VDEV
  ada0               DEV   

One of my storage systems:
~]$ geom -t
Geom                Class      Provider
da0                 DISK       da0
  da0               PART       da0p1
    da0p1           LABEL      gpt/sysboot0
      gpt/sysboot0  DEV       
    da0p1           DEV       
  da0               PART       da0p2
    da0p2           LABEL      gpt/swap0
      gpt/swap0     DEV       
      swap          SWAP     
    da0p2           DEV       
  da0               PART       da0p3
    da0p3           LABEL      gpt/sysdisk0
      gpt/sysdisk0  DEV       
      zfs::vdev     ZFS::VDEV
    da0p3           DEV       
  da0               DEV       
da1                 DISK       da1
  da1               PART       da1p1
    da1p1           DEV       
    zfs::vdev       ZFS::VDEV
  da1               DEV       
da2                 DISK       da2
  da2               PART       da2p1
    da2p1           DEV       
    zfs::vdev       ZFS::VDEV
  da2               DEV       
da3                 DISK       da3
  da3               PART       da3p1
    da3p1           DEV       
    zfs::vdev       ZFS::VDEV
  da3               DEV       
da4                 DISK       da4
  da4               PART       da4p1
    da4p1           DEV       
  da4               PART       da4p2
    da4p2           DEV       
  da4               PART       da4p3
    da4p3           DEV       
  da4               DEV       
da5                 DISK       da5
  da5               PART       da5p1
    da5p1           LABEL      gpt/sysboot1
      gpt/sysboot1  DEV       
    da5p1           DEV       
  da5               PART       da5p2
    da5p2           LABEL      gpt/swap1
      gpt/swap1     DEV       
    da5p2           DEV       
  da5               PART       da5p3
    da5p3           LABEL      gpt/sysdisk1
      gpt/sysdisk1  DEV       
    da5p3           DEV       
  da5               DEV       
da6                 DISK       da6
  da6               PART       da6p1
    da6p1           LABEL      gpt/HPE_Disk1
      gpt/HPE_Disk1 DEV       
      zfs::vdev     ZFS::VDEV
    da6p1           DEV       
  da6               DEV       
da7                 DISK       da7
  da7               PART       da7p1
    da7p1           LABEL      gpt/PCK96S7X
      gpt/PCK96S7X  DEV       
      zfs::vdev     ZFS::VDEV
    da7p1           DEV       
  da7               DEV       
da8                 DISK       da8
  da8               PART       da8p1
    da8p1           LABEL      gpt/PCJPJYRX
      gpt/PCJPJYRX  DEV       
      zfs::vdev     ZFS::VDEV
    da8p1           DEV       
  da8               DEV       
da9                 DISK       da9
  da9               PART       da9p1
    da9p1           LABEL      gpt/PCK93TSX
      gpt/PCK93TSX  DEV       
      zfs::vdev     ZFS::VDEV
    da9p1           DEV       
  da9               DEV       
cd0                 DISK       cd0
  cd0               DEV       
gzero               ZERO       gzero
  gzero             DEV   


I think it relates to gpart usage. I'll see if I can dig something out of the command you used.

February 13, 2022, 10:56:52 PM #17 Last Edit: February 13, 2022, 11:11:13 PM by ajm
Yeah, thanks guys.

The info posted previously was a WIP; some of it is 'stale'. I'd already focused on 'geom -t' and the contents of '/etc|boot/zfs/zpool.cache' as possible causes, but they seem to check out OK (see below). This was by comparison with a 'working' FreeBSD system.

I think I'll need to dig further into ZFS to make any more progress with it, but work and other stuff will get in the way for a bit. So I've worked around the issue for now, got the pool 'auto mounting', and got my jails running.
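(For reference, the workaround is nothing clever: a small syshook script that imports the pool at boot if it isn't already imported. A minimal sketch, assuming the usual OPNsense syshook location under /usr/local/etc/rc.syshook.d/; adjust the path and name to taste, and remember to make it executable:)

#!/bin/sh
# e.g. /usr/local/etc/rc.syshook.d/early/60-import-tank
# Import the second pool at boot if it is not already imported.
zpool list tank >/dev/null 2>&1 || zpool import tank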

Re. choice of single disk: this is by design a highly resource-constrained system where every watt-hour is counted, but the benefits of ZFS over other fs options (mainly COW and all it brings) make it a no-brainer even on a single SSD. The data will get backed up.


root@a-fw:~ # geom -t
Geom               Class      Provider
ada0               DISK       ada0
  ada0             PART       ada0p1
    ada0p1         DEV
    ada0p1         LABEL      gpt/gptboot0
      gpt/gptboot0 DEV
  ada0             PART       ada0p2
    ada0p2         DEV
    swap           SWAP
  ada0             PART       ada0p3
    zfs::vdev      ZFS::VDEV
    ada0p3         DEV
  ada0             DEV
ada1               DISK       ada1
  ada1             PART       ada1p1
    ada1p1         DEV
    ada1p1         LABEL      gpt/tank
      zfs::vdev    ZFS::VDEV
      gpt/tank     DEV
  ada1             DEV

root@a-fw:~ # zdb -U /etc/zfs/zpool.cache
tank:
    version: 5000
    name: 'tank'
    state: 0
    txg: 2157
    pool_guid: 3111308251436133108
    errata: 0
    hostid: 3119175440
    hostname: 'a-fw.<fqdn redacted>'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 3111308251436133108
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12494104300729996690
            path: '/dev/gpt/tank'
            whole_disk: 1
            metaslab_array: 256
            metaslab_shift: 34
            ashift: 12
            asize: 2000394125312
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 129
            com.delphix:vdev_zap_top: 130
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
zroot:
    version: 5000
    name: 'zroot'
    state: 0
    txg: 136118
    pool_guid: 11119205119676167574
    errata: 0
    hostname: ''
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 11119205119676167574
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 11612196972070245027
            path: '/dev/ada0p3'
            whole_disk: 1
            metaslab_array: 256
            metaslab_shift: 29
            ashift: 12
            asize: 13414957056
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 130
            com.delphix:vdev_zap_top: 131
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

February 13, 2022, 11:09:58 PM #18 Last Edit: February 14, 2022, 09:58:14 AM by ajm
One thing that struck me about zpool.cache is that the 'working' pool (zroot) metadata reported by 'zdb -U' has a null hostname, whereas the non-working pool (tank) has an FQDN. I was wondering at what point in the boot the system knows its FQDN; if that's after ZFS loads, maybe that would explain why ZFS is failing to mount the 2nd pool. Clutching at straws again.. !
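(If anyone wants to compare what the running system reports against what the cache file recorded, something like this should do it, using the same cache path as above:)

# hostname
# sysctl kern.hostid
# zdb -U /etc/zfs/zpool.cache | grep -E 'hostid|hostname'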

After poking at ZFS-based images I think that zfs-import is probably something that could be run, but under the assumption that all available pools are actually meant to be auto-loaded. It's like ignoring /etc/fstab and just probing all devices for something to load, rather than loading what was actually configured.
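In zpool terms the distinction is roughly this, using standard zpool flags and the cache path seen earlier in the thread:

# zpool import -c /boot/zfs/zpool.cache -a -N

imports only the pools recorded in the cache file (the /etc/fstab-like behaviour), while

# zpool import -a -N

imports whatever can be discovered by probing the devices.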

I'm not sure what's better.


Cheers,
Franco

Thx, I'm well aware my workaround is a bit hacky.

However, as it's likely this will be the only OPN host I'll want to do this on, I can't really justify looking any closer at it right now, as there are so many other things still 'to do'.

Well, I'm just trying to say that if there is a per-pool setting for auto-mount, maybe that would be the nicest thing to probe and import on demand? :)
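(The closest per-pool knob I can think of is the 'cachefile' pool property: a pool set with cachefile=none is never recorded in zpool.cache, so a cache-driven import will skip it. Quick check, using the pool names from this thread:)

# zpool get cachefile zroot tank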


Cheers,
Franco

February 21, 2022, 07:57:39 PM #22 Last Edit: February 21, 2022, 08:04:19 PM by ajm
Hmmm... 'If'....

TBH when I was looking at this the other day, I got the impression I'd reached the limit of readily-available documentation and knowledge, and that to go any further, I would risk plunging down a rabbit-hole that I really didn't need to.

I'd love to help improve OPNsense by taking an initiative on 'issues' I encounter, but sometimes you have to say to yourself, 'Life's too short for that !' (I've just turned 59 :( )


PS. Woohoo ! I'm now a 'junior member' ! I wish I felt a bit more junior than I do ;)

You're doing good, don't worry. :)

Ok, so back to the "canmount" idea I found:

https://docs.oracle.com/cd/E19253-01/819-5461/gdrcf/index.html

# zpool import -aN
# zfs mount -va

The -N for import would be important to avoid mounting something we shouldn't. The theory would be that we could get the canmount datasets to mount this way.
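To make that concrete, a rough sequence (the dataset name is only an example):

zfs set canmount=noauto tank/private   # opt a dataset out of automatic mounting
zpool import -aN                       # import every pool found, but mount nothing yet
zfs mount -va                          # mounts only datasets whose canmount allows it
zfs mount tank/private                 # noauto datasets can still be mounted on request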

What do you think?

February 21, 2022, 08:22:36 PM #24 Last Edit: February 21, 2022, 08:29:36 PM by ajm
Ok, so I may be tempted to have another look..

BTW:

Karma: 1024 = 1 KiloKarma !

Nice !

PS. I confess that as a long-term Sun admin during the 90's-00's, I have an almost biological inability to have anything to do with O*****. So I prefer to use docs from the Open* world :)

Quote from: franco on February 21, 2022, 08:17:58 PM
You're doing good, don't worry. :)

Ok, so back to the "canmount" idea I found:

https://docs.oracle.com/cd/E19253-01/819-5461/gdrcf/index.html

# zpool import -aN
# zfs mount -va

The -N for import would be important to avoid mounting something we shouldn't. The theory would be that we could get the canmount datasets to mount this way.

What do you think?
Good find. I use another stripped-down FreeBSD distro for my storage and all pools get mounted without problems on reboots. This is why this one got me scratching my head. I'll ask their devs if they can tell me what mechanism is used. I'll post when I hear.

@franco the pool and dataset import is here https://sourceforge.net/p/xigmanas/code/HEAD/tree/trunk/etc/rc.d/zfs (BSD license). So it's a very similar approach, but it does, like you wondered, use -f to force.

Thanks, it's a bit strange that it imports/mounts by force and then unmounts again. It's more or less import -N then... let's try that in our code.
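If I've read the xigmanas script and the description above correctly, the two approaches boil down to roughly this (a sketch, not their exact code):

zpool import -f -a && zfs unmount -a   # forced import, everything gets mounted, then unmounted again
zpool import -a -N && zfs mount -va    # import without mounting, then mount only what canmount allows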


Cheers,
Franco

https://github.com/opnsense/core/commit/51bdcb64ac

# opnsense-patch 51bdcb64ac

Let me know how that works for you.


Cheers,
Franco

February 24, 2022, 05:18:55 PM #29 Last Edit: February 24, 2022, 05:23:26 PM by ajm
Thanks ! To test this patch, I took the following actions:

1. Commented out the 'zpool import tank' in my syshook script.
2. Created new BE, rebooted, applied the patch to the new BE, rebooted (roughly the bectl sequence sketched below).
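For anyone wanting to test the same way, a sketch of that boot-environment dance with bectl; the BE name is arbitrary:

# bectl create test-zfs-import
# bectl activate test-zfs-import
# shutdown -r now
(now running in the new BE)
# opnsense-patch 51bdcb64ac
# shutdown -r now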

I noted the following message during boot:


Mounting filesystems...
cannot import 'tank': pool was previously in use from another system.
Last accessed by a-fw.<domain redacted> (hostid=b9ead710) at Thu Feb 24 15:46:17 2022
The pool can be imported, use 'zpool import -f' to import the pool.


I then just tried a manual import without -f, and this succeeded.

So it still seems to me that the problem is due to the pool being recognised as being from 'another system', although it is not. It's as if the host doesn't know its FQDN at the point the initial 'zpool import' happens.

Puzzlingly, although 'zdb -U /boot/zfs/zpool.cache' includes the hostname:, neither 'zpool get all' nor 'zfs get all' does (nor do they list the hostid), so I don't know where that's being stored on the pool.
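For what it's worth, I believe the hostname/hostid that the import check complains about live in the pool's on-disk configuration (the vdev labels), not in any user-visible 'zpool get'/'zfs get' property, so the place to look would be the label itself:

# zdb -l /dev/gpt/tank

Incidentally, hostid=b9ead710 in the boot message is just the hex form of the 3119175440 that 'zdb -U' showed for tank earlier, so the pool does appear to record this host; the open question is what hostid/hostname the kernel reports at the point the early import runs.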