[NOOB] How to properly partition (ZFS) on 2 disks

Started by erica.vh, October 26, 2024, 09:13:16 PM

October 26, 2024, 09:13:16 PM Last Edit: October 27, 2024, 07:52:50 PM by erica.vh
Hello people,
using the [NOOB] tag from Marie-Sophie, I think it's a great thing for us newbies to recognize each other :-p

As a newb, and after reading many posts in here, I guess I'm going to need a lot of logs and reports to figure out what/where/how.
I would like to set up my system on my NVMe (done) and have my logs & reports on my second disk (cheaper, expandable), but after a dozen tries I still can't get it to run at boot.

So the NVMe is all default:
zpool is zroot: nda0p1 is EFI, nda0p2 is boot, nda0p3 is 8G swap and nda0p4 is the system.
While the SATA disk is:
ada0p1 for swap, ada0p2 for /tmp and ada0p3 for /var

First I tried doing the complete install via the installer/opnsense install menu, but it failed every single time.
Then I gave up, installed on the NVMe only, SSHed in and ran zpool attach zroot for the ada0 partitions (all of them), but none of the mounts mounted (through fstab).
Then I ran zpool create -f tank ada0p1 ada0p2 ada0p3
=> zpool list showed the two pools (yeah!)
=> And I added the corresponding entries in fstab, but there again, conflicts (/tmp vs /tmp etc.)
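(For reference: ZFS filesystems normally mount themselves via their mountpoint property rather than fstab, which is where the /tmp-vs-/tmp conflict comes from. A minimal sketch, dataset names assumed:)
zfs create tank/tmp
zfs set mountpoint=/tmp tank/tmp   # ZFS mounts this at boot; no fstab entry needed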

I'm out of options ...
Any idea, lead, hints ? pretty pleeease

recap: nda0p1 (EFI), nda0p2 (boot), nda0p3 (swap) and nda0p4 the vdev "zroot" with all the datasets

OK, now for my attempt in detail:
For the swap (just in case I run out of 32G RAM):
gpart add -t freebsd-swap -s 16G ada0
swapoff -a
nano /etc/fstab   # keep the current swap line; add: /dev/ada0p1 none swap sw 0 0
swapon -a
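(A quick check that the new swap device is active, assuming the fstab entry above:)
swapinfo -h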

Idea: the initial 8G swap slice on the NVMe could be used as L2ARC/cache? (is that even a good idea on a FW/router?)
i.e.: zfs create zroot/cache; zfs set quota=7G zroot/cache; zfs set mountpoint=/var/cache zroot/cache
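(For reference, an L2ARC is not a dataset with a quota: it is a cache vdev attached to the pool. A minimal sketch, assuming the freed 8G slice is nda0p3:)
zpool add zroot cache nda0p3
zpool status zroot   # the device should appear under a "cache" section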

Then for the /tmp and /var:
gpart add -t freebsd-zfs -b 33554500 -s 93G ada0
ada0p2 added

The vdev "ztank"
zpool create -f ztank /dev/adaop2
zpool list shows zroot and ztank with the respective right size

So far so good, now for the plat de résistance ... moving the datasets from zroot to ztank
zfs list (to get the exact names of the current datasets)
zfs create -V 32G ztank/tmp
BUT I don't want to pre-allocate the size, so I skip the -V:
zfs create ztank/tmp
zfs create ztank/var/audit
=> Parent doesn't exist, so first: zfs create ztank/var
zfs create ztank/var/audit
zfs create ztank/var/log
zfs create ztank/var/tmp
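(Side note: zfs create -p would create any missing parents in one step, e.g.:)
zfs create -p ztank/var/audit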

and now, for the actual transfer ...
zfs snapshot -r zroot/tmp@relocate
zfs snapshot -r zroot/var/audit@relocate
zfs snapshot -r zroot/var/log@relocate
zfs snapshot -r zroot/var/tmp@relocate
------
zfs send -R zroot/tmp@relocate | zfs receive -Fdu ztank/tmp
zfs send -R zroot/var/audit@relocate | zfs receive -Fdu ztank/var/audit
zfs send -R zroot/var/log@relocate | zfs receive -Fdu ztank/var/log
zfs send -R zroot/var/tmp@relocate | zfs receive -Fdu ztank/var/tmp
-------
-------
Now ... I can't take these zroot/xxx datasets offline, I can't rename them, and I'm wondering how I will get rid of them?
If I restart the system they will still be there, as a kind of mirror or extension.

I went this far, but now I'm out of ideas, as I can't access the system if I take it offline.

That doesn't look like a noob topic to me.
I've had to deal with a zpool import issue under proxmox (my fault) and that was enough.

FWIW, for a much simpler case (just adding a disk, no ZFS), Franco pointed the OP of this thread to FreeBSD docs...
https://forum.opnsense.org/index.php?topic=8979.0

It looks like you managed to move the swap (or add to it).
Are you sure that was wise?
That SATA drive is likely way slower (500 MB/s) than the NVMe (depends on PCIe slot version and lanes too). Even swapping to NVMe would likely affect overall performance.
Would it make sense for ZFS to "cache" data using the NVMe when that data is already stored on that same NVMe?
The cache has to be faster than the underlying storage... Swap is really a last resort.

For the rest, it's beyond me. Beyond the use of ZFS for your purpose (is it worth it?), I'm not sure that what you are trying to move over (tmp, var, ...) aligns with existing mount points in the default install.

> Idea: the initial 8G swap slice on the NVMe could be used as L2ARC/cache? (is that even a good idea on a FW/router?)
Not a good idea.

https://klarasystems.com/articles/openzfs-all-about-l2arc/
When should I use L2ARC?

For most users, the answer to this question is simple—you shouldn't. The L2ARC needs system RAM to index it—which means that L2ARC comes at the expense of ARC. Since ARC is an order of magnitude or so faster than L2ARC and uses a much better caching algorithm, you need a rather large and hot working set for L2ARC to become worth having.

In general, if you have budget which could be spent either on more RAM or on CACHE vdev devices—buy the RAM! You shouldn't typically consider L2ARC until you've already maxed out the RAM for your system.

October 28, 2024, 12:31:05 AM #4 Last Edit: October 28, 2024, 12:40:01 AM by erica.vh
Quote from: EricPerl on October 27, 2024, 10:10:17 PM
FWIW, for a much simpler case (just adding a disk, no ZFS), Franco pointed the OP of this thread to FreeBSD docs...
https://forum.opnsense.org/index.php?topic=8979.0
It looks like you managed to move the swap (or add to it).
Are you sure that was wise?
That SATA drive is likely way slower (500 MB/s) than the NVMe (depends on PCIe slot version and lanes too). Even swapping to NVMe would likely affect overall performance. [...]

Yes, you are right, the SATA is way slower than NVMe, but I'm not after speed; I'm rather after preserving the NVMe from heavy R/W, as the SATA is way cheaper. And I have 32G RAM, so the use of swap will (should) be limited to very rare occasions.
The /tmp usage, though, will be much more frequent.
And, of course, the multiple logs/audit/reports, etc ...

As for the link, it is indeed for a different task (swapping the drive), while my need, after moving some datasets to a second drive, is to find a way to shut down the ones on the initial drive, which I can't do while the system is up; but if I shut down the system, then I no longer have access to the drive.

Quote from: cookiemonster on October 27, 2024, 10:21:21 PM
> Idea: the initial 8G swap slice on the NVMe could be used as L2ARC/cache? (is that even a good idea on a FW/router?)
Not a good idea.

https://klarasystems.com/articles/openzfs-all-about-l2arc/ [...]

Thank you! So no cache on the NVMe then! As for RAM, I've maxed out the capacity of the motherboard, so it will stay at 32G.

Yes, no L2ARC. Even on storage systems using ZFS, the benefit only shows in particular cases. For a firewall, not for now anyway. Meaning: once a lot of users and lots of services start consuming RAM, it might be worth revisiting.

As for saving the fast and expensive SSD from constant logging, you should leave the default logging, which is to not log the default rules and to keep the rest for a maximum of X days.
Once you create your own rules, enable logging only for diagnostics.
With this there will be a very limited amount of wear. Once you find your drive's official endurance rating, chances are you'll find you have years of life left in it.
Another option, if you really must, is to log to RAM. The obvious downside is that the logs don't persist across a power loss. Personally, for the reasons above, I fail to see why people would choose that, except when using an embedded install.

I guess I should have been more explicit.
Adding a disk is indeed a simple operation, and entirely handled by the underlying distribution (in CLI no less).
In your case, you're trying to change the way FreeBSD mounts FSs and boots.
You might have better chances of getting an answer to your questions on FreeBSD forums.
It's not OPN specific apparently.

Beyond the NVMe wear-saving measures mentioned by cookiemonster, also consider what it would take to fix that box if the SATA fails. It probably won't boot... Even if you want to swap the drive before it dies, what will it take? Do you have room for another to be able to clone in place? ...

FWIW, here's what I think you're signing up for, whether it's to solve your current issue or fix it later if there are issues. It's all theoretical (based on superficial knowledge and very little experience)...

You'd have to:
* get a bootable USB/CD with a FreeBSD version compatible with OPNsense's distribution.
* boot from it
* access OPN's storage (zpool import??)
* For each location you want to "relocate", copy the existing content over to the corresponding partition, mount the partition in place of the current location.
* modify fstab accordingly
* you might have to re-import the zpool in OPN afterwards (prompt during the boot process). I've had to do that once...
On top of it, there are hardlink under var/unbound in my current install. I have no clue how those would follow the steps above or if they would have to be recreated.
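(If it helps, a quick way to spot those hardlinks, assuming a stock /var/unbound layout:)
find /var/unbound -type f -links +1 -ls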

There are things that I'm willing to tinker with to help. The above is not one of them (stretching current knowledge)...
A wrong step will likely lead to a non-bootable system. If it's fresh, you can restart from scratch but it's time consuming.

Again, at your own risks!

Finally!
Thank you to all who helped me, especially @MarieSophieSG and @Claude, over the past two weeks.

Earlier this year my RS-39 fried (after 241 days on). I bought an RS-41, plugged my RAM + NVMe from the '39 into it "as-is" and it worked right away ... but it fried as well (only 32 days on !!)
I shipped them both to HUNSN in June, and they only replaced the RS-41; I'm still arguing with them about the RS-39 (and about how weak these proved to be), knowing I have two surge protections: the 390 joules from my APC UPS and an extra 2700 joules on top (and 4 fans!).

Anyway, once I received the refurbished RS-41 I plugged everything back in, ran into data corruption, and had to reinstall ... taking that opportunity to download Viper 25.7 and go back to the 2-disk ZFS partitioning I'd left hanging in late 2024.

I finally got it right !
Here:
# OPNsense Installation Guide - NVMe/SSD Hybrid Setup
# Hardware: HUNSN RS-41 (N6000 + 32GB RAM) + NVMe 256GB + SSD 120GB

# =============================================================================
# PREPARATION: Clean disks completely
# =============================================================================
sudo dd if=/dev/zero of=/dev/nvme0n1 bs=1M count=1   # NVMe (Linux naming; nda0 under FreeBSD)
sudo dd if=/dev/zero of=/dev/sda bs=1M count=1       # SATA SSD (Linux naming; ada0 under FreeBSD, depends on detection)

# ============================================================================= 
# INSTALLATION: GUI Method (Recommended, but TUI possible)
# =============================================================================
# Boot OPNsense installer, choose: 'installer+opnsense', Advanced, Manual partitioning

# NVMe Partitioning (nda0):
ESP:  -t efi; -s 400M;   -m /boot/efi; -l efi    nda0p1   # NOT encrypted
Boot: -t freebsd-boot; -s 600K;   -m <empty>;   -l boot   nda0p2   # NOT encrypted 
Root: -t freebsd-zfs; -s <rest>; -m /;         -l zroot  nda0p3   # ENCRYPT fletcher4

# SSD Partitioning (ada0):
Swap: -t freebsd-swap; -s 15259M; -m <empty>; -l swap_data ada0p1  # NO encryption option
Srun: -t freebsd-zfs;  -s <rest>; -m <empty>; -l zrun     ada0p2  # ENCRYPT fletcher4
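# (Alternative: if you'd rather create the SSD layout from a shell instead of the installer GUI,
#  a rough sketch - labels assumed to match the table above; note this does NOT set up the
#  encryption the installer offers)
gpart create -s gpt ada0
gpart add -t freebsd-swap -s 15259M -l swap_data ada0
gpart add -t freebsd-zfs -l zrun ada0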

# Complete the installation and set the root password
# =============================================================================
# FINAL REBOOT (without USB key) AND TEST
# =============================================================================
# Login as root, go to shell (option 8)

# IMPORTANT: GUI creates pool named "root" even if you specified "zroot"
# Verify current state
zfs list
# Expected: pool "root" with mountpoint=none
df -h /
# Expected: "root" shown mounted on /
# This is NORMAL and CORRECT - do NOT change anything for the root pool!

# Create ZFS pool "zrun" on SSD partition
zpool create zrun /dev/gpt/zrun
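# (Optional: the properties set later in this guide can also be given at pool creation time, e.g.
#  zpool create -O compression=lz4 -O atime=off zrun /dev/gpt/zrun )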

# Create datasets WITH backup of existing content
# Step 1: Create datasets without mounting first
zfs create zrun/var_log
zfs create zrun/var_tmp

# Step 2: Backup existing content (mainly empty sub-folders) - RECURSIVE
mkdir -p /tmp/backup_var_log /tmp/backup_var_tmp
# Use tar for complete directory backup including subdirectories
tar -cf /tmp/backup_var_log.tar -C /var/log . || true
tar -cf /tmp/backup_var_tmp.tar -C /var/tmp . || true

# Step 3: Set mountpoints (this will automatically mount the datasets)
zfs set mountpoint=/var/log zrun/var_log
zfs set mountpoint=/var/tmp zrun/var_tmp

# Step 4: Restore content - RECURSIVE 
# Extract tar archives to restore complete directory structure
tar -xf /tmp/backup_var_log.tar -C /var/log || true
tar -xf /tmp/backup_var_tmp.tar -C /var/tmp || true

# Step 5: Cleanup
rm -f /tmp/backup_var_log.tar /tmp/backup_var_tmp.tar

# Verify ZFS configuration
zpool status
zfs list
df -h /var/log /var/tmp

# =============================================================================
# MEMORY OPTIMIZATION: RAM Swap + tmpfs /tmp
# =============================================================================

# Create a 'ram_optimize' service
ee /usr/local/etc/rc.d/ram_optimize
#!/bin/sh
# PROVIDE: ram_optimize
# REQUIRE: DAEMON
# BEFORE: LOGIN
# KEYWORD: shutdown

. /etc/rc.subr

name="ram_optimize"
rcvar="ram_optimize_enable"
start_cmd="ram_optimize_start"
stop_cmd=":"

ram_optimize_start()
{
    PATH=/sbin:/bin:/usr/sbin:/usr/bin
    export PATH
   
    echo "Setting up RAM swap and tmpfs..."
   
    # Create memory disk AND activate swap (fstab runs too early)
    if /sbin/mdconfig -a -t malloc -s 4G -u 8; then
        echo "Created md8 successfully"
       
        # Manually activate swap since fstab ran before md8 existed
        if /sbin/swapon /dev/md8; then
            echo "RAM swap activated successfully"
        else
            echo "ERROR: Failed to activate RAM swap"
        fi
    else
        echo "ERROR: Failed to create md8"
    fi
   
    # Mount tmpfs on /tmp
    if /sbin/mount -t tmpfs -o size=4G tmpfs /tmp; then
        echo "tmpfs mounted successfully"
    else
        echo "ERROR: Failed to mount tmpfs"
    fi
   
    echo "RAM optimizations complete"
}

load_rc_config $name
run_rc_command "$1"
-=-=-=-=-=-=-=-=-=-=-
# to exit ee, press escape, enter, enter (save)

# Give the service 'ram-optimize' the execute right
chmod +x /usr/local/etc/rc.d/ram_optimize
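# (Optional: the service can be tested right away, without waiting for a reboot;
#  'onestart' runs it even before ram_optimize_enable is set in rc.conf)
service ram_optimize onestart
swapinfo -h        # should now list /dev/md8
df -h /tmp         # should now show the tmpfs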

-=-=-=-=-=-=-=-=-=-=-
# Edit rc.conf
ee /etc/rc.conf

# Add these lines to /etc/rc.conf:
# Basic OPNsense config
hostname="opnsense.localdomain"
keymap="<your keymap>.kbd"

# ZFS
zfs_enable="YES"

# Enable services
ram_optimize_enable="YES"
opnsense_enable="YES"
-=-=-=-=-=-=-=-=-=-=-
# to exit ee, press escape, enter, enter (save)

-=-=-=-=-=-=-=-=-=-=-
# Edit /etc/fstab 
# RAM swap will be managed by service + fstab priority

ee /etc/fstab

# Keep the EFI and root partition lines as-is (DO NOT modify)
# STEP 1: MODIFY the existing swap_data line by adding priority:
/dev/gpt/swap_data none swap sw 0  0 # WAS
/dev/gpt/swap_data none swap sw,pri=1 0  0 # Now IS

# STEP 2: ADD this line for RAM swap before existing swap line (md* created by service):
/dev/md8 none swap sw,pri=9 0  0

# Final /etc/fstab should look like:
# /dev/nda0p1   /boot/efi msdosfs rw 0    0 
# /dev/md8   none swap sw,pri=9 0    0    # <- RAM swap (high priority)
# /dev/gpt/swap_data  none swap sw,pri=1 0    0    # <- SSD swap (low priority)

# NOTE: Do NOT add tmpfs line to fstab - it's handled by rc.conf
# NOTE: Do NOT add zrun/var_log or zrun/var_tmp lines to fstab - they're handled by ZFS

# =============================================================================
# VERIFICATION COMMANDS (before reboot)
# =============================================================================
# Check current fstab content
cat /etc/fstab

# Check current rc.conf content 
cat /etc/rc.conf

# Check ZFS pools and datasets
zpool list
zpool status
zfs list

# Check current mount points
df -h
mount | grep zfs

# =============================================================================
# ZFS OPTIMIZATION SETTINGS
# =============================================================================

# Set ZFS properties for better performance
zfs set compression=lz4 zrun
zfs set atime=off zrun
# Note: checksum=fletcher4 is already the default for ZFS pools

# Limit ZFS ARC to 12GB (leaving ~20GB for system on 32GB RAM total)
# Calculation: 32 GB (29.8 GiB) total - 4 GiB swap - 4 GiB tmpfs - 3 GiB system = ~20 GB available
# ARC at 12GiB = ~57% of available RAM (conservative for firewall usage)
echo 'vfs.zfs.arc_max="12884901888"' >> /boot/loader.conf
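# (Optional: with OpenZFS on recent FreeBSD the ARC limit can also be applied at runtime,
#  without waiting for the reboot - a sketch)
sysctl vfs.zfs.arc_max=12884901888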

# Optional L2ARC settings (only needed if you add dedicated cache SSD later)
# These protect SSD from excessive cache writes
echo 'vfs.zfs.l2arc_write_max="134217728"' >> /boot/loader.conf    # 128MB/sec max
echo 'vfs.zfs.l2arc_write_boost="268435456"' >> /boot/loader.conf  # 256MB/sec burst
reboot

# =============================================================================
# POST-REBOOT VERIFICATION
# =============================================================================
# After reboot, login as root, go to shell (option 8) and verify:

# Check swap priorities (should show 2 lines, 4G md8 pri=9, and swap_data)
swapinfo -h

# Check tmpfs /tmp (should show 2 lines, 4GB tmpfs and zroot/tmp)
df -h /tmp
mount | grep tmpfs

# Check md (you should have 2 lines: mdctl and md8)
ls -la /dev/md*

# Check ZFS pools and datasets
zpool status
zfs list
df -h /var/log /var/tmp

# Check memory usage
top
vmstat -s | head   # FreeBSD has no 'free'; top's header also shows the memory summary

# =============================================================================
# SECURITY CONFIGURATION: Admin User and Settings
# =============================================================================

# Create dedicated admin user (shell method)
pw useradd -n admin -c "OPNsense Admin" -m -G wheel -s /bin/tcsh
passwd admin
# Enter a strong password when prompted
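# Quick sanity check that the account exists and is in the wheel group (optional)
id admin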

# Alternative: Create admin user via web interface after initial setup:
# 1. Login to web interface as root
# 2. Go to: System > Access > Users 
# 3. Click "Add" to create new user:
#    - Username: admin
#    - Full name: OPNsense Admin 
#    - Password: (strong password)
#    - Login shell: /bin/tcsh
#    - Group membership: Check "admins"
#    - Authorized keys: (leave empty unless needed)

# Configure secure access in web interface:
# System > Settings > Administration
# - Protocol: HTTPS only
# - TCP Port: 443 (or custom secure port)
# - Disable SSH Access (since no external access needed)
# - Session timeout: Set appropriate value (e.g., 240 minutes)
# - Enable "Disable web GUI redirect rule" for security
# - Disable "Allow console access" for admin user if desired

# Security best practices:
# - Use "admin" account for web interface daily operations
# - Use "root" account only for console emergency access
# - Never enable SSH unless absolutely necessary
# - Regularly update admin password
# - Monitor login attempts in System > Log Files > System

# Test admin user login (web interface)
# - Login with admin/password to web interface
# - Verify admin has proper access to configuration
# - Check System > Access > Users shows both root and admin

# Security verification
# - Confirm SSH is disabled: System > Settings > Administration
# - Verify HTTPS-only access is working
# - Check no unnecessary services are running: Diagnostics > Services

# Check that logs are being written to SSD
ls -la /var/log
tail /var/log/system.log

# =============================================================================
# EXPECTED FINAL STATE
# =============================================================================
# Root filesystem: pool "root" on encrypted NVMe (mountpoint=none but mounted on /)
# Swap: 4GiB RAM (pri=9) + 15GiB SSD (pri=1)
# /tmp: 4GiB tmpfs in RAM
# /var/log: ZFS dataset on encrypted SSD
# /var/tmp: ZFS dataset on encrypted SSD
# Total RAM used for optimization: ~8GiB
# Available RAM for system/ZFS ARC: ~24 GB

# =============================================================================
# TROUBLESHOOTING
# =============================================================================
# If swap doesn't work:
# - Check: swapinfo -h
# - Verify: ls -la /dev/md8 /dev/gpt/swap_data (or /dev/ada0p1)

# If tmpfs doesn't work:
# - Check: mount | grep tmpfs
# - Verify rc.conf syntax
# - Try manual: mount -t tmpfs -o size=4G tmpfs /tmp

# If ZFS datasets don't mount:
# - Check: zfs mount -a
# - Verify: zpool import -a
# - Check: zfs get mountpoint zrun/var_log zrun/var_tmp

# =============================================================================
# TROUBLESHOOTING - LESSONS LEARNED
# =============================================================================
# Common issues encountered during development of this guide:

# 1. RC.CONF vs RC.LOCAL vs CUSTOM SERVICE
# - rc.conf mdconfig syntax doesn't work reliably in OPNsense
# - rc.local doesn't execute automatically despite local_enable="YES"
# - Custom service in /usr/local/etc/rc.d/ is the only reliable method

# 2. SWAPON PRIORITY OPTIONS
# - swapon -p option does NOT exist in OPNsense/FreeBSD
# - swapon -F - (stdin) is NOT supported
# - swapon -E file works but has filesystem restrictions
# - SOLUTION: Use service for device creation/activation, fstab for priorities

# 3. BOOT ORDER TIMING
# - fstab processes swap entries BEFORE custom services run
# - Result: fstab tries to swapon /dev/md8 before md8 exists
# - SOLUTION: Service must create md8 AND activate it manually

# 4. FREEBSD SHELL REDIRECTION
# - 2>/dev/null syntax causes "2: not a directory" errors in tcsh
# - FreeBSD service syntax differs from Linux systemd
# - SOLUTION: Remove problematic redirections, use proper FreeBSD rc.subr

# 5. TMPFS vs ZFS FILESYSTEM CONFLICTS 
# - Cannot create fstab temp files on /tmp when /tmp is tmpfs
# - swapon -E requires files on persistent filesystem
# - SOLUTION: Use /var/tmp (ZFS) instead of /tmp (tmpfs) for temp files


I'm glad it finally worked !
And sorry for the HUNSN experience, I'll monitor mine more closely ....
Hunsn RS39 (N5105, 4x i225) 24.7.5_0 testing
LAN1 = swtch1 Laptop1 MX23, NAS, Laptop2 Win10
LAN2 = WiFi router AP, Laptop2, tablet, phone, printer, IoT, etc.
LAN3 = Swtch2 Laptop3 Suse; Laptop4 Qube-OS/Win10, printer
Pretending to be tech Savvy with a HomeLab :-p