Upgrade to 26.1.3 - my first nightmare with OPNSense

Started by fakemoth, March 08, 2026, 12:11:33 PM

There is no known and widespread problem that will nuke running OPNsense systems when upgrading to 26.1.3.

I run more than a dozen OPNsense systems and while admittedly I have not updated every single one to 26.1.3 yet, for the ones I did I had no problem whatsoever.

Maybe @newsense was a bit less diplomatic than desired, but in general the folks helping in this forum are engineers, not diplomats. And that includes myself. I have my own history of being rather blunt, but please assume that I never intended anything ill.

If I read this thread correctly, you had an unsuccessful update to 26.1.3. Well, that happens, occasionally.

But @newsense was perfectly correct in pointing out that the most reasonable way to a running system is:

- install 26.1 from a downloaded image
- restore your saved configuration
- update to 26.1.3 - yes!
- let the system install all missing plugins automatically

That's just how it's supposed to work.
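For the console-inclined, the same path can be sketched from a shell after the fresh 26.1 install and configuration restore. This is only a sketch using the stock firmware tooling; the GUI updater does the same thing:

```
# After reinstalling 26.1 from the downloaded image and restoring your
# config backup (System > Configuration > Backups in the GUI, or place
# config.xml in /conf before first boot):
opnsense-update -c    # check whether an update is available
opnsense-update       # fetch and apply base, kernel and package updates
# Reboot when prompted. Plugins recorded in the restored config.xml are
# reinstalled automatically once the system is on the current version.
```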

The big argument in this thread seems to be:

- You - there is some fundamental bug in 26.1.3 and it will break any time I try an update again.
- Everybody else - no, there isn't, and while you had an update fail in a bad way, there is no reason to assume it will fail again on the next try.

There is no known issue with 26.1.3 that will break "Internet". Whatever happened at your first try is particular to your single system. Please just try again from a fresh installation, and the builtin wizard will automatically install all missing plugins. That's how it is supposed to work. See above for the steps I outlined.

There is no mechanism to install plugins for a version that isn't current. Install base, update to latest, then fix plugins. Always worked that way.

Kind regards,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: DiskWizard001 on March 09, 2026, 12:20:45 AM
Does it mean it's random in your eyes?

It is not random.

It is not an OPNsense issue.

I explained why it happened and how to trigger it consistently in a VM, if you wish.

The way to think about this issue to avoid other accidents in the future is this:

Quote
Do not run out of space while the ZFS file system is modified during OS updates


If you had been lucky enough to run out of space while the updates were still downloading, you could have recovered and avoided total data loss. Once the updates are being installed, you're past the point of no return.

March 09, 2026, 02:50:02 AM #17 Last Edit: March 09, 2026, 02:59:24 AM by drosophila
Quote from: newsense on March 08, 2026, 07:36:03 PM
There _is_ a slight chance your ssd might be partially broken, which is usually the hardest to diagnose because all seems to work until it doesn't, however the fact you've been able to reinstall makes me think you're not there.
Granted, it's not a proper SSD, but with the thumb drive I'm using for testing I had a similar failure that would throw me into the single-user console on boot, with different random errors on each subsequent boot (it wasn't even updating, just booting). I pushed the nano image back onto it and it booted and upgraded normally (at one point I got the red warning, but it vanished mysteriously, so I assume it came from upgrading base). Anyway, I know that on this drive there are some cells that can't hold their contents for more than a few weeks but turn up fine when freshly written. A proper SSD should catch this, but just like with HDDs it may flag such cells as "maybe unstable" yet fail to actually reassign them (had this happen myself), leaving the issue in place. Examining smartctl -a output might give insight into the drive's actual state of degradation.
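To act on that last suggestion, a minimal health check might look like this (the device name /dev/ada0 is a placeholder; NVMe drives appear as /dev/nvme0 and want -d nvme):

```
# Pull the full SMART report and pick out the usual wear/defect indicators:
smartctl -a /dev/ada0 | grep -Ei \
  'Reallocat|Pending|Uncorrect|Wear|Percent.*Used|Media.*Errors'
# Nonzero reallocated or pending sector counts -- or a high wear level on
# an SSD -- match the "works until it doesn't" failure mode described above.
```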
Quote from: newsense on March 09, 2026, 02:02:54 AM
The way to think about this issue to avoid other accidents in the future is this:
Quote
Do not run out of space while the ZFS file system is modified during OS updates
Is ZFS somehow worse in this respect than others? If you run out of space while upgrading Linux on ext4, chances are you're stuck with an unbootable system, too, except in Linux you normally have at least one fallback kernel, and you can also usually install packages from at least one prior major version. I've read that ZFS doesn't like getting filled, but doesn't it just become slow when it does, rather than inherently lose data?

Once it's 100% full, a ZFS pool is broken, because you cannot even delete things anymore. It's copy on write, so to delete you need space to write first.
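A way to experiment with this without risking real hardware is a throwaway VM with a tiny file-backed pool. Names are placeholders, and on current OpenZFS the internal slop reserve usually still lets the delete succeed; the unrecoverable cases involve filling past that reserve, e.g. mid-update:

```
# DEMO ONLY -- run in a disposable VM, never against a production pool.
truncate -s 128M /tmp/zdisk          # backing file for a toy pool
zpool create demopool /tmp/zdisk
dd if=/dev/zero of=/demopool/filler bs=1M || true   # write until ENOSPC
zpool list demopool                  # CAP column at or near 100%
rm /demopool/filler                  # try the copy-on-write delete
zpool destroy demopool; rm /tmp/zdisk               # cleanup
```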
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

March 09, 2026, 07:49:25 PM #19 Last Edit: March 09, 2026, 07:58:05 PM by drosophila
Thanks, that is an important fact! Feels like an odd design decision (more like a major flaw), especially when ZFS is used as a system FS. The lack of recovery tools for ZFS has always made me wary of it, plus it is complete overkill for a simple system FS. Probably OK for a NAS where you can control the pool usage, but not on a desktop. Boot environments are nice, but this... nah. :) Maybe the idea is to add another device to the pool; it should be able to do that even if the pool is 100% full, since the journal would not be touched by that?

The -nano image is so much more convenient to use, anyway, big thanks to the devs for making it, doubly appreciated! :)

Monitor your usage? Network management systems with SNMP, or Telegraf and Prometheus, or Zabbix, or ... exist.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

March 09, 2026, 08:40:08 PM #21 Last Edit: March 09, 2026, 08:45:20 PM by drosophila
Yeah, on the OPN box Monit is set to trigger on 75%. These devices should never run low on space in the first place. Monitoring in general is a good idea, but I still prefer recoverable faults.

Everything that requires discipline to keep working is going to break. :)

Quote from: drosophila on March 09, 2026, 02:50:02 AM
Is ZFS somehow worse in this respect than others?

If you run out of space while upgrading Linux on ext4, chances are you're stuck with an unbootable system, too
Quote from: Patrick M. Hausen on March 09, 2026, 07:34:05 AM
Once it's 100% full, a ZFS pool is broken, because you cannot even delete things, anymore. It's copy on write so to delete you need space to write first.
I am confused here: both OpenZFS and ext4, for example, have reserved space that prevents them from getting completely full, AFAIK?!

When you install a random Linux distro it shows 5% Reserved space for EXT4 and can be set back to 1% for large drives/partitions.

For OpenZFS there is this setting : https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#spa-slop-shift
To check your current setting: # cat /sys/module/zfs/parameters/spa_slop_shift
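That parameter translates to a simple rule of thumb: roughly 1/2^spa_slop_shift of the pool is held back, with a floor of 128 MiB (newer OpenZFS releases also apply an upper cap; the exact clamps vary by version). A back-of-the-envelope sketch:

```shell
# Approximate slop space for a 100 GiB pool with the default spa_slop_shift=5.
pool_size=$((100 * 1024 * 1024 * 1024))   # pool size in bytes (illustration)
spa_slop_shift=5                          # default value
slop=$((pool_size >> spa_slop_shift))     # 1/32 of the pool
min_slop=$((128 * 1024 * 1024))           # 128 MiB floor
if [ "$slop" -lt "$min_slop" ]; then slop=$min_slop; fi
echo "reserved slop: $((slop / 1024 / 1024)) MiB"   # -> 3200 MiB
```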

Soo...


Right or Wrong ?! o.O
Weird guy who likes everything Linux and *BSD on PC/Laptop/Tablet/Mobile and funny little ARM based boards :)

Quote from: nero355 on March 09, 2026, 10:16:08 PM
I am confused here : Both OpenZFS and EXT4 for example have Reserved Space that prevents them from getting completely full AFAIK ?!

Not to my knowledge. We had multiple incidents of unrepairable pools over in TrueNAS land. Once it's 100% full it's toast.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: Patrick M. Hausen on March 09, 2026, 10:19:40 PM
We had multiple incidents of unrepairable pools over in TrueNAS land. Once it's 100% full it's toast.
IMHO that should not be possible unless someone did some very weird things with the value mentioned above and set it to 0 somehow ??

What is the default value for OPNsense installs when OpenZFS is chosen ?
Weird guy who likes everything Linux and *BSD on PC/Laptop/Tablet/Mobile and funny little ARM based boards :)

March 09, 2026, 11:13:41 PM #25 Last Edit: March 09, 2026, 11:16:47 PM by Patrick M. Hausen
/sys/module/zfs/... does not exist on FreeBSD.

https://forums.truenas.com/t/zfs-pool-ko-after-filling-at-100/57356/9
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Thank you.  I just took the advice there and reserved 250GiB for the root dataset and children in TrueNAS.
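For anyone wanting to replicate that, a sketch of the idea (pool and dataset names are placeholders; whether you put the reservation on the root dataset or on a dedicated empty "rescue" dataset is a matter of taste):

```
# Option A: guarantee space to the root dataset and its children.
zfs set reservation=250G tank/ROOT
# Option B: park a deletable reservation you can drop in an emergency.
zfs create -o reservation=250G tank/reserved
zfs get reservation tank/ROOT tank/reserved   # verify
```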
N5105 | 8/250GB | 4xi226-V | Community

The reserved 5% on Linux ext4 is for the system, so that ordinary users cannot fill up the drive; the OS can still operate and root can install stuff. The problem is that updates are always done by root (who is fully entitled to these 5%), so if root fills them up, be it through an update or otherwise, they're gone. On ext4 this reserve isn't needed for the FS to remain usable; ext doesn't run into any issue when it fills up.

ZFS is the only filesystem I've ever been made aware of having this odd problem. Obviously its benefits outweigh these shortcomings, at least for corporate storage servers. On a data pool this is easily avoided by having only normal users use it for storage (remote root access is always bad even without this). At least my (XigmaNAS) pool was perfectly fine when I "ran out of space" on it (a partial write, reporting "no space left on device"), but it was used by non-root users only, so the emergency reserve would have been untouched.

Obviously, this cannot be avoided with RootOnZFS. It still is odd that, especially given how wasteful ZFS is, it wouldn't just keep a minimum of spare space to itself, regardless of who is accessing it, so that at least deletes could still be done. Then again, there are so many tunables with possible unintended side-effects that what looks like an optimization may end up creating such situations, whereas the defaults are fine?
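For comparison, the ext4 reserve mentioned here can be inspected and tuned after the fact (the device name is a placeholder):

```
tune2fs -l /dev/sda2 | grep -i 'reserved block'   # show the current reserve
tune2fs -m 1 /dev/sda2     # shrink root's reserved share from 5% to 1%
```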