"Danger. Unexpected error, check log for details" during 25.7.6 upgrade

Started by jonm, October 23, 2025, 05:18:07 PM

The 403 is expected; it always happens when the web GUI is restarted: you're forcibly logged out, so all access is rightly denied. Just go back to the index page and log in, like you did; nothing to be concerned about. You could run an audit. If it finds any missing or corrupt files, it should correct or at least report them, unless they're config files, which probably aren't checked. A config check would be possible, though: for presence and syntax at least, and even for semantics, such as whether everything required is there and nothing extraneous, or whether the contents make sense (for example, DHCP and a static IP configured for the same interface, unless that's a supported feature, of course).
So, everyone for whom the box came back from a 404, did you perchance look at the CPU usage? I had waited 30 minutes and the CPU was 100% idle before I killed it. The problem is that I'm using a thumbdrive so I have no indication of disk activity, and also it is very slow. The previous update took 2 hours to complete, and the CPU was mostly idle but not entirely (between 85% and 96% idle), but never 100% so I knew it hadn't died. I may try the update again and just leave it sit overnight in case it actually is just having a coffee break or something. ;)

I am on the -nano image, maybe there's a connection between working and failing and partially failing updates there?

I had what seems to be the same problem while upgrading from 25.7.5: error popup on the update screen while installing the upgrade, 404 when I reloaded the update page, 403 from the root URL (which should've given me either the dashboard or the login page).  But it came back after a few minutes.

Since earlier posts here have expressed concern about whether the core package might be broken by this upgrade, I downloaded a config backup and the latest installer as a precaution, then rebooted my router as a test.  It started up successfully and seems to be working as usual.


You simply wait, as I've found out:
it seems like the upgrade indeed was just taking a coffee break. I've restored and then re-run the update. The GUI stopped responding at the exact same spot, but since this time I had opened a root shell, I was able to see that it indeed was and still is doing stuff. It's literally crawling along at about 0.1MB/s, mostly stuck waiting on block I/O. In /var/log/pkg/pkg<xxxxxxx>.log I can see that it still is making progress, so even though the GUI still is 404, it's doing its thing. Obviously, disk speed is the determining factor here: on fast drives you might not even notice the outage when half the root fs isn't there, while on slower drives it'll be more likely to hit the window of opportunity. Having seen all that, I'm pretty confident that even my snail will come back to life eventually. I'll need another storage device since I can't have day-long outages every couple of weeks once I actually deploy it (well, I can if the core function stays active). But these trickle-writes probably are going to kill the thumbdrive sooner rather than later, even with /var and /tmp mounted as tmpfs as they are in -nano.
Reveal: I'm used to XigmaNAS (embedded), which does its updates as one large file in a couple of minutes at most, and otherwise runs completely in memory, so I sort of expected the same from -nano, especially given that bad actors might somehow manage to write to the filesystem. With a memory-only fs, it'll just be one reboot away from a clean system. OPNsense seems to be more hdd-centric with its package system, I'll need to adjust to this.

I have a fast NVMe drive, and I think my update was broken for other reasons. I followed https://forum.opnsense.org/index.php?topic=49437.15 to fully recover.

By chance I checked my still open shell and saw "pkg-static 92615 - [meta sequenceId="1"] opnsense-25.7.6 installed". Reloading the also still open browser window indeed brought back the dashboard like nothing ever happened. It's still not done yet but probably will end up finishing in due time. Apparently I wasn't patient enough last time.
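For anyone who wants to watch for that line instead of reloading the browser, here is a minimal sketch in Python. It assumes the log keeps using the format of the single line quoted above, which I have not verified for other pkg versions. The function names are made up for illustration, and you have to pass the real log path yourself.

```python
import re

# Pattern modelled on the one log line quoted in this thread; treat the
# exact format as an assumption, other pkg versions may log differently.
INSTALLED_RE = re.compile(r"\b(opnsense-\d+(?:\.\d+)*)\s+installed\s*$")


def find_core_installed(lines):
    """Return the first 'opnsense-<version>' reported as installed, else None."""
    for line in lines:
        match = INSTALLED_RE.search(line)
        if match:
            return match.group(1)
    return None


def check_log(path):
    """Scan a pkg log file given by the caller (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_core_installed(handle)


sample = [
    'pkg-static 92615 - [meta sequenceId="1"] opnsense-25.7.6 installed',
]
print(find_core_installed(sample))
```

As long as the result is None, the core package has not been reported as reinstalled yet, so a 404 from the GUI is not by itself a sign that the upgrade died.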

BTW: it's a bit bothersome that one doesn't seem to be able to install missing plugins when an update is pending, at least it said I need to update first. Since I had recovered using a vanilla download, my reloaded config made it miss the plugins (it's working though). I'd have preferred to install them before retrying the update, though they're not essential.

Edit: it's indeed gone through and now is fully updated. However, during my restoring, I found that console settings aren't restored: nano has primary console set to "serial" and secondary to "VGA" by default, which I had changed around to "VGA" primary and "serial" secondary. The restore seemed to restore everything but not this. It's readily reproducible, just store the config, change the setting, apply, restore and reboot: the stored setting will not be restored.

Hey guys, thank you so much for your posts here - you really saved hours of my lifetime!
Yesterday evening, I started the update and was presented with the scary "danger" message, followed by a "403" on the GUI and a "sh: /usr/local/libexec/opnsense-auth not found" on the console.
Luckily, I was able to do a failover to a standby image, so I only had to wait for some time until the failover was done and I had a working OPNsense again.

Today, I tried the update again, with the same results. But having read your posts, I decided to simply go for a coffee and practise patience ...
After a while OPNsense rebooted, was updated and recovered to working mode.

This was the first time the otherwise reliable update process made me sweat! But thanks to your help, everything turned out well in the end! Thank you so much!

So, I've been waiting for about 2 days now.. and the UI still hasn't come back. Worse, none of my root and user passwords are working over SSH. Tailscale SSH refuses to connect. It seems all credentials on the machine have changed or been corrupted. I can connect and get to the login prompt, but haven't been able to find a working user/password combination that will let me in. Not even when I open a shell on my server via tailscale and then try to log into the firewall from the LAN interface. This is a remote machine and now I'll have to drive 2.5hrs each direction and spend about $100 in gas, just to go default/reinstall the machine.

Any other ideas how to get the firewall back up?

Same problem here. I upgraded my secondary cluster node to 25.7.6 and the update broke everything.
I had to recover the node as described in the other thread, but now a lot of things seem to be broken. For example, Firmware > Plugins no longer lists available plugins and pkg segfaults on every single run. I assume the node is lost and needs to be reinstalled.

For testing I upgraded a different VM based on an older snapshot and it also broke with the same issue. I wonder why an obviously broken update is not reverted and is still available for download.

To be frank, the issue has been reported a number of times, but it's not a large-scale issue and there is not enough data to make a decision.

The package manager update is also pretty much irreversible from a release and build perspective -- and eventually unavoidable. Going back and forth is not going to solve anything other than adding a lot of unnecessary work.

I'm going to dig through the changes from version 1.19 to 2.3 to find the change that causes the package deinstall/install reordering. I can't believe pkg-base in FreeBSD 15.0 won't suffer from the same type of reordering, which removes vital packages for a number of minutes and makes recovery pretty hard.


Cheers,
Franco

I had the "Danger" message on one system of half a dozen during my update cycle, but no ill came from it. Just reload, check for updates again, reboot - all good.

Sorry, I cannot collect any evidence now.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Well, the issue is pretty simple: pkg now removes the OPNsense core package with all its web server related files, goes on to update other packages, and only eventually circles back to reinstalling the core package.

That's dangerous for at least three reasons:

1. The core package is a "vital" package as per pkg's own design. Removing it for a prolonged amount of time is just not a good idea. Deleting all files for an immediate reinstall wasn't ideal either, but this is a new level that probably has implications on pkg-base use as well.

2. If pkg should fail for any benign reason during other packages' updates, it will stop and not put the core package back. In that case it will also not be able to reference the OPNsense repository because it was also deinstalled (see point 1).

3. The typical failsafe mechanism for UNIX commands is the user. If a tool screws up, the user is expected to recover, but since our upgrades are fully automated we can only recover from a scenario where recovery files are still in place and recovery is pre-scripted.

It may require substantial core changes in the near future, but double-checking pkg's changes seems like a quicker route to success.

If not, we will probably start working on an alternative to pkg-based upgrades for what we believe should be critically safe upgrades.
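To illustrate point 2, here is a toy model in Python. It is not how pkg works internally; the plans, the `vital_missing` helper and the package names `other-a` and `other-b` are all hypothetical. It only shows which vital packages are gone if a run stops after a given number of completed steps.

```python
def vital_missing(plan, installed, vital, fail_at):
    """Return the vital packages absent after the first `fail_at` steps ran.

    `plan` is a list of (action, package) pairs; this is a toy model,
    not pkg's real job representation.
    """
    present = set(installed)
    for action, package in plan[:fail_at]:
        if action == "remove":
            present.discard(package)
        elif action == "install":
            present.add(package)
        # "upgrade" replaces a package in place, so presence is unchanged
    return sorted(vital - present)


installed = {"opnsense", "other-a", "other-b"}
vital = {"opnsense"}

# Reordered plan: core package removed first, reinstalled last.
split_plan = [
    ("remove", "opnsense"),
    ("upgrade", "other-a"),
    ("upgrade", "other-b"),
    ("install", "opnsense"),
]
# In-place plan: core package upgraded as the final step.
in_place_plan = [
    ("upgrade", "other-a"),
    ("upgrade", "other-b"),
    ("upgrade", "opnsense"),
]

print(vital_missing(split_plan, installed, vital, fail_at=2))
print(vital_missing(in_place_plan, installed, vital, fail_at=2))
```

In the reordered plan, any stop before the last step leaves the core package missing. In the in-place plan, no stopping point does.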


Cheers,
Franco

I think the main issue is not the re-install of the package, but the segfaults created by pkg.
When I manually install the package after the update, pkg segfaults during the post-install steps. Here is a picture of my current pkg status.

I think what's happening is pkg gets upgraded, breaks and is unable to continue with the remaining steps.
My firewall-01 without update has pkg version 1.19.2

Hello everybody,
I had the message on 2 of my 3 OPNsense systems.
The only difference between them is that the two with the danger message had running:
- one OpenVPN Legacy server
- one OpenVPN Instance

while the one without the danger message has only the new OpenVPN Instance running.

All three systems upgraded successfully and are working without problems as far as I can see.

Don't know if it's related but hope this can help.

Best Wishes

Our pkg 2.3.1 doesn't segfault but FreeBSD's version will, see https://github.com/opnsense/pkg/commit/b93bfd925b7

So I'm looking at https://github.com/freebsd/pkg/commit/523caa97c9 which was only committed last week upstream.

The preliminary results:

[4/136] Deinstalling opnsense-25.7...
[4/136] Deleting files for opnsense-25.7: .......... done
[...]
[136/136] Installing opnsense-25.7.6...
[136/136] Extracting opnsense-25.7.6: .......... done

vs.

[77/77] Upgrading opnsense from 25.7 to 25.7.6...
[77/77] Extracting opnsense-25.7.6: .......... done

A bit of a difference, and if this pans out it's easy to hotfix, but it obviously needs more QA.

It's also much faster now. There are actually 77 updates; the first run splits them into 136 steps with the long deinstalled gap in between. Having the core package at position 77/77 seems consistent with the old pkg 1.19 behaviour we've come to love over the years.
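The difference between the two runs can be measured from the progress output. Here is a minimal sketch in Python that assumes the "[step/total] Action package" line format shown in the excerpts above. The `core_gap` and `package_name` helpers are names I made up, not part of any OPNsense or pkg tooling.

```python
import re

# Matches progress lines such as "[4/136] Deinstalling opnsense-25.7...".
STEP_RE = re.compile(r"^\[(\d+)/\d+\]\s+(Deinstalling|Installing|Upgrading)\s+(\S+)")


def package_name(action, target):
    """Strip the version from 'name-version...'; 'Upgrading' lines carry the bare name."""
    if action == "Upgrading":
        return target
    return target.rstrip(".").rsplit("-", 1)[0]


def core_gap(lines, core="opnsense"):
    """Return how many steps the core package stays deinstalled.

    0 means an in-place upgrade; None means the core package was never
    seen being restored in the given lines.
    """
    removed_at = None
    for line in lines:
        match = STEP_RE.match(line)
        if not match:
            continue
        step, action = int(match.group(1)), match.group(2)
        if package_name(action, match.group(3)) != core:
            continue
        if action == "Upgrading":
            return 0
        if action == "Deinstalling":
            removed_at = step
        elif removed_at is not None:
            return step - removed_at
    return None


split_run = [
    "[4/136] Deinstalling opnsense-25.7...",
    "[4/136] Deleting files for opnsense-25.7: .......... done",
    "[...]",
    "[136/136] Installing opnsense-25.7.6...",
    "[136/136] Extracting opnsense-25.7.6: .......... done",
]
in_place_run = [
    "[77/77] Upgrading opnsense from 25.7 to 25.7.6...",
    "[77/77] Extracting opnsense-25.7.6: .......... done",
]
print(core_gap(split_run), core_gap(in_place_run))
```

The step count is only a proxy for the outage. How long the gap lasts in wall-clock time depends on disk speed, as reported earlier in the thread.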


Cheers,
Franco