opnsense-revert opnsense fails

Started by Greelan, March 23, 2026, 04:14:10 AM

Weird problem. I've been experimenting with some patches, and have routinely successfully reverted them by "opnsense-revert opnsense".

But on the latest try, it is stuck at ">>> Invoking update script 'refresh.sh'"

Previously I interrupted the script at this point and then tried again, but it complained that the OPNsense package repo conf was not available. So I copied that back from the sample, but the script still gets stuck.

Any tips on how to fix this?

I've managed to deal with that - that wasn't fun.

It appears that when I had to abort opnsense-revert because it hung, the FreeBSD repo was somehow enabled. Then when I ran opnsense-revert again, pkg was updated to the FreeBSD version (2.5.1), which obviously led to some conflicts.

I had to go through a process of disabling the repo, deleting the pkg database, and force reinstalling OPNsense. Then checking there weren't any rogue packages on the system from the FreeBSD repo.

Ugh.

FWIW, it seems that the original "hung" state of opnsense-revert was caused by a file lock on config.xml held by configd, which prevented run_migrations.php from acquiring it.

If you had FreeBSD repo enabled you have bigger fish to fry because it upgraded pkg to an incompatible version.


Cheers,
Franco

March 24, 2026, 11:41:15 AM #4 Last Edit: March 24, 2026, 12:08:44 PM by Greelan
Yes, as I said. It was not me who enabled it though. This only occurred when I had to abort the opnsense-revert script. Somehow that caused the OPNsense repo to be disabled and the FreeBSD repo to be enabled.

Today at 01:19:01 AM #5 Last Edit: Today at 02:30:13 AM by Greelan
@franco, perhaps a trap could help in the scenario that I faced? At the very least the user would be returned to a stock repo configuration, rather than a broken one.

--- a/src/revert/opnsense-revert.sh
+++ b/src/revert/opnsense-revert.sh
@@ -30,6 +30,24 @@ if [ "$(id -u)" != "0" ]; then
  exit 1
 fi
 
+REPOSDIR="/usr/local/etc/pkg/repos"
+
+recovery()
+{
+	if [ -f "${WORKPREFIX}/.recovery" ]; then
+		# post-install scripts may not have completed;
+		# attempt full reconfiguration first, fall back
+		# to restoring repo defaults if config.xml is
+		# not accessible (e.g. held by another process)
+		if ! timeout 30 /usr/local/etc/rc.configure_firmware; then
+			for CONF in $(find ${REPOSDIR} -name '*.conf.sample'); do
+				cp ${CONF} ${CONF%.sample}
+			done
+		fi
+		rm -f "${WORKPREFIX}/.recovery"
+	fi
+}
+
+trap recovery EXIT
+
 WORKPREFIX="/tmp/opnsense-revert"
 WORKDIR=${WORKPREFIX}/${$}
 PKG="pkg-static"
@@ -93,6 +111,8 @@ done
 for PACKAGE in ${@}; do
 	# reset automatic, vital as per package metadata
 	AUTOMATIC="1"
+
+	touch "${WORKPREFIX}/.recovery"
 
 	if [ -n "${COREPKG}" -a "$(echo "${COREDEP}" | grep -c ${PACKAGE})" != "0" ]; then
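
For anyone who wants to try the trap idea away from a live firewall, here is a minimal standalone sketch of the same pattern. It is not the patch above: restore_repo_samples, recover and run_with_recovery are hypothetical names, the marker is cleared when the wrapped command succeeds (the patch leaves that to the trap), a non-recursive glob replaces the find call, and the rc.configure_firmware attempt is left out so that only the sample-restoring fallback is exercised.

```shell
#!/bin/sh

# Copy every *.conf.sample in directory $1 over its *.conf sibling.
restore_repo_samples()
{
	for SAMPLE in "${1}"/*.conf.sample; do
		# an unmatched glob stays literal, so skip anything that is not a file
		[ -f "${SAMPLE}" ] || continue
		cp "${SAMPLE}" "${SAMPLE%.sample}"
	done
}

# Restore the samples in $1 only if the marker file $2 is still present.
recover()
{
	if [ -f "${2}" ]; then
		restore_repo_samples "${1}"
		rm -f "${2}"
	fi
}

# Usage: run_with_recovery REPOSDIR MARKER command [args...]
# Runs the command in a subshell with the marker in place; if the
# subshell exits before the marker is cleared, the EXIT trap restores
# the repo confs from their samples.
run_with_recovery()
{
	(
		REPOSDIR="${1}"
		MARKER="${2}"
		shift 2

		trap 'recover "${REPOSDIR}" "${MARKER}"' EXIT
		# turn an interrupt into a normal exit so the EXIT trap runs
		trap 'exit 1' INT TERM

		touch "${MARKER}"
		"${@}" || exit 1
		rm -f "${MARKER}"
	)
}
```

Whether an EXIT trap runs when the shell is killed by Ctrl-C differs between shells, which is why the sketch converts INT and TERM into a plain exit first. A real patch would probably need the same treatment to cover the "user aborts a hung script" case.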

There are already some worst-case recoveries in there, but covering all cases will be difficult to maintain, even assuming that we desperately need them.

Movement was made in several other directions:

1. The check.sh script can now detect if the wrong version number is installed. It doesn't enforce the right pkg version yet, but that could be done... unless a pkg update breaks the database's backwards compatibility, which screws the user over anyway.

2. The nasty pkg-upgrade bug was found. It was a background clean script we ran ourselves for the right reasons, but it also actively sabotaged pkg-upgrade while it was still doing its job.

3. We have forcefully disabled the FreeBSD repo on firmware configure for a few years now. We can't prevent users fiddling with that on the console either, but we can at least preserve whatever integrity the system still has after the fact.

At the moment most people fear the benign "unexpected error" popup most, and that's saying something for stability. We should address this next (and also update pkg to a newer version, obviously, but we're waiting for 26.4 to come out first).


Cheers,
Franco


Quote from: franco on Today at 10:11:40 AM
3. We have forcefully disabled the FreeBSD repo on firmware configure for a few years now. We can't prevent users fiddling with that on the console either, but we can at least preserve whatever integrity the system still has after the fact.

I guess what I am trying to do is have a failsafe for the case where the script aborts after /usr/local/etc/pkg/repos/FreeBSD.conf has been removed during the script's run (so the disable override is gone) but before refresh.sh has copied the sample confs back. Without that failsafe, the disable override isn't there when opnsense-revert or an update is run again, and so the system falls back to using /etc/pkg/FreeBSD.conf.

This is what happened in my case. When I re-ran opnsense-revert after the first abort, I received an error about package conflicts (I can't recall the exact message) and was prompted to reinstall/upgrade pkg. Not realising that this was because the FreeBSD repo was being used, I did that, and this led to my troubles.

The point being: re-enabling of the FreeBSD repo was not caused by "user console fiddling".
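
A quick way to tell whether a system is in that state is to check for the override file. This is a rough sketch only: freebsd_repo_disabled is a hypothetical helper, the directory is passed in so it can be tried against a scratch copy, and the grep is a crude text match on an "enabled" key set to no, false or off, not a real parser for the repo conf format.

```shell
#!/bin/sh

# Returns 0 if the override file in directory $1 exists and appears to
# disable the repo, 1 otherwise. A missing file counts as "not disabled",
# since pkg would then fall back to /etc/pkg/FreeBSD.conf.
freebsd_repo_disabled()
{
	OVERRIDE="${1:-/usr/local/etc/pkg/repos}/FreeBSD.conf"

	[ -f "${OVERRIDE}" ] || return 1

	# crude match for "enabled: no", "enabled = false", and similar
	grep -Eiq 'enabled[[:space:]]*[:=]?[[:space:]]*(no|false|off)' "${OVERRIDE}"
}

# Example: warn before running anything that installs packages.
# if ! freebsd_repo_disabled; then
#	echo "FreeBSD repo override is missing, do not upgrade pkg" >&2
# fi
```

On a live system, "pkg -vv" prints the repositories pkg has actually resolved, which is the more reliable check. The sketch is only meant to show what "the disable override is gone" looks like on disk.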

Are you talking about reapplying firmware settings?

# git grep system_firmware_configure
src/etc/inc/plugins.inc.d/core.inc:        'firmware_reload' => ['system_firmware_configure'],
src/etc/inc/system.inc:function system_firmware_configure($verbose = false)
src/etc/rc.bootup:system_firmware_configure(true);
src/etc/rc.configure_firmware:system_firmware_configure(true);
src/etc/rc.reload_all:system_firmware_configure(true);

We have a few spots already... yet... installing a package will render both OPNsense.conf and FreeBSD.conf, so where do the files get lost, except when the core package is gone (which is vital, so pkg isn't supposed to remove it)?

After all, the refresh is only called after a successful package install, too.


Cheers,
Franco

Today at 11:37:06 AM #9 Last Edit: Today at 11:40:20 AM by Greelan
Quote from: franco on Today at 10:57:01 AM
We have a few spots already... yet... installing a package will render both OPNsense.conf and FreeBSD.conf, so where do the files get lost, except when the core package is gone (which is vital, so pkg isn't supposed to remove it)?

Here? https://github.com/opnsense/core/blob/master/Keywords/shadow.ucl#L47

(which picks up https://github.com/opnsense/core/blob/5a5350e29ead5cea9c3160d618833552ecde9f4d/plist#L2567-L2569)

My understanding of the logic is that the script goes through pre-deinstall and then post-install. Maybe the solution is to move the copying of the sample confs to earlier in the post-install chain.

Yes, these. refresh.sh is supposed to be in post-install too:

https://github.com/opnsense/core/blob/master/%2BPOST_INSTALL#L36

Are you saying the order in which post-install is executed is wrong?

I wouldn't be surprised, but I've never seen it matter either.


Cheers,
Franco

"Wrong" would be an overstatement. The issue is better expressed like this: the more that happens before the sample confs are reinstated (and refresh.sh does quite a bit), the greater the chance that something goes wrong before they are reinstated.

That's what happened in my case. refresh.sh is run pretty early in the post-install chain, and that is where it hung (as I said, I understand this was due to a file lock on config.xml, likely causing reconfigure to stall). So it never got to reinstating the sample confs before I was forced to abort.

I think you're asking for something like "opnsense-update -sd".

We're bailing on opnsense-update when the repo origin isn't there:

https://github.com/opnsense/update/blob/56cc3d33225e8f797693e1833bd03b251082cd8d/src/update/opnsense-update.sh.in#L490-L493

That check isn't in opnsense-revert. But if we now let other scripts know there is an origin file somewhere, we basically start hardcoding it.

We could make opnsense-update a bit more helpful by adding a check for whether the origin is there (and script a minimal recovery with it)?



Cheers,
Franco
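
A rough sketch of what such a check with minimal recovery could look like. Everything here is an assumption made for illustration: require_origin is a hypothetical name, the origin path is passed in as an argument instead of being hardcoded, and the sample is assumed to sit next to it with a ".sample" suffix. It is not taken from opnsense-update.

```shell
#!/bin/sh

# Usage: require_origin /path/to/origin.conf
# Returns 0 if the origin file is present, or if it could be restored
# from "<origin>.sample"; returns 1 otherwise so the caller can bail.
require_origin()
{
	ORIGIN="${1}"
	SAMPLE="${ORIGIN}.sample"

	[ -f "${ORIGIN}" ] && return 0

	if [ -f "${SAMPLE}" ] && cp "${SAMPLE}" "${ORIGIN}"; then
		echo "Restored ${ORIGIN} from sample." >&2
		return 0
	fi

	echo "Missing ${ORIGIN} and could not restore it from a sample." >&2
	return 1
}

# Example caller: stop before touching any packages.
# require_origin /usr/local/etc/pkg/repos/OPNsense.conf || exit 1
```

Keeping the path as a parameter sidesteps the hardcoding concern in the sketch itself, though a real script would still need to get that path from somewhere.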