Failed upgrade 23.1.11_2 to 23.7 resulting in can't load 'kernel'

Started by mfalkvidd, Today at 01:43:09 PM

I'm trying to catch up with the latest release. Two weeks ago I upgraded from v22 to v23. I had to dig out a serial cable so I could (re)configure the interfaces over serial, but other than that the upgrade was fine.

Today I initiated the update from 23.1.11_2 to 23.7 through the web UI. It indicated that base, kernel and packages needed updating.
The web UI stalled at the kernel update.
An hour later I could still access the internet from my devices (through OPNsense), and OPNsense still responded to ping. But ssh hung (no connection timeout, no connection refused, just hanging indefinitely). Same with the web interface.
The serial console gave me a login prompt, but after I entered the password nothing happened.

I pulled the power, waited a bit and powered on again. Got this:

PC Engines apu4
coreboot build 20212402
BIOS version v4.13.0.4
4080 MB ECC DRAM

SeaBIOS (version rel-1.12.1.3-0-g300e8b70)

Press F10 key now for boot menu

Booting from Hard Disk...



            /  __  |/ ___ |/ __  |
            | |  | | |__/ | |  | |___  ___ _ __  ___  ___
            | |  | |  ___/| |  | / __|/ _ \ '_ \/ __|/ _ \
            | |__| | |    | |  | \__ \  __/ | | \__ \  __/
            |_____/|_|    |_| /__|___/\___|_| |_|___/\___|

 +-----------------------------------------+     @@@@@@@@@@@@@@@@@@@@@@@@@@@@
 |                                         |   @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 |  1. Boot Multi user [Enter]             |   @@@@@                    @@@@@
 |  2. Boot Single user                    |       @@@@@            @@@@@
 |  3. Escape to loader prompt             |    @@@@@@@@@@@       @@@@@@@@@@@
 |  4. Reboot                              |         \\\\\         /////
 |  5. Cons: Serial                        |   ))))))))))))       (((((((((((
 |                                         |         /////         \\\\\
 |  Options:                               |    @@@@@@@@@@@       @@@@@@@@@@@
 |  6. Kernel: default/kernel (1 of 2)     |       @@@@@            @@@@@
 |  7. Boot Options                        |   @@@@@                    @@@@@
 |                                         |   @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 |                                         |   @@@@@@@@@@@@@@@@@@@@@@@@@@@@
 +-----------------------------------------+
   Autoboot in 0 seconds. [Space] to pause     23.1 ``Quintessential Quail'' |

Loading kernel...
Failed to load kernel 'kernel'
can't load 'kernel'

can't load 'kernel'

I proceeded with boot kernel.old, which was successful (except that it was still on 23.1, of course). This error appeared during boot, though:
swapon: adding /dev/ada0p3 as swap device
.ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg /usr/local/lib/ipsec /usr/local/lib/perl5/5.32/mach/CORE
32-bit compatibility ldconfig path:
done.
>>> Invoking early script 'upgrade'
!!!!!!!!!!!! ATTENTION !!!!!!!!!!!!!!!
! A critical upgrade is in progress. !
! Please do not turn off the system. !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Version number mismatch, aborting.
    Kernel: 13.1
    Base:   13.2
>>> Invoking early script 'configd'
Starting configd.

There is 3.3 GB of space available on zroot. I ran a scrub two weeks ago, which reported 0B repaired and 0 errors.

Any ideas on how to proceed from here?
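
For anyone wanting to check by hand for the kind of mismatch reported in the boot log above, here is a minimal sketch. It is not the check the upgrade script itself performs; the helper names release_of and versions_match are made up for illustration, and it only compares the major.minor part of two version strings.

```shell
#!/bin/sh
# Sketch: compare the FreeBSD release (major.minor) of kernel and base.
# release_of and versions_match are hypothetical helper names.

# release_of VERSION: strip everything from the first "-" onwards,
# e.g. "13.2-RELEASE-p1" -> "13.2".
release_of() {
    printf '%s\n' "${1%%-*}"
}

# versions_match KERNEL BASE: returns 0 when both are the same release.
versions_match() {
    [ "$(release_of "$1")" = "$(release_of "$2")" ]
}

# On the firewall itself the inputs could come from:
#   uname -r             # release of the running kernel
#   freebsd-version -u   # release of the installed userland (base)
# e.g.:
#   versions_match "$(uname -r)" "$(freebsd-version -u)" || echo "mismatch"
```

With the values from the log (kernel 13.1, base 13.2), versions_match returns non-zero, which matches the aborted upgrade.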

IMO you just need to retry the upgrade, since nothing was done. The kernel should apply as long as the disk is dependable. If not, it's time for a reinstall anyway (and for looking into replacing the disk).


Cheers,
Franco

Thanks. Will do, but through the CLI instead of the web UI so I can see what is happening.
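
One way a retry from the shell might look, as a hedged sketch: it assumes that only the kernel set still needs to be applied and that opnsense-update with -k (kernel) and -r (release) is the right tool for this particular situation, neither of which is confirmed in this thread. The run and retry_kernel helpers are made up for illustration; with DRY_RUN=1 the command is only printed, not executed.

```shell
#!/bin/sh
# Sketch only: the flags and the release string are assumptions,
# not a procedure confirmed in this thread.

# run CMD...: execute CMD, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# retry_kernel RELEASE: fetch and install only the kernel set for RELEASE.
retry_kernel() {
    run opnsense-update -k -r "$1"
}

# Preview first, then run for real from the serial console or ssh:
#   DRY_RUN=1; retry_kernel 23.7
#   unset DRY_RUN; retry_kernel 23.7
```

Running it in a foreground shell keeps the output visible, which is the point of avoiding the web UI here.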

How can I check the disk, besides scrubbing the pool?

The health audit in the firmware status is a good first check for consistency. I'm not an expert on ZFS disk health monitoring, but there should be ample information here in the forum about it from Patrick et al.


Cheers,
Franco

If a scrub returns no errors, all data and metadata that is actually on the disk is guaranteed to be OK.
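
A small sketch of checking that result from the shell, assuming the pool is named zroot as mentioned earlier in the thread. The pool_is_clean helper is a made-up name; it only looks for the summary line that zpool status prints when no data errors are known.

```shell
#!/bin/sh
# pool_is_clean (hypothetical helper): reads `zpool status` output on
# stdin and succeeds only if ZFS reports no known data errors.
pool_is_clean() {
    grep -q '^errors: No known data errors'
}

# On the firewall:
#   zpool scrub zroot        # start a scrub (runs in the background)
#   zpool status zroot       # shows scrub progress and the result
#   zpool status zroot | pool_is_clean || echo "pool needs attention"
```

This only covers what ZFS can verify about data already on disk; it says nothing about the health of the device itself.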

For possible device errors, end of lifetime notifications, etc. check out Scrutiny:

https://forum.opnsense.org/index.php?topic=48101.0


Honestly, I am puzzled that nobody ever commented on my HOWTO or came back with questions. Disk monitoring, like monitoring temperature and fans (if present), is essential, IMHO.