Upgrade to 25.1 fails to reboot and hangs

Started by xxup, February 13, 2025, 08:58:19 AM

I started with a clean install of 24.7 on a test box (same spec as the production box) and upgraded it to 24.7.12_4. I did this because I foolishly tested Zenarmor on the production box and already knew that it would not upgrade cleanly.

In the background I see the following text:

***GOT REQUEST TO UPGRADE***
Currently running OPNsense 24.7.12_4 (amd64) at Thu Feb 13 15:52:01 AEST 2025
Fetching packages-25.1-amd64.tar: ... done
Fetching base-25.1-amd64.txz: .......................... done
Fetching kernel-25.1-amd64.txz: ........... done
Extracting packages-25.1-amd64.tar... done
Extracting base-25.1-amd64.txz... done
Extracting kernel-25.1-amd64.txz... done
Please reboot.
>>> Invoking upgrade script 'sanity.sh'
Passed all upgrade tests.
>>> Invoking upgrade script 'cleanup.sh'
!!!!!!!!!!!! ATTENTION !!!!!!!!!!!!!!!
! A critical upgrade is in progress. !
! Please do not turn off the system. !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Installing kernel-25.1-amd64.txz...
done
***REBOOT***

In the foreground is a text box with the message:
Your device is rebooting
The upgrade has finished and your device is being rebooted at the moment, please wait...

After an hour nothing happens - so I force the shutdown by pressing the power switch on the box.

The box restarts, but there is no disk activity (there is a light on this box to show the activity).
I wait 10 minutes and power down again.
I restart the box and this time there seems to be furious disk activity, but no DHCP service, as the test lab Linux box has no IP address yet.
So, I take the dog for a walk - nearly dark down here.  Bit cool too as it is only 28C outside.

Still no response.

Plugged in the console cable - no response.
Turned off the box again and this time the console came alive and now I can log in.
The dashboard reports that 25.1 is installed.
Can't find any update log.
The System -> Firmware -> log File reports, "17:47 Pkg-static opnsense upgraded: 24.7.12_4 -> 25.1", which is around the time the reboot finally worked!
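For anyone who wants to pull the version change out of that firmware log line with a script, here is a minimal sketch. It assumes the line format is exactly as quoted above; the regular expression and the `parse_upgrade` helper are illustrative names of my own, not part of OPNsense or pkg.

```python
import re

# Pattern based only on the single firmware log line quoted above;
# other pkg log formats are not handled by this sketch.
UPGRADE_RE = re.compile(r"(?P<name>\S+) upgraded: (?P<old>\S+) -> (?P<new>\S+)")

def parse_upgrade(line):
    """Return package name, old version and new version, or None if no match."""
    match = UPGRADE_RE.search(line)
    return match.groupdict() if match else None

entry = parse_upgrade("17:47 Pkg-static opnsense upgraded: 24.7.12_4 -> 25.1")
print(entry)
```

A line that does not contain an "upgraded:" record simply returns `None`, so the helper can be run over a whole log export without special casing.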

Where can I find a more complete log of this nightmare before I start the upgrade to 25.1.1? Is this normal for a major upgrade? I have just come over from pfSense.


Adrian from Down Under

This morning I updated to 25.1 and the firewall did not restart. I logged in over ssh and restarted it with the menu option "6) Reboot", and then I saw a process that was preventing the restart. I connected from another console over ssh and killed that process, monit. The firewall then restarted and seems to be working as expected.

The update log is printed alongside the upgrade process, e.g. you posted it yourself. It is also stored (if you keep logs on disk rather than in RAM) under System > Logs > General.

However, you didn't have any problem with the upgrade itself, as can be seen from the logs you posted. The problem was that your system froze during the reboot attempt after the upgrade.

What hardware do you have and are you using UFS or ZFS?

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

February 13, 2025, 10:17:08 AM #3 Last Edit: February 13, 2025, 10:19:24 AM by xxup
Using ZFS.

Using a WatchGuard M500 with 8 GB RAM and an i3-4350T CPU.

I also found this under System -> Firmware Status -> upgrade (attached as UpgradeLog.pdf), which seems to say that it worked.

But then there is the health check that seems to state that I should be on 25.1.1.

First, I need to understand why there is a reboot problem. I did not experience this going from 24.7 -> 24.7.12_4 in either box.

The configuration is very simple - only Acme and the os-cpu-microcode-intel are installed.  I don't use monit on either box.  The firewall is for home use, but the Finance Director (FD) is pretty tough on any downtime affecting her streaming or farcebook viewing.   

So do I need to find a dmesg log?


Adrian from Down Under

Also looking at the back end log file and I see a recurring cycle of these:

2025-02-13T11:31:01   Informational   configd.py   message df4e19e2-4230-49bc-bf48-f8f4dc40b222 [] returned b''   
2025-02-13T11:31:01   Notice   configd.py   [df4e19e2-4230-49bc-bf48-f8f4dc40b222] refresh url table aliases   
2025-02-13T11:31:01   Debug   configd.py   OPNsense/Filter generated //usr/local/etc/filter_geoip.conf   
2025-02-13T11:31:01   Debug   configd.py   OPNsense/Filter generated //usr/local/etc/filter_tables.conf   
2025-02-13T11:31:01   Notice   configd.py   generate template container OPNsense/Filter   
2025-02-13T11:31:01   Notice   configd.py   [a3cf844b-3ce8-4959-b999-9578d7f06fdf] generate template OPNsense/Filter   
2025-02-13T11:31:00   Notice   configd.py   [18e13b6f-f103-4231-8541-964f405d9487] list gateways   
2025-02-13T11:31:00   Notice   configd.py   [217568ca-8858-4447-a665-260e8a57d67d] request pf current overall table record count and table-entries limit

The cycle repeats every 15 minutes.
Is this normal?  It is also on the production box that is still running 24.7.12_4.

Adrian from Down Under

February 13, 2025, 11:12:52 AM #5 Last Edit: February 13, 2025, 11:52:45 AM by xxup
Lightbulb moment?

2025-02-13T20:01:01   Informational   configd.py   message 663c2594-65e4-4c3e-b9e0-55d26dd50327 [] returned b''   
2025-02-13T20:01:01   Notice   configd.py   [663c2594-65e4-4c3e-b9e0-55d26dd50327] refresh url table aliases

Something to do with URL table aliases being refreshed every 15 minutes?

I struggle to see how an alias table refresh would stop a reboot.
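A quick way to sanity-check the 15-minute theory is to compare the timestamps of the "refresh url table aliases" entries. The sketch below uses only the two such lines quoted in this thread; the helper names `refresh_times` and `fits_period` are my own, and the whitespace-separated layout is assumed from the excerpts rather than from any documented log format.

```python
from datetime import datetime

# The two "refresh url table aliases" lines copied from the backend log excerpts above.
LOG_LINES = [
    "2025-02-13T11:31:01   Notice   configd.py   [df4e19e2-4230-49bc-bf48-f8f4dc40b222] refresh url table aliases",
    "2025-02-13T20:01:01   Notice   configd.py   [663c2594-65e4-4c3e-b9e0-55d26dd50327] refresh url table aliases",
]

def refresh_times(lines, needle="refresh url table aliases"):
    """Return the sorted timestamps of all lines that mention the alias refresh."""
    times = []
    for line in lines:
        if needle in line:
            stamp = line.split()[0]  # first whitespace-separated field is the timestamp
            times.append(datetime.fromisoformat(stamp))
    return sorted(times)

def fits_period(times, period_minutes=15):
    """True if every gap between consecutive timestamps is a whole multiple of the period."""
    period = period_minutes * 60
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    return all(gap % period == 0 for gap in gaps)

times = refresh_times(LOG_LINES)
print(times, fits_period(times))
```

The two entries are 8 hours 30 minutes apart, which is a whole multiple of 15 minutes but not of 60, so they are consistent with a quarter-hourly job. Two samples cannot prove the period on their own; pasting a longer log export into `LOG_LINES` would give a firmer answer.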
Adrian from Down Under

February 13, 2025, 12:08:57 PM #6 Last Edit: February 13, 2025, 12:42:18 PM by xxup
Just installed 25.1.1 on the box and the same hang at reboot has occurred. No access at console.

Except that I only needed to switch off the box once, and then the console worked.

*** Update ***
Now it hangs whenever I do power -> reboot or use option 6 from the console.
However, power -> power off works perfectly.

When I watch on the console, it looks like the start of a shutdown:

Syncing disks, vnodes remaining... 0 Waiting (max 60 seconds) for system process 'syncer' to stop... 0 0 0 0 done
All buffers synced.
Uptime: 2m29s
uhub3: detached
uhub0: detached
uhub4: detached
uhub1: detached
uhub2: detached

Then nothing.

Even with a factory reset, and the defaults retained (i.e. 192.168.1.1 on port 0) the box will not reboot.
*******

I am going to revert the box to 24.7 and start again until I work out the cause of the hang. 
Adrian from Down Under

February 15, 2025, 05:42:39 AM #7 Last Edit: February 15, 2025, 07:11:07 AM by xxup
It seems that the i3-4350T CPU will not reboot on the M500 (Lanner board). I changed this to an i3-4130 CPU (which I had on hand from another project) and everything is now working correctly.

I am not sure why I thought that it rebooted on 24.7, because after I reinstalled 24.7 from the flash drive and later upgraded to 24.7.12_4, the M500 still would not reboot.

I have now copied the configuration from the production box to the M500 test box and the upgrade is looking good.  Just the post-install checklist to do now.

Anyway, the big message is: don't bother trying the i3-4350T on an M500 - it won't reboot.
Adrian from Down Under