[Solved] Upgrade from 23.1.11_1 to 23.7 failing

Started by ajeffco, August 02, 2023, 07:53:29 PM

August 02, 2023, 07:53:29 PM Last Edit: August 03, 2023, 12:29:31 AM by ajeffco
Hello,

Have an odd issue I haven't seen any postings about.

I have OPNsense installed on Proxmox: 2 OPNsense VMs, identically configured except for IP settings, running on 2 different PVE nodes (not clustered) in a CARP configuration.  I upgraded OPNsense VM #2 this morning without any issue from 23.1.11 -> 23.1.11_1 -> 23.7.

The same upgrade path on OPNsense VM #1 is not working.  It successfully updated from 23.1.11 -> 23.1.11_1.  When I try to upgrade to 23.7 via GUI or CLI, it downloads and installs items, then says it's performing an upgrade (screenshot #1).  However, instead of rebooting, it just shuts down entirely.  When I start the now shut-down VM, it boots into 23.1.11_1 like nothing ever happened (screenshot #2).

Not even sure where to start looking for logs on what might be causing this.  Any help would be appreciated.

Thanks.
Dual Virtual OPNsense on PVE with HA via CARP
Node 1: OPNsense 24.7.3_1 - Protectli Vault FW6E (i7)
Node 2: OPNsense 24.7.3_1 - Qotom-Q555G6-S05 (i5)

It's a fairly clean output there, not much stands out.

Did you try disabling Xymon before the upgrade? You can re-enable it afterwards.

I hadn't thought of it but can try it.  The other OPNsense VM has xymon running as well, no trouble on that node at all, it was flawless.

From the megathread I found after posting my issue, I tried the following:

opnsense-update -u
opnsense-shell reboot


It behaves pretty much the same as the GUI and CLI menu attempts.  It downloads some stuff, then says it's rebooting, shuts down, and on manual start comes up in 23.1.11_1.

I'll try disabling the xymon client and see if that makes a difference.

Something I do notice now.  I was trying to collect the audit logs that showed the update status.  They are now gone, and in the updates tab under System:Firmware, it looks like the system thinks it's on a different version than 23.1.11_1.  Screenshot attached.

August 02, 2023, 08:17:47 PM #3 Last Edit: August 02, 2023, 08:38:53 PM by ajeffco
I tried something.  Until now reboots worked, but now whenever the system wants to reboot it shuts down instead.  I tried to just reboot the VM (Power -> Reboot), and the VM halts instead of rebooting.  I haven't seen this before today, but I'm going to go through all the settings in the VM configs and see if there's something different there.  When I built these quite some time ago, they were built in parallel at the same time, and all settings were identical.

EDIT:  Interestingly, in watching the console and rebooting from the Console Menu, the node actually reboots, comes up to a login herald and then powers off.
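
For the settings comparison mentioned above, one option is to dump each VM's config on its own PVE node with `qm config <vmid>`, copy both dumps to one machine (the nodes are not clustered), and diff them. A minimal sketch, assuming the dumps were saved to files; the function name and the file names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: compare two saved `qm config <vmid>` dumps.
# Lines are sorted first so key order alone does not show up as a difference.
# Prints the differing lines; returns 0 if identical, 1 if they differ.
diff_vm_configs() {
    a_sorted=$(mktemp) || return 2
    b_sorted=$(mktemp) || return 2
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    diff "$a_sorted" "$b_sorted"
    rc=$?
    rm -f "$a_sorted" "$b_sorted"
    return "$rc"
}
```

Usage would be something like `qm config 101 > vm1.conf` on each node (101 is a placeholder VMID), then `diff_vm_configs vm1.conf vm2.conf`. Differences in MAC addresses and disk identifiers are expected; anything else is worth a closer look.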

I removed the xymon-client package using 'pkg remove xymon-client' and updated, same results.  And reboot is still shutting down.


Clone the upgraded VM, or reinstall and import the config?

In any case, this doesn't seem to be an OPNsense/BSD issue.

Was thinking to reinstall and restore the config and see how that goes.  I'm restoring that VM to a previous backup from a week ago to see if it was doing this then.  If that method doesn't work I'll reinstall and restore config.

Thanks!

Al

@newsense, thanks for the help.  I installed OPNsense 23.7 on the existing VM; I did not create a new VM.  I reinstalled and restored the settings, and it's back up and working: CARP/HA works and fails back and forth as expected.  It now reboots using all 3 methods.  The only thing that didn't come back was the installed packages, which is expected and easy to remediate.

The old VM load was rebooting but after the reboot when it got to a login herald it would crash (halt at the herald with no shutdown).  I caught the reboot and the startup, unfortunately Snagit overwrote the important parts with the console going offline with the crash.

As it's working on the exact same VM, not sure how it could be anything other than a FreeBSD/OPNsense issue.  It very well could be something I did loading the xymon client 6 days ago, I know for sure that VM was rebooting up to that point (last step in the steps below).

Here are the steps I used to load that client; maybe someone can say whether I got a step wrong.

##### For OPNsense running FreeBSD 13
pkg add https://pkg.freebsd.org/FreeBSD:13:amd64/latest/All/xymon-client-4.3.30.pkg
vi /usr/local/www/xymon/client/runclient.sh - Change MACHINEDOTS
vi /usr/local/www/xymon/client/etc/xymonclient.cfg - Change XYMSRV

vi /usr/local/etc/rc.syshook.d/start/99-xymon-client
Add below lines between cut and /cut
---cut---
#!/bin/sh
/usr/local/www/xymon/client/runclient.sh start
---/cut---
Save
chmod +x /usr/local/etc/rc.syshook.d/start/99-xymon-client

cp /usr/local/etc/rc.syshook.d/start/99-xymon-client /usr/local/etc/rc.syshook.d/stop/99-xymon-client
Edit /usr/local/etc/rc.syshook.d/stop/99-xymon-client and change start to stop

Test with:
/usr/local/etc/rc.syshook.d/stop/99-xymon-client
/usr/local/etc/rc.syshook.d/start/99-xymon-client

Reboot to validate autostart
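
The hook-file steps above can be collected into one small script. This is only a sketch under the paths given in the post; the function name `install_xymon_hooks` and its optional prefix argument are hypothetical, added so the result can be inspected in a scratch directory before touching the firewall:

```shell
#!/bin/sh
# Hypothetical helper: create the start/stop syshook scripts from the steps above.
# $1 is an optional path prefix (an assumption, for dry runs); empty means the real paths.
install_xymon_hooks() {
    prefix="${1:-}"
    hook_dir="${prefix}/usr/local/etc/rc.syshook.d"
    runclient="/usr/local/www/xymon/client/runclient.sh"
    for action in start stop; do
        mkdir -p "${hook_dir}/${action}" || return 1
        hook="${hook_dir}/${action}/99-xymon-client"
        # Each hook is a two-line script that calls runclient.sh with start or stop.
        printf '#!/bin/sh\n%s %s\n' "$runclient" "$action" > "$hook" || return 1
        chmod +x "$hook" || return 1
    done
}
```

Running `install_xymon_hooks /tmp/hooktest` writes both files under the scratch prefix for review; calling it with no argument as root would write to the paths used in the post.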


Quote from: ajeffco on August 03, 2023, 12:28:52 AM
...  The only thing that didn't come back was the installed packages, which is expected and easy to remediate.
...

Actually, if you check for updates on the console after importing the config, all plugins will be reinstalled. It works in the web GUI too, but I don't remember whether checking for updates will suffice or whether you need to click one of the buttons in System: Firmware: Status, probably the "Run the automatic resolver" one when it becomes available.

It actually worked out well that the plugins weren't all reinstalled.  Of the 6, I only cared to reinstall 3; the other 3 I no longer use or need.  Those 3 are flagged red, but I can't figure out how to remove them on the plugins page.  I'll post a new thread in General about it.