Hi,
We install OPNsense on an Ubuntu KVM virtual machine. It has been running fine, and even a vanilla install of the 24.1 image works.
The problem is the upgrade to the subsequent minor releases (24.1.4, 24.1.5, 24.1.6, ...).
We get the following error when installing:
[81/81] Upgrading opnsense from 24.1 to 24.1.4...
[81/81] Extracting opnsense-24.1.4: .......... done
Stopping configd...done
Resetting root shell
Updating /etc/shells
Unhooking from /etc/rc
Unhooking from /etc/rc.shutdown
Updating /etc/shells
Registering root shell
Hooking into /etc/rc
Hooking into /etc/rc.shutdown
Starting configd.
>>> Invoking update script 'refresh'
Migrated OPNsense\Wireguard\Client from 0.0.7 to 1.0.0
Migrated OPNsense\Kea\KeaDhcpv4 from 0.0.1 to 1.0.0
Migrated OPNsense\Syslog\Syslog from 1.0.1 to 1.0.2
Migrated OPNsense\Unbound\Unbound from 1.0.8 to 1.0.9
Writing firmware settings:Illegal instruction (core dumped)
We've tried it again with 24.1.5 and 24.1.6 and got the same results.
We tried it without our custom configs and it still crashes.
Version 24.1 runs fine, so the hardware should not be the issue?
Is there something in the recent minor updates that breaks the firmware?
Check the CPU type of the KVM virtual machine. I am not experienced enough with KVM to tell you the "correct" setting but I know that it matters.
We tried a vanilla upgrade to 24.1.8 this morning and still ran into the same core dump bug.
We checked the CPU specs of our servers:
This one upgrades OK:
1. Intel Celeron N3450: released September 2016
These ones fail upgrading:
2. Intel Core i7-9700: released April 2019
3. Intel Core i7-10710U: released August 2019
We build our KVMs with the following CPU flags:
--cpu host,-xsave
So, it inherits the CPU of the host system.
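To make it explicit what that CPU setting asks for, here is a minimal sketch that splits a value like the one above into the base model and the feature toggles. The function name is made up for illustration, and it only understands the `+feature` / `-feature` prefix form used in this thread, not any other syntax the virtualization tools may accept.

```python
def parse_cpu_arg(value):
    """Split a CPU setting such as 'host,-xsave' into model and feature toggles.

    Hypothetical helper: only the '+feature' / '-feature' prefix form is handled.
    """
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty CPU setting")
    model, toggles = parts[0], parts[1:]
    # A leading '-' removes a feature from the guest, a leading '+' adds one.
    disabled = [t[1:] for t in toggles if t.startswith("-")]
    enabled = [t[1:] for t in toggles if t.startswith("+")]
    return {"model": model, "disabled": disabled, "enabled": enabled}


print(parse_cpu_arg("host,-xsave"))
# {'model': 'host', 'disabled': ['xsave'], 'enabled': []}
```

So the guest gets the host's CPU model with exactly one feature, xsave, masked out.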
The OPNsense dashboard shows the following CPU type for the failing cases:
CPU type Intel Core Processor (Skylake, IBRS) (2 cores, 2 threads)
I don't understand how the low-end CPU is doing fine, but the higher-end ones are having problems.
Sorry, my notes had a typo. The working CPU is Intel Celeron J3455 (released August 2016).
Since you're doing a passthrough, install the latest CPU microcode and try again:
https://forum.opnsense.org/index.php?topic=36139.msg177362#msg177362
pkg install cpu-microcode
No, you can't do that. This is not a hardware passthrough, so the microcode cannot be updated from inside the guest.
Microcode updates need to be applied on the host, if they want to do that.
Ubuntu normally ships frequent microcode updates though.
Although we have been unable to update OPNsense, we have been applying the latest operating system updates for Ubuntu, so we are staying on top of updates, including any microcode ones.
The CPU type that is working is the following:
CPU type Intel Atom Processor (SnowRidge) (2 cores, 2 threads)
The upgrade to 24.1.8 failed after migrating DHCRelay, which is one step after Unbound, where it was failing before:
[173/173] Extracting opnsense-24.1.8: .......... done
Stopping configd...done
Resetting root shell
Updating /etc/shells
Unhooking from /etc/rc
Unhooking from /etc/rc.shutdown
Updating /etc/shells
Registering root shell
Hooking into /etc/rc
Hooking into /etc/rc.shutdown
Starting configd.
>>> Invoking update script 'refresh'
Migrated OPNsense\Wireguard\Server from 0.0.4 to 1.0.0
Migrated OPNsense\Wireguard\Client from 0.0.7 to 1.0.0
Migrated OPNsense\Kea\KeaDhcpv4 from 0.0.1 to 1.0.0
Migrated OPNsense\Syslog\Syslog from 1.0.1 to 1.0.2
Migrated OPNsense\Unbound\Unbound from 1.0.8 to 1.0.9
Migrated OPNsense\DHCRelay\DHCRelay from 0.0.0 to 1.0.1
Writing firmware settings:Illegal instruction (core dumped)
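When comparing several failed runs, it can help to pull out the crashing line and the last line completed before it. A minimal sketch, assuming the upgrade output has been saved as text; the function name is made up, and the sample below is just the tail of the log above.

```python
CRASH_MARKER = "Illegal instruction (core dumped)"


def last_step_before_crash(log_text):
    """Return (crash_line, previous_line), or None if the marker is absent."""
    previous = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if CRASH_MARKER in line:
            return line, previous
        previous = line
    return None


# Raw string so the backslashes in the model names are kept literally.
sample = r"""Migrated OPNsense\Unbound\Unbound from 1.0.8 to 1.0.9
Migrated OPNsense\DHCRelay\DHCRelay from 0.0.0 to 1.0.1
Writing firmware settings:Illegal instruction (core dumped)
"""

crash_line, previous_line = last_step_before_crash(sample)
print("crashed in:", crash_line)
print("last completed:", previous_line)
```

In both logs in this thread the crashing line is the same "Writing firmware settings" step; only the last completed migration before it differs.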
We tested the latest release, 24.1.9. It no longer shows the message "Illegal instruction (core dumped)" and simply shows "Writing firmware settings:". Upon reboot, it still core dumps (see attached screenshot).
We found the root cause of the issue: The -xsave parameter for the CPU was tripping up the upgrade.
The -xsave flag was originally put in place to prevent a kernel panic on an earlier version of the Ubuntu hypervisor. This flag is no longer needed and was instead breaking the OPNsense update. Once we removed the flag, the upgrade went through. We are now on OPNsense 24.7.4 with no issues.
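For anyone who lands here with the same symptom, one quick sanity check is whether xsave appears in the CPU feature list the guest actually sees. A minimal sketch, assuming you paste in the feature list yourself (for example the flags line from /proc/cpuinfo on a Linux host, or the CPU feature lines from the boot messages of a FreeBSD-based guest). The function name and the sample strings are made up for illustration, not real output from our machines.

```python
import re


def has_cpu_feature(feature_text, name):
    """Case-insensitive whole-token lookup in a CPU feature list."""
    # Split on anything that is not part of a feature name, so both
    # space-separated and comma-separated lists work.
    tokens = re.split(r"[^A-Za-z0-9_.]+", feature_text)
    return name.lower() in (t.lower() for t in tokens if t)


# Hypothetical, shortened feature lists.
with_xsave = "fpu sse2 xsave avx"
without_xsave = "SSE3,SSSE3,OSXSAVE,AVX"

print(has_cpu_feature(with_xsave, "xsave"))     # True
print(has_cpu_feature(without_xsave, "xsave"))  # False, OSXSAVE is a different token
```

Matching whole tokens matters here because a plain substring search would mistake OSXSAVE for XSAVE.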