pkg update crashes and bricks firewall

Started by chindokae, September 08, 2025, 06:32:00 PM


September 08, 2025, 11:04:32 PM #16 Last Edit: September 08, 2025, 11:06:28 PM by chindokae
Well, maybe microcode and mitigations aren't the problem after all:

dmesg | grep micro
[1] CPU microcode: updated from 0xe to 0x1d

sysctl hw.ibrs_disable vm.pmap.pti
hw.ibrs_disable: 0
vm.pmap.pti: 1
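
For anyone who wants to script that check, a minimal sketch follows. It assumes the dmesg line keeps the "CPU microcode: updated from X to Y" wording shown above; the function name ucode_rev is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# ucode_rev: hypothetical helper. Reads dmesg-style text on stdin and prints
# the revision the CPU microcode was updated TO (for example 0x1d).
# Assumes the "CPU microcode: updated from X to Y" wording shown above.
ucode_rev() {
    sed -n 's/.*CPU microcode: updated from 0x[0-9a-fA-F]* to \(0x[0-9a-fA-F]*\).*/\1/p'
}

# On the firewall:
#   dmesg | ucode_rev
#   sysctl hw.ibrs_disable vm.pmap.pti
```

If the function prints nothing, no microcode update line was found in the input.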

The sqlite database is not physically corrupt this time; semantically, IDK.

***GOT REQUEST TO CHECK FOR UPDATES***
Currently running OPNsense 25.7 (amd64) at Mon Sep  8 21:00:10 UTC 2025
Fetching changelog information, please wait... done
Updating OPNsense repository catalogue...
Child process pid=11217 terminated abnormally: Segmentation fault
Child process pid=11850 terminated abnormally: Segmentation fault
Child process pid=13706 terminated abnormally: Segmentation fault
self: No packages available to install matching 'opnsense'
***DONE***

I cleaned out the pkg data and caches and got it bootstrapped again.   I can install packages and search for things:

pkg search -Q name opnsense
opnsense-25.7.2
Name           : opnsense
Comment        : OPNsense community release

So from the command line it seems OK, but from the UI, touching the "check for updates" freezes the web GUI.  That requires a reboot from the shell.

Is there an updated image available to install 25.7.2?   

Quote from: chindokae on September 08, 2025, 08:04:41 PM
Please explain how the CPU can cause layer 7 problems.
Well, from the FreshPorts site (https://www.freshports.org/sysutils/cpu-microcode-intel) I tracked down the intel-ucode releases for the N100 and found only one functional issue for the N100 (below), so I guess a CPU issue like that can cause all sorts of weirdness.

It's also the only functional issue I could find that is addressed by applying ucode. However, I don't expect this issue to be present at every boot, because it's noted as occurring "Under complex microarchitectural conditions".

Quote
CPU May Not Load The Most Recent Data
Under complex microarchitectural conditions, a read on one logical processor may
not receive the most recently stored data by another logical processor

Due to this erratum, unpredictable system behavior or a system hang may occur.
Intel has only observed this behavior in a synthetic test environment. Intel has not
observed this erratum with any commercially available system.

Reference: https://cdrdv2.intel.com/v1/dl/getContent/764616
Mini-pc N150 i226-V, GOD BLESS CHARLIE KIRK

September 08, 2025, 11:55:02 PM #19 Last Edit: September 09, 2025, 12:41:36 AM by BrandyWine
Quote from: chindokae on September 08, 2025, 11:41:24 PM
So from the command line it seems OK, but from the UI, touching the "check for updates" freezes the web gui.
How about run the audit from the Gui.

0x1d does appear to be latest for N100/N200 series (06-be-00 ucode).
Reference:
https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20250512
ucode https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/refs/heads/main/intel-ucode/06-be-00

Beyond that May release I don't find any ucode that has N100 in it.

September 09, 2025, 01:22:29 AM #20 Last Edit: September 09, 2025, 02:21:20 AM by chindokae
To recap - I am running the latest microcode (0x1d) and I applied the recommended sysctl settings at boot.

I have cleaned out the corrupt files in the /var/db/pkg and /var/cache/pkg directories and bootstrapped pkg again.
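
A cautious way to do that kind of cleanup is to move the directories aside rather than delete them, so the old database can still be inspected later. This is only a sketch: set_aside is a made-up helper name, and the thread does not say which command was used to bootstrap afterwards, so the `pkg bootstrap -f` line in the comments (from pkg(7)) is an assumption.

```shell
#!/bin/sh
# set_aside: hypothetical helper. Moves a directory out of the way under a
# timestamped name and recreates it empty, so nothing is deleted outright.
# Prints the backup location on success.
set_aside() {
    dir=$1
    [ -d "$dir" ] || return 0            # nothing to do if it is missing
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && mkdir -p "$dir" && echo "$backup"
}

# Assumed usage on the firewall, as root:
#   set_aside /var/db/pkg
#   set_aside /var/cache/pkg
#   pkg bootstrap -f      # assumption: force a re-bootstrap, see pkg(7)
```

Note that setting /var/db/pkg aside also removes the record of installed packages, which matches the "purge and start over" approach described in this thread but is not something to do casually.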

I had to reinstall it (pkg) for no apparent reason, since I didn't remove it, but it went in cleanly.

I was able to install smartmontools and os-cpu-microcode-intel.

I downloaded images today that are labeled 25.7 and appear to be running that, but the updated versions I see in the repo are not available for download.

pkg says I am up to date even though I see what appear to be two new revisions, 25.7.2 and 25.7.1.
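
One way to see that mismatch in a single place is to print the locally registered version next to the one in the repository catalogue. A minimal sketch, assuming the core package is named opnsense as in the search output earlier; core_versions and the PKG override are made up for illustration, while `pkg query` and `pkg rquery` are the standard local and remote query commands.

```shell
#!/bin/sh
# core_versions: hypothetical helper. Prints the locally registered version
# and the repository version of a package, one per line.
# PKG can be overridden for testing; it defaults to the real pkg binary.
core_versions() {
    name=${1:-opnsense}
    installed=$(${PKG:-pkg} query %v "$name" 2>/dev/null || true)
    remote=$(${PKG:-pkg} rquery %v "$name" 2>/dev/null || true)
    echo "installed: ${installed:-unknown to local database}"
    echo "repository: ${remote:-not found in catalogue}"
}

# On the firewall:
#   core_versions opnsense
```

An empty local result would point at the package database rather than at the repository.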

I can only check for updates from the command line because letting the GUI do that causes it to lock up.

All the segmentation faults, etc, are now fixed from the cli.

The repo is not above reproach - the smartmontools package did not install via pkg install because it tried to whack an entire man folder.  It tried to move /usr/local/share/man/.pkgtemp.man8.W8KvzBQqO5Xf over /usr/local/share/man/man8, which contains a lot of manpage files.  It wasn't trying to copy into it; it tried to replace it and was thankfully blocked.  I installed the package out of the cache, but that is hardly a smooth install.
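
A failed install like that may leave staging entries behind. The sketch below just lists anything named ".pkgtemp.*" directly under a directory so it can be inspected by hand; list_pkgtemp is a made-up name, and the default path is the one quoted above.

```shell
#!/bin/sh
# list_pkgtemp: hypothetical helper. Lists leftover ".pkgtemp.*" entries
# directly under a directory (default: /usr/local/share/man).
# It only lists; it never removes anything.
list_pkgtemp() {
    find "${1:-/usr/local/share/man}" -maxdepth 1 -name '.pkgtemp.*'
}

# On the firewall:
#   list_pkgtemp
#   list_pkgtemp /usr/local/share/man
```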

That is all the time I can afford to put into this problem today.  I was *supposed* to be working on hardening the QNAP when this happened.  New RAM is on the porch and I need to put that in and leave well enough alone on this problem.

Edit: RAM went in without a fight, Gott sei dank.

To finish off the day's status and for sake of completeness:

I ran long and short smartctl tests and they both completed without error.  dmesg does not show any indication of physical corruption of the disk, no read or write faults, no bad block error, etc.

The initial crash happened during an attempt to update via the GUI and that still isn't right.  Even if it doesn't crash the system, it breaks the httpd service and that requires a reboot, but that is not as bad as the beeping black screen of death.

I have mitigated the presumed firmware issue but have not achieved full operation.  I still see no indication of any hardware faults.   The runaway update error may just have recursed away all available RAM and not crashed the stack.  Either way, it's going down.

Given the inconsistencies in patching even from the command line I have to wonder if the repos got a second bad update.   It happens.

Quote from: chindokae on September 09, 2025, 01:22:29 AM
To recap
How about run the audit from the Gui.

Quote from: BrandyWine on September 09, 2025, 04:11:10 AM
Quote from: chindokae on September 09, 2025, 01:22:29 AM
To recap
How about run the audit from the Gui.


If you mean the one on the update page:

Version 25.7 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 25.7 is correct.
>>> Check for missing or altered base files
No problems detected.
>>> Check installed repositories
OPNsense (Priority: 11)
>>> Check installed plugins
os-cpu-microcode-intel 1.1
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: ........ done
>>> Check for missing or altered package files
Checking all packages: ........ done
>>> Check for core packages consistency
Core package "opnsense" not known to package database.
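
If the audit output is saved to a file, the failing item can be pulled out mechanically. A minimal sketch, assuming the message keeps the exact wording shown above; unknown_core_pkgs and the file name in the usage comment are made up for illustration.

```shell
#!/bin/sh
# unknown_core_pkgs: hypothetical helper. Reads audit output on stdin and
# prints the name of every core package the audit reported as unknown.
# Assumes the 'Core package "NAME" not known to package database.' wording.
unknown_core_pkgs() {
    sed -n 's/^Core package "\([^"]*\)" not known to package database\.$/\1/p'
}

# Example, with the audit text pasted into audit.txt:
#   unknown_core_pkgs < audit.txt
```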

The packagesite package has been upgraded a few times.  It reports 898 packages and no errors, but even though it appears 25.7.3 is out as of today, it couldn't be installed - until one last update tonight, and now it is downloading 171 packages.

September 10, 2025, 12:57:21 AM #23 Last Edit: September 10, 2025, 01:33:15 AM by BrandyWine
Quote from: chindokae on September 10, 2025, 12:08:11 AM
>>> Check for core packages consistency
Core package "opnsense" not known to package database.

I think this issue has been spoken to before by OPNsense folks here on the forum. Need to fix this item.
I think the fix was to install it from cli.

Maybe start with
pkg check -d
pkg clean -an (check and see what dry run says)
pkg clean -a (cleans the cache out)
pkg install -f opnsense
reboot
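
The steps above can be wrapped so they stop at the first failure and can be previewed before anything is changed. This is a sketch only: repair_core and the PKG override are made up for illustration, and the reboot is deliberately left as a manual step.

```shell
#!/bin/sh
# repair_core: hypothetical wrapper around the steps listed above.
# Stops at the first failing step. Set PKG="echo pkg" to preview the
# commands without running them.
repair_core() {
    ${PKG:-pkg} check -d            || return 1  # look for missing dependencies
    ${PKG:-pkg} clean -an           || return 1  # dry run: show what would be removed
    ${PKG:-pkg} clean -a            || return 1  # empty the package cache
    ${PKG:-pkg} install -f opnsense || return 1  # force-reinstall the core package
    echo "done; reboot, then re-run the audit"
}

# Preview first, then run for real:
#   PKG="echo pkg" repair_core
#   repair_core
```

pkg will still ask for confirmation at the clean and install steps, which is probably what you want on a firewall that has already misbehaved.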

Then re-run the audit. If it's all good then use GUI update to see what it does.

Also use search feature, usually comes back with something good.
https://forum.opnsense.org/index.php?topic=48599.msg245505#msg245505

Quote from: BrandyWine on September 10, 2025, 12:57:21 AM
Maybe start with
pkg check -d
pkg clean -an (check and see what dry run says)
pkg clean -a (cleans the cache out)
pkg install -f opnsense
reboot



None of those things had any effect yesterday, and I tend to let Chat do my searching these days, although I did start here with a search.  As always, knowing the root cause of the problem and searching for that initially seems to work a lot better than having to work from the initial presentation of the problem to its eventual resolution.  Makes you look smarter, too.

The resolution to this was purging the sqlite database files - which failed many times yesterday - then trying again today after 25.7.3 was released, reinitializing the local database with packagesite, then working through the sequence of update steps that eventually resolved the issue and got it patching via the GUI again.

This problem started with patching and ended with it.  No amount of trying to update while 25.7.2 was the latest release on the repos worked, but as soon as 25.7.3 showed up, the recommended recovery techniques worked and I could patch from the console, the boot menu, or the GUI.  I copied the steps out of history, and today, they worked.

It is nice to see that the cpu-microcode-intel package now deals with the firmware issue.  Now this product works like the other major Unix distros.

I think I'll go with Occam's Razor on this one and say the cause was trying to update to 25.7.2.




Quote from: chindokae on September 10, 2025, 04:33:25 AM
None of those things had any effect yesterday
You ran all those commands yesterday? Did I miss where you posted that?
So with that pkg error your device is good? If so then just leave it alone.

September 10, 2025, 07:39:21 AM #26 Last Edit: September 10, 2025, 08:08:25 AM by franco
> It is nice to see that the cpu-microcode-intel package now deals with the firmware issue.  Now this product works like the other major Unix distros.

It's strange really: at some point in time nothing was needed for N100, then came an operating system update and the thing was unstable. Then came microcode updates and it was stable again. Now changing two sysctls in 25.7 made it operate less optimally again. A third sysctl was discovered. At some point the microcode updates were not working and now they work again.

Meanwhile hundreds of thousands of users had no apparent weirdness on their installs. Makes perfect sense?

And please listen: I'm not saying there's no issue. I'm saying there is definitely an issue and it's likely beyond our control.


Cheers,
Franco

Quote from: Patrick M. Hausen on September 08, 2025, 10:48:49 PM
Install the os-cpu-microcode-intel plugin and reboot.

I'm in the process of upgrading my OPNsense box to an Odroid H3+.  I was having all kinds of SATA errors.  It turns out the latest H3+ BIOS ships microcode version 0x10, while this plugin injects version 0x1d.  It seems this microcode update fixes my SATA issue.  I also set the tunables:

hw.pci.enable_aspm=0
vm.pmap.pcid_enabled=0
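
After a reboot it is worth confirming the tunables actually took effect. A minimal sketch, assuming both values can be read at runtime with sysctl; check_tunable and the SYSCTL override are made up for illustration.

```shell
#!/bin/sh
# check_tunable: hypothetical helper. Compares the running value of a
# sysctl with the value you expect and reports ok or MISMATCH.
# SYSCTL can be overridden for testing; it defaults to the real sysctl.
check_tunable() {
    name=$1
    want=$2
    have=$(${SYSCTL:-sysctl} -n "$name" 2>/dev/null)
    if [ "$have" = "$want" ]; then
        echo "ok: $name=$have"
    else
        echo "MISMATCH: $name=${have:-unset} (wanted $want)"
        return 1
    fi
}

# On the firewall:
#   check_tunable hw.pci.enable_aspm 0
#   check_tunable vm.pmap.pcid_enabled 0
```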

When installing the plugin, it indicated it was going to be removed from a later OPNsense release after June 2025. I did ask Odroid support for an upgraded BIOS with the latest Intel microcode.  We'll see how they respond.

So my question is, are there plans to remove the os-cpu-microcode-intel plugin?  I'm hoping the response is that removal is on hold, with no new date in sight.


September 11, 2025, 12:03:57 AM #28 Last Edit: September 11, 2025, 12:20:23 PM by Patrick M. Hausen
The FreeBSD package messages when installing updates or plugins through the UI are completely irrelevant in the OPNsense context and only shown for diagnostic purposes.

As is clearly stated in the bottom line of that very dialog box. You can safely ignore everything that ever appears in that window.

The message comes from an auxiliary package that will be going away, but the microcode updates won't.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)