I just installed a fresh copy of 25.7 on my hardware firewall. The installation was normal: I configured the LAN IP and DHCP scope, set the root password, and logged in with a laptop on the LAN interface. I went to the firmware update page and allowed it to patch. It crashed, got stuck on the bug reporting page, and would not leave it. I submitted the report twice, then forced a reboot with a short press on the power switch, which usually does a graceful shutdown. Not this time: it gave three short beeps and powered off in less than 3 seconds. When I powered it back up it never appeared on the network, and pushing the power button gave the same immediate power off.
Rebuilt it on the workbench but this time updated from the console. The core system was fully up to date but packages needed updating. I tried updating from the console and it started normally, downloaded the packages, then crashed immediately when it tried to apply them. The screen was flooded with thousands of errors, far too fast to read, then it rebooted itself.
Unlike the attempt from the web GUI, it did not brick itself. The first package was lighttpd, I believe. I am not in the mood to retry the update to find out.
This should be easily reproducible as no customizations or settings other than LAN IP and LAN DHCP scope were applied.
Most probably a hardware problem. Shoot a video of the console messages with your smartphone.
Well, I should add that this hardware has been running OPNsense for nearly a year and was already at 25.7 when I made what should have been a minor change to Unbound DNS, setting it to use secure DNS at Cloudflare. That caused the loss of all host DNS resolution capability, and when I disabled that service and went back to DHCP-based DNS, the GUI locked up and the firewall was essentially bricked. It would try to reboot if I used the power button but would not bring up the LAN interface.
I sent two bug reports under the email address associated with this account; all necessary information should be there. As I indicated, I am not willing to take down the network again, I need it for work. From a practical point of view, no camera I have at the house has the frame rate necessary to capture the errors anyway. They were just a blur.
The core system firmware boots and runs fine as long as I don't try to update packages. I am running through the firewall now and using the web GUI as I write this.
To me that doesn't look anything like hardware. That looks like a bad package.
You do you. Still looks more like a failing storage device than anything else to me. The packages of OPNsense are fine.
Are you perchance running an Intel N100 or similar CPU?
sysctl -a | grep hw.model
hw.model: Intel(R) N100
hw.clockrate: 806
hw.ncpu: 4
Intel Alder Lake-M
Please explain how the CPU can cause layer 7 problems.
I may not have been clear, but there were no hardware problems last week, last month, or last quarter, and nothing has changed. I update monthly when I update all the Linux machines and the offline repos.
I see nothing in dmesg or any other log to indicate a hardware issue, and there were no errors in /var/log/installer.log. All in all, the system is operating nominally, as long as I don't try to update packages.
There are errors in the logs like:
/system_advanced_admin.php: The command '/usr/sbin/chown -R dhcpd:dhcpd '/var/dhcpd'' returned exit code '1', the output was 'chown: dhcpd: illegal group name'
Which is true, there is no group named dhcp and the user name is _dhcp
Other than that the logs are fairly clean - and nothing that is going to crash a BSD kernel - which in my experience is nearly impossible.
The update program is running entirely in user space, and if it is able to crash the kernel then there is a very good chance that whatever caused this is exploitable.
I sent two complete error reports. If that doesn't suffice, I could try running it in a VM to see how that goes.
We have a ton of reports of N100 without current microcode updates in combination with UFS trashing on-disk data.
Consider re-installing with ZFS. Then after upgrade to the latest version add the microcode plugin.
Thanks, that is a viable explanation.
However, I did try ZFS first and that was an epic install failure. It may not like that FS any better than the other.
I will have to push whatever action I take to the right a bit, primary user one is using the network and it had better not go down again or there will be something bad for dinner.
I will try to install Linux and see if I can get that to push the firmware.
Quote from: chindokae on September 08, 2025, 08:04:41 PM
/system_advanced_admin.php: The command '/usr/sbin/chown -R dhcpd:dhcpd '/var/dhcpd'' returned exit code '1', the output was 'chown: dhcpd: illegal group name'
Which is true, there is no group named dhcp and the user name is _dhcp
The system barked on 'chown: dhcpd: illegal group name', so where do you come up with "there is no group named dhcp"?
Will your N100 boot to single user mode?
Quote from: chindokae on September 08, 2025, 08:52:44 PM
I will try to install Linux and see if I can get that to push the firmware.
Install? No need to install, just run any lightweight liveLinux from boot USB, then look from there. I think Patrick is right, probably bad disk. Boot a live OS and run some disk check utils.
The microcode update must be run from the OS at every boot - it's not persistent. If the manufacturer of your system offers a BIOS update, by all means install that first.
ZFS is more robust than any other file system existing. That's really just a fact. So it would be interesting to know more about your epic install failure. How much memory does your box have?
Also, N100 have problems with PTI, so there are needed sysctls. At this time, I think you have to install 25.7, then apply the sysctls that are linked to here (#23) (https://forum.opnsense.org/index.php?topic=42985.0), reboot and then upgrade to 25.7.2, which would otherwise expose the problem because of its new kernel.
It sure is better with ZFS, still and also, you should use the microcode update package (after the upgrade).
The system barked on 'chown: dhcpd: illegal group name', so where do you come up with "there is no group named dhcp"
cat /etc/group | grep -i dhcp
_dhcp
It also has a leading underscore.
root@OPNsense:/var/db/pkg # chown dhcp:dhcp local.sqlite.bak
chown: dhcp: illegal group name
root@OPNsense:/var/db/pkg # chown dhcp:_dhcp local.sqlite.bak
chown: dhcp: illegal user name
I am primarily a Red Hat admin, so there are some quirks in BSD I am unfamiliar with, but this doesn't appear to be one of them.
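For anyone who wants to check the name lookup without touching file ownership, here is a minimal sketch. chown resolves the name after the colon against the group database, so an exact-match lookup shows which spellings exist. The helper name group_exists and the sample file are my own assumptions; only the _dhcp name comes from the thread, and the numeric ID in the sample is purely illustrative.

```sh
#!/bin/sh
# Sketch: check whether an exact group name exists in a group(5)-format file.
# group_exists is a hypothetical helper, not part of OPNsense or FreeBSD.
group_exists() {
    # $1 = group name, $2 = group file (defaults to /etc/group)
    awk -F: -v name="$1" '$1 == name { found = 1 } END { exit !found }' "${2:-/etc/group}"
}

# Illustrative sample file; the numeric IDs are made up for the example.
sample=$(mktemp)
printf '%s\n' 'wheel:*:0:root' '_dhcp:*:65:' > "$sample"

for g in dhcp dhcpd _dhcp; do
    if group_exists "$g" "$sample"; then
        echo "$g: present"
    else
        echo "$g: missing"
    fi
done
rm -f "$sample"
```

Against the real system, calling group_exists with a single argument reads /etc/group instead of the sample.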
Quote from: Patrick M. Hausen on September 08, 2025, 08:58:50 PM
The microcode update must be run from the OS at every boot - it's not persistent. If the manufacturer of your system offers a BIOS update, by all means install that first.
ZFS is more robust than any other file system existing. That's really just a fact. So it would be interesting to know more about your epic install failure. How much memory does your box have?
16 GB. It normally runs with about 14 GB free. The failure was just an infinite hang at the end of the install. The epic part was figuring out how to clean all the ZFS residue off the disk - as I didn't know what to expect having never used it before. dd ~ 100GB usually cleans all, but not with ZFS.
Quote from: chindokae on September 08, 2025, 09:53:42 PM
dd ~ 100GB usually cleans all, but not with ZFS.
Not quite. That's not a ZFS but a GPT issue. There is a backup GPT partition table at the end of the disk, so you need to wipe that, too.
Then again GPT is the current standard. Nobody uses MBR anymore apart from special embedded systems and the like.
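To make the backup-table point concrete, here is a rough sketch of the arithmetic. It assumes the common GPT layout of 128 entries of 128 bytes each (16384 bytes) followed by one header sector in the last sector of the disk. The function name and the disk size are hypothetical.

```sh
#!/bin/sh
# Sketch: first sector of the backup GPT area, counted from the start of the disk.
# Assumes 128 entries x 128 bytes (16384 bytes) plus one header sector at the very end.
backup_gpt_first_lba() {
    # $1 = total sectors on the disk, $2 = sector size in bytes (default 512)
    total=$1
    ssize=${2:-512}
    entry_sectors=$(( (16384 + ssize - 1) / ssize ))   # round up to whole sectors
    echo $(( total - entry_sectors - 1 ))
}

# Hypothetical disk of 250069680 sectors with 512-byte sectors.
backup_gpt_first_lba 250069680 512
```

With 512-byte sectors that is the last 33 sectors of the disk, which a dd over the first ~100GB never reaches. In practice a tool such as gpart destroy -F removes the partitioning scheme without any manual offsets.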
Just to update, it is booted in multiuser mode and running fine. The sqlite package database got trashed. I may try the update again later. I did manage to get smartmontools installed and so far it is clean. No logged historical errors, short and long tests both completed without error. Due to the lack of dmidecode I can't locate the microcode version, but it is likely to be 0x1c. Due to the lack of devcpu-data I can't easily patch it at runtime, something that OpenBSD and Linux handle automatically.
Install the os-cpu-microcode-intel plugin and reboot.
Well, maybe microcode and mitigations aren't the problem after all:
dmesg | grep micro
[1] CPU microcode: updated from 0xe to 0x1d
sysctl hw.ibrs_disable vm.pmap.pti
hw.ibrs_disable: 0
vm.pmap.pti: 1
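If you want to script that microcode check, a minimal sketch follows. It only parses a dmesg line of the form shown above; ucode_new_rev is a hypothetical helper name, and the sample line is the one from this post.

```sh
#!/bin/sh
# Sketch: pull the new microcode revision out of dmesg text read from stdin.
# ucode_new_rev is a hypothetical helper; it prints nothing if no update line is found.
ucode_new_rev() {
    sed -n 's/.*CPU microcode: updated from 0x[0-9a-f]* to \(0x[0-9a-f]*\).*/\1/p' | tail -n 1
}

# Sample line copied from the post; on a live box, pipe dmesg into the function instead.
echo '[1] CPU microcode: updated from 0xe to 0x1d' | ucode_new_rev
```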
The sqlite database is not corrupt physically this time, semantically, IDK.
***GOT REQUEST TO CHECK FOR UPDATES***
Currently running OPNsense 25.7 (amd64) at Mon Sep 8 21:00:10 UTC 2025
Fetching changelog information, please wait... done
Updating OPNsense repository catalogue...
Child process pid=11217 terminated abnormally: Segmentation fault
Child process pid=11850 terminated abnormally: Segmentation fault
Child process pid=13706 terminated abnormally: Segmentation fault
self: No packages available to install matching 'opnsense'
***DONE***
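When the output scrolls past too fast to read, it may help to save it to a file and count the crashes afterwards. A small sketch, assuming the messages keep the wording shown above; count_segfaults is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: count child-process crashes in captured update output (read from stdin).
# count_segfaults is a hypothetical helper name.
count_segfaults() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask the status.
    grep -c 'terminated abnormally: Segmentation fault' || true
}

# Sample lines copied from the log above.
count_segfaults <<'EOF'
Updating OPNsense repository catalogue...
Child process pid=11217 terminated abnormally: Segmentation fault
Child process pid=11850 terminated abnormally: Segmentation fault
Child process pid=13706 terminated abnormally: Segmentation fault
self: No packages available to install matching 'opnsense'
EOF
```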
I cleaned out the pkg data and caches and got it bootstrapped again. I can install packages and search for things:
pkg search -Q name opnsense
opnsense-25.7.2
Name : opnsense
Comment : OPNsense community release
So from the command line it seems OK, but from the UI, touching the "check for updates" freezes the web GUI. That requires a reboot from the shell.
Is there an updated image available to install 25.7.2?
Quote from: chindokae on September 08, 2025, 08:04:41 PM
Please explain how the CPU can cause layer 7 problems.
Well, from the FreshPorts site (https://www.freshports.org/sysutils/cpu-microcode-intel) I tracked down the intel-ucode releases for N100 and I only find one functional issue for N100 (below), so I guess a CPU issue like this can cause all sorts of weirdness.
It's also the only functional issue I could find that is addressed by applying ucode. However, I don't expect this issue to be present at every boot, because it's noted "Under complex microarchitectural conditions".
Quote:
CPU May Not Load The Most Recent Data
Under complex microarchitectural conditions, a read on one logical processor may
not receive the most recently stored data by another logical processor
Due to this erratum, unpredictable system behavior or a system hang may occur.
Intel has only observed this behavior in a synthetic test environment. Intel has not
observed this erratum with any commercially available system.
Reference: https://cdrdv2.intel.com/v1/dl/getContent/764616
Quote from: chindokae on September 08, 2025, 11:41:24 PM
So from the command line it seems OK, but from the UI, touching the "check for updates" freezes the web gui.
How about running the audit from the GUI?
0x1d does appear to be latest for N100/N200 series (06-be-00 ucode).
Reference:
https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20250512
ucode https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/refs/heads/main/intel-ucode/06-be-00
Beyond that May release I don't find any ucode that has N100 in it.
To recap - I am running the latest microcode - 0x1d and I applied the recommended sysctl settings at boot.
I have cleaned out the corrupt files in the /var/db/pkg and /var/cache/pkg directories and bootstrapped pkg again.
I had to reinstall it (pkg) for no apparent reason, since I didn't remove it, but it went in cleanly.
I was able to install smartmontools and os-cpu-microcode-intel.
I downloaded images today that are labeled 25.7 and appear to be running that, but the updated versions I see in the repo are not available for download.
pkg says I am up to date even though I see what appear to be two new revisions, 25.7.2 and 25.7.1.
I can only check for updates from the command line because letting the GUI do that causes it to lock up.
All the segmentation faults, etc, are now fixed from the cli.
The repo is not above reproach: the smartmontools package did not install via pkg install because it tried to whack an entire man folder. It tried to move /usr/local/share/man/.pkgtemp.man8.W8KvzBQqO5Xf over /usr/local/share/man/man8, which contains a lot of manpage files. It wasn't trying to copy into it; it tried to replace it and was thankfully blocked. I installed the package out of the cache, but that is hardly a smooth install.
That is all the time I can afford to put into this problem today. I was *supposed* to be working on hardening the QNAP when this happened. New RAM is on the porch and I need to put that in and leave well enough alone for now on this problem.
Edit: RAM went in without a fight, Gott sei dank.
To finish off the day's status and for sake of completeness:
I ran long and short smartctl tests and they both completed without error. dmesg does not show any indication of physical corruption of the disk, no read or write faults, no bad block error, etc.
The initial crash happened during an attempt to update via the GUI and that still isn't right. Even if it doesn't crash the system, it breaks the httpd service and that requires a reboot, but that is not as bad as the beeping black screen of death.
I have mitigated the presumed firmware issue but have not achieved full operation. I still see no indication of any hardware faults. The runaway update error may just have recursed away all available RAM and not crashed the stack. Either way, it's going down.
Given the inconsistencies in patching even from the command line I have to wonder if the repos got a second bad update. It happens.
Quote from: BrandyWine on September 09, 2025, 04:11:10 AM
Quote from: chindokae on September 09, 2025, 01:22:29 AM
To recap
How about running the audit from the GUI?
If you mean the one on the update page:
Version 25.7 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 25.7 is correct.
>>> Check for missing or altered base files
No problems detected.
>>> Check installed repositories
OPNsense (Priority: 11)
>>> Check installed plugins
os-cpu-microcode-intel 1.1
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: ........ done
>>> Check for missing or altered package files
Checking all packages: ........ done
>>> Check for core packages consistency
Core package "opnsense" not known to package database.
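To pick the failing check out of a long audit, here is a minimal sketch that assumes the output keeps the ">>>" section headers shown above. audit_problems is a hypothetical helper, and it only looks for the one message seen in this audit.

```sh
#!/bin/sh
# Sketch: report which audit section contains the "not known to package database" message.
# audit_problems is a hypothetical helper; it reads saved audit text on stdin.
audit_problems() {
    awk '
        /^>>> / { section = substr($0, 5); next }   # remember the current check name
        /not known to package database/ { print section ": " $0 }
    '
}

# Sample lines copied from the audit above.
audit_problems <<'EOF'
>>> Check locked packages
No locks found.
>>> Check for core packages consistency
Core package "opnsense" not known to package database.
EOF
```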
The packagesite package has been upgraded a few times. It reports 898 packages and no errors, but even though it appears 25.7.3 is out as of today, it couldn't be installed - until one last update tonight, and now it is downloading 171 packages.
Quote from: chindokae on September 10, 2025, 12:08:11 AM
>>> Check for core packages consistency
Core package "opnsense" not known to package database.
I think this issue has been spoken to before by OPNsense folks here on the forum. Need to fix this item.
I think the fix was to install it from cli.
Maybe start with
pkg check -d
pkg clean -an (check and see what dry run says)
pkg clean -a (cleans the cache out)
pkg install -f opnsense
reboot
Then re-run the audit. If it's all good then use GUI update to see what it does.
Also use search feature, usually comes back with something good.
https://forum.opnsense.org/index.php?topic=48599.msg245505#msg245505
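The steps above can be wrapped in a small script that only prints what it would do unless told otherwise. This is a rough sketch, not an official recovery procedure: the RUN variable and the recover_pkg name are my own assumptions, the pkg commands are the ones listed in the post, and the reboot is left as a manual step.

```sh
#!/bin/sh
# Sketch: the suggested recovery sequence, printed by default and executed
# only when RUN=1 is set. RUN and recover_pkg are hypothetical names.
recover_pkg() {
    for cmd in \
        "pkg check -d" \
        "pkg clean -an" \
        "pkg clean -a" \
        "pkg install -f opnsense"
    do
        if [ "${RUN:-0}" = "1" ]; then
            echo "+ $cmd"
            $cmd || return 1      # stop at the first failing step
        else
            echo "would run: $cmd"
        fi
    done
}

recover_pkg
```

Running it once without RUN=1 gives a chance to review the sequence before anything touches the package database.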
Quote from: BrandyWine on September 10, 2025, 12:57:21 AM
I think the fix was to install it from cli.
None of those things had any effect yesterday and I tend to let Chat do my searching these days, although I did start here with a search. As always, knowing the root cause of the problem and searching for that initially always seems to work a lot better than having to work from the initial presentation of the problem to its eventual resolution. Makes you look smarter, too.
The resolution to this was purging the sqlite database files - which failed many times yesterday - then trying again today after 25.7.3 was released, reinitializing the local database with packagesite, then working through the sequence of update steps that eventually resolved the issue and got it patching via the GUI again.
This problem started with patching and ended with it. No amount of trying to update while 25.7.2 was the latest release on the repos worked, but as soon as 25.7.3 showed up, the recommended recovery techniques worked and I could patch from the console, the boot menu, or the GUI. I copied the steps out of history, and today, they worked.
It is nice to see that the cpu-microcode-intel package now deals with the firmware issue. Now this product works like the other major Unix distros.
I think I'll go with Occam's Razor on this one and say the cause was trying to update to 25.7.2.
Quote from: chindokae on September 10, 2025, 04:33:25 AM
None of those things had any effect yesterday
You ran all those commands yesterday? Did I miss where you posted that?
So with that pkg error your device is good? If so then just leave it alone.
> It is nice to see that the cpu-microcode-intel package now deals with the firmware issue. Now this product works like the other major Unix distros.
It's strange really: at some point in time nothing was needed for N100, then came an operating system update and the thing was unstable. Then came microcode updates and it was stable again. Now changing two sysctls in 25.7 made it operate less optimally again. A third sysctl was discovered. At some point the microcode updates were not working and now they work again.
Meanwhile hundreds of thousands of users had no apparent weirdness on their installs. Makes perfect sense?
And please listen: I'm not saying there's no issue. I'm saying there is definitely an issue and it's likely beyond our control.
Cheers,
Franco
Quote from: Patrick M. Hausen on September 08, 2025, 10:48:49 PM
Install the os-cpu-microcode-intel plugin and reboot.
I'm in the process of upgrading my OPNsense box to an Odroid H3+. I was having all kinds of SATA errors. It turns out the microcode in the latest H3+ BIOS version is 0x10, while this plugin injects version 0x1d. It seems this microcode update fixes my SATA issue. I also set the tunables:
hw.pci.enable_aspm=0
vm.pmap.pcid_enabled=0
When installing the plugin, it indicated it was going to be removed from a later OPNsense release after June 2025. I did ask Odroid support for an upgraded BIOS with the latest Intel microcode. We'll see how they respond.
So my question is, are there plans to remove the os-cpu-microcode-intel? I'm hoping the response is that removal is on hold, with no new date in sight.
The FreeBSD package messages when installing updates or plugins through the UI are completely irrelevant in the OPNsense context and only shown for diagnostic purposes.
As is clearly stated in the bottom line of that very dialog box. You can safely ignore everything that ever appears in that window.
The message comes from an auxiliary package that will be going away, but the microcode updates won't.