pkg update crashes and bricks firewall

Started by chindokae, September 08, 2025, 06:32:00 PM

I just installed a fresh copy of 27.1 on my hardware firewall.  The installation was normal, configured LAN IP and DHCP scope, set root password, logged in with laptop on the LAN interface.  Went to the update firmware page and allowed it to patch.   It crashed and got stuck on the bug reporting page and would not leave it.  I submitted it twice then forced a reboot with a short press on the power switch which usually does a graceful shutdown.  Not this time, it gave three short beeps and powered off in less than 3 seconds.   When I powered it back up it never appeared on the network and pushing the power button gave the same immediate power off.

Rebuilt it on the workbench but this time updated from the console.  The core system was fully up to date but packages needed updating.  I tried updating from the console and it started normally, downloaded the packages, then crashed immediately when it tried to apply them.  The screen was flooded with thousands of errors, far too fast to read, then it rebooted itself. 

Unlike the attempt from the web GUI, it did not brick itself.  The first package was lighttpd, I believe.   I am not in the mood to retry the update to find out.

This should be easily reproducible as no customizations or settings other than LAN IP and LAN DHCP scope were applied.

Most probably a hardware problem. Shoot a video of the console messages with your smartphone.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Well, I should add that this hardware has been running OPNsense for nearly a year and was already at 27.1 when I made what should have been a minor change to Unbound DNS, setting it to use secure DNS at Cloudflare.  That caused the loss of all host DNS resolution capability and when I disabled that service and went back to DHCP-based DNS, the GUI locked up and the firewall was essentially bricked. It would try to reboot if I used the power button but would not bring up the LAN interface.

I sent two bug reports under the email address associated with this account, all necessary information should be there. As I indicated, I am not willing to take down the network again, I need it for work.  From a practical point of view, no camera I have at the house has the frame rate necessary to capture the errors anyway.  They were just a blur.

The core system firmware boots and runs fine as long as I don't try to update packages.  I am running through the firewall now and using the web GUI as I write this.   

To me that doesn't look anything like hardware.  That looks like a bad package.

You do you. Still looks more like a failing storage device than anything else to me. The packages of OPNsense are fine.

Are you perchance running an Intel N100 or similar CPU?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

sysctl -a | grep hw.model
hw.model: Intel(R) N100
hw.clockrate: 806
hw.ncpu: 4


Intel Alder Lake-M

Please explain how the CPU can cause layer 7 problems. 

I may not have been clear, but there were no hardware problems last week, last month, or last quarter, and nothing has changed.  I update monthly when I update all the Linux machines and the offline repos.   

I see nothing in dmesg or any other log to indicate a hardware issue, and there were no errors in /var/log/installer.log.   All in all, the system is operating nominally, as long as I don't try to update packages.

There are errors in the logs like:

/system_advanced_admin.php: The command '/usr/sbin/chown -R dhcpd:dhcpd '/var/dhcpd'' returned exit code '1', the output was 'chown: dhcpd: illegal group name'

Which is true, there is no group named dhcp and the user name is _dhcp

Other than that the logs are fairly clean - and nothing that is going to crash a BSD kernel - which in my experience is nearly impossible.

The update program is running entirely in user space, and if it is able to crash the kernel then there is a very good chance that whatever caused this is exploitable.

I sent two complete error reports. If that doesn't suffice, I could try running it in a VM to see how that goes. 


We have a ton of reports of N100 without current microcode updates in combination with UFS trashing on-disk data.
Consider re-installing with ZFS. Then after upgrade to the latest version add the microcode plugin.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Thanks, that is a viable explanation.

However, I did try ZFS first and that was an epic install failure.   It may not like that FS any better than the other.

I will have to push whatever action I take to the right a bit, primary user one is using the network and it had better not go down again or there will be something bad for dinner.

I will try to install Linux and see if I can get that to push the firmware.

September 08, 2025, 08:57:16 PM #7 Last Edit: September 08, 2025, 09:00:23 PM by BrandyWine
Quote from: chindokae on September 08, 2025, 08:04:41 PM
/system_advanced_admin.php: The command '/usr/sbin/chown -R dhcpd:dhcpd '/var/dhcpd'' returned exit code '1', the output was 'chown: dhcpd: illegal group name'

Which is true, there is no group named dhcp and the user name is _dhcp

The system barked on 'chown: dhcpd: illegal group name', so where do you come up with "there is no group named dhcp"?

Will your N100 boot to single user mode?

Quote from: chindokae on September 08, 2025, 08:52:44 PM
I will try to install Linux and see if I can get that to push the firmware.
Install? No need to install, just run any lightweight live Linux from a boot USB, then look from there. I think Patrick is right, probably a bad disk. Boot a live OS and run some disk check utils.

Mini-pc N150 i226-V, GOD BLESS CHARLIE KIRK

The microcode update must be run from the OS at every boot - it's not persistent. If the manufacturer of your system offers a BIOS update, by all means install that first.

ZFS is more robust than any other file system existing. That's really just a fact. So it would be interesting to know more about your epic install failure. How much memory does your box have?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Also, N100 CPUs have problems with PTI, so some sysctls are needed. At this time, I think you have to install 25.7, then apply the sysctls that are linked to here (#23), reboot and then upgrade to 25.7.2, which would otherwise expose the problem because of its new kernel.

It sure is better with ZFS, but you should still use the microcode update package as well (after the upgrade).

Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

The system barked on 'chown: dhcpd: illegal group name', so where do you come up with "there is no group named dhcp"?

cat /etc/group | grep -i dhcp

_dhcp

It also has a leading underscore.

root@OPNsense:/var/db/pkg # chown dhcp:dhcp local.sqlite.bak
chown: dhcp: illegal group name
root@OPNsense:/var/db/pkg # chown dhcp:_dhcp local.sqlite.bak
chown: dhcp: illegal user name

I am primarily a Red Hat admin, there are some quirks in BSD I am unfamiliar with, but this doesn't appear to be one of them.
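
For anyone following along, here is a minimal sketch of an exact-match group lookup, since grep -i matches substrings and can blur the difference between dhcp, dhcpd and _dhcp. The function name group_exists and the optional file argument are made up for illustration and are not anything OPNsense ships; it only reads a group(5)-format file.

```shell
#!/bin/sh
# group_exists NAME [FILE]
# Hypothetical helper: succeeds if NAME is an exact group name in a
# group(5)-style file. FILE defaults to /etc/group; passing another
# file is only useful for testing.
group_exists() {
    name=$1
    file=${2:-/etc/group}
    # Field 1 is the group name. Compare the whole field, not a substring,
    # so "dhcp" does not match "_dhcp".
    awk -F: -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"
}

# Example use (not run here):
#   group_exists _dhcp && echo "_dhcp exists"
#   group_exists dhcpd || echo "dhcpd is not a group on this box"
```

With only _dhcp present in the file, the lookup fails for both dhcp and dhcpd, which matches the chown output above.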



Quote from: Patrick M. Hausen on September 08, 2025, 08:58:50 PM
The microcode update must be run from the OS at every boot - it's not persistent. If the manufacturer of your system offers a BIOS update, by all means install that first.

ZFS is more robust than any other file system existing. That's really just a fact. So it would be interesting to know more about your epic install failure. How much memory does your box have?

16 GB.   It normally runs with about 14 GB free.   The failure was just an infinite hang at the end of the install.  The epic part was figuring out how to clean all the ZFS residue off the disk - as I didn't know what to expect having never used it before.  dd ~ 100GB usually cleans all, but not with ZFS. 

Quote from: chindokae on September 08, 2025, 09:53:42 PM
dd ~ 100GB usually cleans all, but not with ZFS.

Not quite. That's not a ZFS but a GPT issue. There is a backup GPT partition table at the end of the disk, so you need to wipe that, too.

Then again GPT is the current standard. Nobody uses MBR anymore apart from special embedded systems and the like.
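
A rough sketch of where that backup copy sits, assuming 512-byte sectors and the default layout of 128 partition entries of 128 bytes each (32 sectors of entries plus 1 header sector at the very end of the disk). The function names and the device name in the example are made up for illustration, and the dd command is only printed, never executed.

```shell
#!/bin/sh
# Sectors used by the backup GPT under the assumptions above:
# 128 entries * 128 bytes = 16384 bytes = 32 sectors, plus 1 header sector.
GPT_BACKUP_SECTORS=33

# gpt_backup_start TOTAL_SECTORS
# Hypothetical helper: prints the first sector (LBA) of the backup GPT area.
gpt_backup_start() {
    echo $(( $1 - $GPT_BACKUP_SECTORS ))
}

# print_wipe_cmd DEVICE TOTAL_SECTORS
# Hypothetical helper: prints (does NOT run) a dd command that would zero
# the backup GPT area at the end of DEVICE.
print_wipe_cmd() {
    start=$(gpt_backup_start "$2")
    echo "dd if=/dev/zero of=$1 bs=512 seek=$start count=$GPT_BACKUP_SECTORS"
}

# Example use (not run here), with an assumed device and sector count:
#   print_wipe_cmd /dev/da0 1000000
```

In practice gpart destroy -F on the disk should take care of both the primary and the backup table, so the arithmetic is mostly useful for understanding why a dd over the first part of the disk is not enough.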

Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Just to update, it is booted in multiuser mode and running fine.  The sqlite package database got trashed.  I may try the update again later.  I did manage to get smartmontools installed and so far it is clean.  No logged historical errors, short and long tests both completed without error.  Due to the lack of dmidecode I can't locate the microcode version but it is likely to be 0x1c.  Due to the lack of devcpu-data I can't easily patch it at runtime, something that OpenBSD and Linux handle automatically.

Install the os-cpu-microcode-intel plugin and reboot.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)