Intel Alder Lake / N100 instability in FreeBSD and data corruption with UFS

Started by OPNenthu, August 04, 2025, 08:35:26 PM

Quote from: lmester on August 12, 2025, 06:41:42 AM
Even though it seems to be running fine I assume that there could still have been some file system corruption. Do you think I should re-install 25.7? If so, how would I do this so that the vm.pmap.pcid_enabled=0 setting is in place before the first boot?

There's a chance that your disk has gone bad, which is something I see often on the forums. Try installing the 'os-smart' plugin and running a S.M.A.R.T. check on the disk. The plugin also provides a simple status widget that you can add to the Lobby screen. A reinstall is probably not worth it if the system is running well now and passes the Health audit (System->Firmware->Status->Run an audit->Health).

As for tunables during installation, you can set them temporarily from the boot menu:

https://forum.opnsense.org/index.php?topic=47494.msg239887#msg239887

You'll need console access (serial or VGA).
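
As a rough sketch of what that looks like, assuming the stock FreeBSD boot menu where you can escape to the loader prompt (the menu option for this may differ between versions), you would type the following at the `OK` prompt:

```
set vm.pmap.pcid_enabled=0
boot
```

A value set this way only applies to that one boot, so it still needs to be added as a persistent tunable once the system is installed.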
"The power of the People is greater than the people in power." - Wael Ghonim

Site 1 | N5105 | 8GB | 256GB | 4x 2.5GbE (I226-V)
Site 2 |  J4125 | 8GB | 256GB | 4x 1GbE (I210)

Quote from: OPNenthu on August 12, 2025, 12:25:58 PM
There's a chance that your disk has gone bad, which is something I see often on the forums.  Try to install the 'os-smart' plugin and run a S.M.A.R.T check to see about your disk health.  That plugin provides a simple status widget that you can add to the Lobby screen as well.  Probably not worth reinstalling if it's running well now and passing the Health audit (System->Firmware->Status->Run an audit->Health).

As for tunables during installation, you can set them temporarily from the boot menu:

https://forum.opnsense.org/index.php?topic=47494.msg239887#msg239887

You'll need console access- serial or VGA.

I think the system is corrupted. I'm seeing errors in the health audit. Can't install os-smart. Getting errors when I try to install updates.




############### Health audit errors ##############

***GOT REQUEST TO AUDIT HEALTH***
Currently running OPNsense 25.7 (amd64) at Tue Aug 12 09:20:05 EDT 2025
>>> Root file system: /dev/gpt/rootfs
>>> Check installed kernel version
Version 25.7 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 25.7 is correct.
>>> Check for missing or altered base files
No problems detected.
>>> Check installed repositories
OPNsense (Priority: 11)
>>> Check installed plugins
os-nut 1.9
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: .......... done
>>> Check for missing or altered package files
Checking all packages: ....
nspr-4.37: checksum mismatch for /usr/local/bin/nspr-config
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_aix32.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_aix64.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_darwin.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_freebsd.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_hpux32.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_hpux64.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_linux.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_netbsd.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_nto.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_openbsd.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_qnx.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_riscos.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_solaris.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/md/_win95.cfg
nspr-4.37: checksum mismatch for /usr/local/include/nspr/pratom.h
nspr-4.37: checksum mismatch for /usr/local/include/nspr/prinit.h
nspr-4.37: checksum mismatch for /usr/local/lib/libnspr4.a
nspr-4.37: checksum mismatch for /usr/local/lib/libnspr4.so
nspr-4.37: checksum mismatch for /usr/local/lib/libplc4.so
nspr-4.37: checksum mismatch for /usr/local/lib/libplds4.so
nspr-4.37: checksum mismatch for /usr/local/libdata/pkgconfig/nspr.pc
nspr-4.37: missing file /usr/local/share/licenses/nspr-4.37/LICENSE
nspr-4.37: missing file /usr/local/share/licenses/nspr-4.37/MPL20
nspr-4.37: missing file /usr/local/share/licenses/nspr-4.37/catalog.mk
Checking all packages.....
py311-certifi-2025.7.14: missing file /usr/local/lib/python3.11/site-packages/certifi-2025.7.14.dist-info/LICENSE
py311-certifi-2025.7.14: missing file /usr/local/lib/python3.11/site-packages/certifi-2025.7.14.dist-info/METADATA
py311-certifi-2025.7.14: missing file /usr/local/lib/python3.11/site-packages/certifi-2025.7.14.dist-info/RECORD
py311-certifi-2025.7.14: missing file /usr/local/lib/python3.11/site-packages/certifi-2025.7.14.dist-info/WHEEL
py311-certifi-2025.7.14: missing file /usr/local/lib/python3.11/site-packages/certifi-2025.7.14.dist-info/top_level.txt
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/__init__.py
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/__main__.py
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/__pycache__/__init__.cpython-311.opt-1.pyc
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/__pycache__/__init__.cpython-311.pyc
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/__pycache__/__main__.cpython-311.opt-1.pyc
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/__pycache__/__main__.cpython-311.pyc
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/__pycache__/core.cpython-311.opt-1.pyc
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/__pycache__/core.cpython-311.pyc
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/cacert.pem
py311-certifi-2025.7.14: checksum mismatch for /usr/local/lib/python3.11/site-packages/certifi/core.py
py311-certifi-2025.7.14: missing file /usr/local/share/licenses/py311-certifi-2025.7.14/LICENSE
py311-certifi-2025.7.14: missing file /usr/local/share/licenses/py311-certifi-2025.7.14/MPL20
py311-certifi-2025.7.14: missing file /usr/local/share/licenses/py311-certifi-2025.7.14/catalog.mk
Checking all packages.....
py311-typing-extensions-4.14.1: checksum mismatch for /usr/local/lib/python3.11/site-packages/__pycache__/typing_extensions.cpython-311.opt-1.pyc
py311-typing-extensions-4.14.1: checksum mismatch for /usr/local/lib/python3.11/site-packages/__pycache__/typing_extensions.cpython-311.pyc
py311-typing-extensions-4.14.1: missing file /usr/local/lib/python3.11/site-packages/typing_extensions-4.14.1.dist-info/METADATA
py311-typing-extensions-4.14.1: missing file /usr/local/lib/python3.11/site-packages/typing_extensions-4.14.1.dist-info/RECORD
py311-typing-extensions-4.14.1: missing file /usr/local/lib/python3.11/site-packages/typing_extensions-4.14.1.dist-info/WHEEL
py311-typing-extensions-4.14.1: missing file /usr/local/lib/python3.11/site-packages/typing_extensions-4.14.1.dist-info/licenses/LICENSE
py311-typing-extensions-4.14.1: checksum mismatch for /usr/local/lib/python3.11/site-packages/typing_extensions.py
Checking all packages..... done
>>> Check for core packages consistency
Core package "opnsense" at 25.7 has 68 dependencies to check.
Checking packages: .......................
opnsense-25.7 version mismatch, expected 25.7.1_1
Checking packages: ...........................
py311-duckdb-1.3.1_1 version mismatch, expected 1.3.2
Checking packages: ..............
sudo-1.9.17p1 version mismatch, expected 1.9.17p2
Checking packages: ..
syslog-ng-4.8.2_3 version mismatch, expected 4.8.2_4
Checking packages: ... done
***DONE***
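
With an audit log this long, it can help to boil it down to a per-package count. The following is only a sketch: `summarize_audit` is a hypothetical helper name, and it assumes the log lines keep the `package: checksum mismatch for ...` / `package: missing file ...` shape shown above.

```shell
#!/bin/sh
# summarize_audit (hypothetical helper): reads a health audit log on stdin
# and prints, per package, how many files failed the checksum test and how
# many files are missing. Assumes the line format shown in the log above.
summarize_audit() {
    awk '
        /: checksum mismatch for / { sub(/:$/, "", $1); bad[$1]++ }
        /: missing file /          { sub(/:$/, "", $1); gone[$1]++ }
        END {
            for (p in bad)  seen[p] = 1
            for (p in gone) seen[p] = 1
            for (p in seen)
                printf "%s mismatched=%d missing=%d\n", p, bad[p], gone[p]
        }
    ' | sort
}

# Usage, with the audit output saved to a file of your choosing:
#   summarize_audit < audit.log
```

The output lists only packages that have at least one damaged or missing file, which makes it easier to see how widespread the corruption is.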


#########  os-smart install fails #########

***GOT REQUEST TO INSTALL***
Currently running OPNsense 25.7 (amd64) at Tue Aug 12 09:24:10 EDT 2025
Installation out of date. The update to opnsense-25.7.1_1 is required.
***DONE***

######## Firmware update fails ###########

***GOT REQUEST TO UPDATE***
Currently running OPNsense 25.7 (amd64) at Tue Aug 12 00:54:33 EDT 2025
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Checking for upgrades (11 candidates): .......... done
Processing candidates (11 candidates): .......... done
The following 11 package(s) will be affected (of 0 checked):

Installed packages to be UPGRADED:
boost-libs: 1.88.0_1 -> 1.88.0_2
curl: 8.14.1 -> 8.15.0
ivykis: 0.43.2 -> 0.43.2_1
jq: 1.8.0 -> 1.8.1
libucl: 0.9.2_1 -> 0.9.2_2
nss: 3.113.1_1 -> 3.114
opnsense: 25.7 -> 25.7.1_1
py311-duckdb: 1.3.1_1 -> 1.3.2
sudo: 1.9.17p1 -> 1.9.17p2
syslog-ng: 4.8.2_3 -> 4.8.2_4
webp: 1.5.0 -> 1.6.0

Number of packages to be upgraded: 11

36 MiB to be downloaded.
[1/11] Fetching boost-libs-1.88.0_2.pkg: .......... done
[2/11] Fetching nss-3.114.pkg: .......... done
[3/11] Fetching jq-1.8.1.pkg: .......... done
[4/11] Fetching syslog-ng-4.8.2_4.pkg: .......... done
[5/11] Fetching webp-1.6.0.pkg: .......... done
[6/11] Fetching ivykis-0.43.2_1.pkg: .......... done
[7/11] Fetching curl-8.15.0.pkg: .......... done
[8/11] Fetching libucl-0.9.2_2.pkg: .......... done
[9/11] Fetching opnsense-25.7.1_1.pkg: .......... done
[10/11] Fetching py311-duckdb-1.3.2.pkg: .......... done
[11/11] Fetching sudo-1.9.17p2.pkg: .......... done
Checking integrity...Assertion failed: (strcmp(uid, p->uid) != 0), function pkg_conflicts_check_local_path, file pkg_jobs_conflicts.c, line 315.
Child process pid=26045 terminated abnormally: Abort trap
Starting web GUI...done.
***DONE***



QuoteAs for tunables during installation, you can set them temporarily from the boot menu:

https://forum.opnsense.org/index.php?topic=47494.msg239887#msg239887

I assume that I'll need to do this for all future updates unless the developers change the default for this tunable. Is this correct? This will make future updates painful :-(




Adding a persistent value under System: Settings: Tunables is all you need. This principle has not changed, and if someone already had persistent values, the update would not have changed them either.


Cheers,
Franco

```
sysctl -a | grep -E 'vm.pmap.pcid_enabled|vm.pmap.pti|hw.ibrs_disable'
vm.pmap.pti: 1
vm.pmap.pcid_enabled: 0
hw.ibrs_disable: 0
```


pkg clean -a

I still encounter an error when trying to update. Is it possible to upgrade without doing a clean install and restoring a config backup?

```
Checking integrity...Assertion failed: (strcmp(uid, p->uid) != 0), function pkg_conflicts_check_local_path, file pkg_jobs_conflicts.c, line 315.
Child process pid=37800 terminated abnormally: Abort trap
Starting web GUI...done.
***DONE***
```

I got sent this thread, and though I think I understand what I'm reading, what I don't understand is "what are the final recommended tunables settings?" Each post flip-flops between 0|1 and I'm not sure what I should be applying.

What is recommended when on N100 and version 25.7.1_1-amd64?
vm.pmap.pti: ?
vm.pmap.pcid_enabled: ?
hw.ibrs_disable: ?

--thanks

This thread can use a clean-up so here's my understanding of things:

People with Intel chips have been reporting problems during the upgrade to 25.7, though the exact cause is not known. Uninstalling the Intel microcode helps some people but not others. One person reported success after changing the tunables recommended here; others did not. There doesn't seem to be a smoking gun, and it's not even clear whether everyone's issues are the same. Some people probably just have bad hardware or a corrupt filesystem (for various reasons, not necessarily due to this bug), which may be contributing to the reports.

That said:

The Intel microcode workaround is discussed in other threads on the board, but as for the tunables in this thread:

vm.pmap.pcid_enabled=0

This ^ is the general recommendation for everyone on N-series chips. PCID has been shown to cause corruption when enabled (per the FreeBSD mailing list topic), so keep it off. Linux has defaulted it to off as well. It may or may not fix what you are experiencing, but it should at least help prevent potentially severe issues down the line.

hw.ibrs_disable=0
vm.pmap.pti=1

These ^ two are suggestions from Franco in case you are still having stability issues, but YMMV. I didn't set them personally because I'm not having any problems with either stability or performance (I just wanted to prevent corruption). If you're not seeing any issues either, then maybe leave them as they are.

The one poster so far who reported some success after changing tunables had modified all three, so it's not clear which one made the difference for them.
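
To compare a box against the values discussed above, something like the following could work. It is a minimal sketch: `check_tunables` is a hypothetical helper name, and the expected values are simply the ones from this summary (only `vm.pmap.pcid_enabled=0` is the general recommendation; the other two are optional suggestions).

```shell
#!/bin/sh
# check_tunables (hypothetical helper): reads "name: value" lines, the
# format printed by sysctl, on stdin and reports whether each of the three
# tunables from this thread matches the value discussed above.
check_tunables() {
    awk -F': ' '
        BEGIN {
            want["vm.pmap.pcid_enabled"] = "0"   # general recommendation
            want["vm.pmap.pti"] = "1"            # optional suggestion
            want["hw.ibrs_disable"] = "0"        # optional suggestion
        }
        ($1 in want) {
            if ($2 == want[$1]) status = "ok"
            else status = "differs (want " want[$1] ")"
            printf "%s=%s %s\n", $1, $2, status
        }
    '
}

# Usage on the firewall itself:
#   sysctl vm.pmap.pcid_enabled vm.pmap.pti hw.ibrs_disable | check_tunables
```

Lines for any other sysctl are ignored, so piping the full `sysctl -a` output through it works too.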
"The power of the People is greater than the people in power." - Wael Ghonim

Site 1 | N5105 | 8GB | 256GB | 4x 2.5GbE (I226-V)
Site 2 |  J4125 | 8GB | 256GB | 4x 1GbE (I210)

@OPNenthu thanks for that nice summary :-)
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)