Intel Alder Lake / N100 instability in FreeBSD and data corruption with UFS

Started by OPNenthu, August 04, 2025, 08:35:26 PM

I came across this mailing list thread while searching online about FreeBSD instabilities with N100, as many have been reporting upgrade issues.  I'm not sure if this is related to the problematic microcode updates.

https://lists.freebsd.org/archives/freebsd-current/2025-January/006984.html

ChatGPT (for what it's worth) describes the issues like this:

Quote
2. PCID / Cache Corruption Bug

    The N100 has a known CPU erratum: INVLPG instruction with PCID enabled fails to flush TLB entries, causing data corruption on UFS file systems (sometimes panics or inode mangling) [ref]

    The workaround: add

    vm.pmap.pcid_enabled=0 

    to loader.conf, ideally tested in production. Users report stability regained after disabling PCID [ref]

3. UFS Filesystem Instability

    Severe issues such as inode corruption, filesystem panics, or UFS failure have been seen repeatedly when PCID remains enabled and UFS is used [ref]

    ZFS appears to avoid these issues entirely.

Quote
⚠️ Why Might You Want to Disable It?

Some CPUs (including Intel N100/Alder Lake-N) exhibit hardware bugs when PCID is used. Specifically:

    A known CPU erratum causes INVLPG (used to invalidate specific TLB entries) to fail when PCID is active.

    This can result in stale or corrupted memory mappings, leading to:

        Filesystem corruption (especially UFS)

        Kernel panics

        Data loss

        Subtle stability problems

Disabling PCID (vm.pmap.pcid_enabled=0) avoids using the broken logic path.

🧪 Who Should Set It?

If you're using:

    Intel N100 or other Alder Lake-N CPUs

    UFS as a filesystem

    FreeBSD 13.x or 14.x

👉 You should absolutely set vm.pmap.pcid_enabled=0 to ensure stability.

Seemed a little concerning and I thought I'd bring it up here for more technical insight.

I'm not affected personally as I don't have an N100 at this time.
"The power of the People is greater than the people in power." - Wael Ghonim

Site 1 | N5105 | 8GB | 256GB | 4x 2.5GbE (I226-V)
Site 2 |  J4125 | 8GB | 256GB | 4x 1GbE (I210)

I've had an N100 running in my workshop for approx. 1 1/2 years without any issue, with UFS on a local 128 GB NVMe, so I can't confirm this issue.

I had similar issues (data corruption) when installing 25.7. Installing the os-cpu-microcode-intel plugin resolved them; it was not installed when I experienced the errors. I hooked up a monitor and noticed cluster errors during the first boot after the upgrade (boot loop). I reverted to the previous version, installed the microcode plugin, and then upgraded. It has been running for a week now, with several reboots and no issues.


Since the issue is sort of elusive on the CPU level chances are this affects stability in other ways than ZFS in particular (or any FS generally) so I think the recommendation for the tunable is something to consider for all relevant installs:

vm.pmap.pcid_enabled=0

I've also come to believe that moving away from our previous defaults hw.ibrs_disable=0 and vm.pmap.pti=1 back to FreeBSD's defaults (1 and 0 respectively) may cause some of the currently seen instabilities. Feel free to double check by setting these again on 25.7 and up:

hw.ibrs_disable=0
vm.pmap.pti=1

A number of people complained about OPNsense being slower than FreeBSD, which was because of these security settings. From the looks of it, the change has now traded stability for speed, for the most part on the lower-end Intel CPUs.

https://docs.opnsense.org/troubleshooting/hardening.html#spectre-and-meltdown
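
For reference, a minimal sketch of how the three tunables discussed above might look as loader settings. The file location /boot/loader.conf.local is an assumption on my part; on OPNsense the usual route is System > Settings > Tunables, since /boot/loader.conf itself is generated by the system. The values are simply the ones named in this post, not a tested recommendation for any particular CPU:

```
# /boot/loader.conf.local  (file location is an assumption, see above)

# Work around the suspected PCID/INVLPG problem on Alder Lake-N
vm.pmap.pcid_enabled="0"

# Previous OPNsense defaults for the Spectre/Meltdown mitigations
hw.ibrs_disable="0"
vm.pmap.pti="1"
```

Loader tunables are only read at boot, so a reboot is needed before sysctl reports the new values.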


Cheers,
Franco

Quote from: franco on August 05, 2025, 07:36:12 AM
hw.ibrs_disable=0
vm.pmap.pti=1

I have an observation, though unrelated to the main topic.

These two tunables you mention appear in the OPNsense UI and are co-located in the default sorting (convenient for screenshotting) but they don't appear to have any values or defaults:

[Attached screenshot of the tunables list]

Corresponding 'sysctl':

root@firewall:~ # sysctl -a | grep -E 'vm.pmap.pcid_enabled|vm.pmap.pti|hw.ibrs_disable'
vm.pmap.pti: 0
vm.pmap.pcid_enabled: 0
hw.ibrs_disable: 1
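
As a side note, the three values can also be queried directly by name (sysctl vm.pmap.pcid_enabled vm.pmap.pti hw.ibrs_disable), which avoids the unescaped dots in the grep pattern. Below is a minimal sketch of a checker that compares such output against expected values. check_tunables is a hypothetical helper name, not an OPNsense or FreeBSD tool; it reads "name: value" lines on stdin and takes the expected settings as name=value arguments:

```shell
#!/bin/sh
# Hypothetical helper (not part of OPNsense or FreeBSD):
# compare "name: value" lines, as printed by sysctl, against
# the expected name=value pairs passed as arguments.
check_tunables() {
    actual=$(cat)          # sysctl output from stdin
    status=0
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first "="
        want=${pair#*=}    # text after the first "="
        # Exact match on the name, so dots are not treated as wildcards
        got=$(printf '%s\n' "$actual" |
            awk -F': ' -v n="$name" '$1 == n { print $2 }')
        if [ "$got" = "$want" ]; then
            echo "ok   $name=$got"
        else
            echo "DIFF $name: got '${got:-unset}', want '$want'"
            status=1
        fi
    done
    return $status
}
```

Possible usage on the firewall console, with the values suggested earlier in the thread:

sysctl vm.pmap.pcid_enabled vm.pmap.pti hw.ibrs_disable | check_tunables vm.pmap.pcid_enabled=0 vm.pmap.pti=1 hw.ibrs_disable=0

The function returns non-zero if any value differs, so it could also be used in a script.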

I trust the sysctl output, just wondering why the OPNsense tunables list is that way?  I get that the tunables list isn't complete and the system may support additional ones not listed in OPNsense, but if they're listed they should have values I think (?).

----

Regarding Meltdown/Spectre, looks like a big can of worms trying to determine which particular CPU is or isn't vulnerable as some models vary depending even on the particular stepping. Uff.

Since the important one for data corruption is already disabled as recommended, I'm going to leave these alone.  I'm not seeing any issues with performance or stability right now and I've kept up with UEFI and microcode updates.  Probably these OS mitigations are more critical on virtualized environments where other things can be running, but I could be wrong (and wish to be corrected, as always).

Quote from: OPNenthu on August 05, 2025, 08:57:25 PM
root@firewall:~ # sysctl -a | grep -E 'vm.pmap.pcid_enabled|vm.pmap.pti|hw.ibrs_disable'
vm.pmap.pti: 0
vm.pmap.pcid_enabled: 0
hw.ibrs_disable: 1

I trust the sysctl output, just wondering why the OPNsense tunables list is that way?  I get that the tunables list isn't complete and the system may support additional ones not listed in OPNsense, but if they're listed they should have values I think (?).

We removed the explicit tunables from the config.xml, but we also removed the default values for them in order to go back to FreeBSD defaults. If these tunables exist in your config.xml but are not set to "0" or "1" (meaning they currently hold the empty string ""), they now use the system default, since we no longer provide another default.  Looking at your data, that is the expected output on 25.7 when nothing else was specified.


Cheers,
Franco

Clear now, thanks!

Quote from: BrandyWine on August 05, 2025, 07:24:47 AM
Maybe just use ZFS ?

I took a chance: backed up my config, did a fresh install using ZFS, and restored. It's running fine. Thank you for pointing me in this direction; it was easier than I thought.

Hi Franco,

Quote from: franco on August 05, 2025, 07:36:12 AM
Since the issue is sort of elusive on the CPU level chances are this affects stability in other ways than ZFS in particular (or any FS generally) so I think the recommendation for the tunable is something to consider for all relevant installs:

vm.pmap.pcid_enabled=0

I've also come to believe that moving away from our previous defaults hw.ibrs_disable=0 and vm.pmap.pti=1 back to FreeBSD's defaults (1 and 0 respectively) may cause some of the currently seen instabilities. Feel free to double check by setting these again on 25.7 and up:

hw.ibrs_disable=0
vm.pmap.pti=1


Sorry that I've hijacked this thread, I don't know how to delete this post...
I've added vm.pmap.pcid_enabled=0 and changed vm.pmap.pti to 1 (was 0), but it still won't upgrade to 25.7.1.1_1.
I also tried to install the Intel microcode plugin (which I hadn't installed yet), but it claims that it needs an upgrade to 25.7.1_1 first, which doesn't work...
Trying to upgrade fails with:

Checking integrity...Assertion failed: (strcmp(uid, p->uid) != 0), function pkg_conflicts_check_local_path, file pkg_jobs_conflicts.c, line 315.
Child process pid=62294 terminated abnormally: Abort trap
Starting web GUI...done.
***DONE***

Any more ideas?

Thanks and best regards,
Jochen


Drop to the console and do

# pkg install os-cpu-microcode-intel

and reboot to activate...

# opnsense-shell reboot

Then try the update again.


Cheers,
Franco

I installed the microcode plugin and configured it both ways in parallel (/boot/loader.conf and /etc/rc.conf), but an upgrade is still not possible. The UI said the microcode plugin was misconfigured, so I removed it.

opnsense-bootstrap would be the last resort before a clean reinstall, but without knowing what is wrong there's not much more guidance to give, and things would just continue to deteriorate for unknown reasons (like hardware failures).


Cheers,
Franco

FWIW, I migrated to a new N200 box (made in China, 6x Intel I226) for 25.7.  Using the default migration, I was having multiple random shutdowns per day. Per Franco, I reverted the tunables above to:

sysctl -a | grep -E 'vm.pmap.pcid_enabled|vm.pmap.pti|hw.ibrs_disable'

vm.pmap.pti: 1
vm.pmap.pcid_enabled: 0
hw.ibrs_disable: 0

and the box has been up and running without problems for 4 days now.

I don't want to take a victory lap just yet. I will post back once I hit the 1 week mark.  Also note that I use cron to reboot the box every night.

D

Quote from: OPNenthu on August 04, 2025, 08:35:26 PM
I came across this mailing list thread while searching online about FreeBSD instabilities with N100, as many have been reporting upgrade issues.  I'm not sure if this is related to the problematic microcode updates.

https://lists.freebsd.org/archives/freebsd-current/2025-January/006984.html

Thank you!

I have an N100 system. I recently upgraded to 25.7. The system crashed during the first boot after the upgrade. I saw that there were file system errors during the boot. After a re-install it appeared to be running OK. While reading the forum to try and solve some other problems I had during the upgrade I found this thread. I have now added vm.pmap.pcid_enabled=0 to the tunables. Even though it seems to be running fine I assume that there could still have been some file system corruption. Do you think I should re-install 25.7? If so, how would I do this so that the vm.pmap.pcid_enabled=0 setting is in place before the first boot? Sorry this may be simple but I'm not very good with Linux.
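
One possible approach to the first-boot question, sketched here as an assumption rather than a tested procedure: the FreeBSD boot loader accepts tunables for a single boot. At the boot menu of the freshly installed system, escaping to the loader prompt and entering the following should start that boot with PCID disabled:

```
set vm.pmap.pcid_enabled=0
boot
```

This would apply to that one boot only, so the setting still has to be saved in the tunables afterwards to survive the next reboot.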