Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode

Started by rafaelreisr, September 08, 2022, 03:40:07 PM

Troubleshooting update:

TL;DR: Potential solution found: use the kvm64 CPU model with the aes flag enabled.


I have tried different CPU settings on Proxmox:

kvm64 with mitigation flags removed and +aes: stable. This is the only setup that has exceeded 20 h of uptime; I reached 48 h.
2 (1 sockets, 2 cores) [kvm64,flags=-pcid;-spec-ctrl;-ssbd;-ibpb;-virt-ssbd;-amd-ssbd;-amd-no-ssb;+aes]

Note: kvm64 is a legacy CPU model based on a very old Pentium, with very few flags, so adding AES helps a lot. I didn't see an appreciable CPU performance loss on 2.5 Gbit loads compared to host passthrough. The NICs are still passed through and are also very stable. Mitigations are enabled in the host kernel.
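For anyone who wants to apply the kvm64 setup from the Proxmox shell instead of the GUI, here is a minimal sketch that prints (but does not run) the matching qm set command. It assumes the stock Proxmox qm CLI; the helper name kvm64_cpu_cmd is hypothetical and the VMID you pass is a placeholder. The flag list is copied from the config line above.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) a Proxmox "qm set" command for the kvm64 + AES setup.
# Flag list copied from the post: mitigation flags removed, AES added.
KVM64_FLAGS='-pcid;-spec-ctrl;-ssbd;-ibpb;-virt-ssbd;-amd-ssbd;-amd-no-ssb;+aes'

# kvm64_cpu_cmd VMID (hypothetical helper): prints the command for that VM.
kvm64_cpu_cmd() {
  local vmid="$1"
  # Single quotes around the cpu value keep ';' from ending the shell command.
  printf "qm set %s --cpu 'kvm64,flags=%s' --sockets 1 --cores 2\n" "$vmid" "$KVM64_FLAGS"
}
```

For example, kvm64_cpu_cmd 100 prints the command for a hypothetical VM 100; review it, then run it on the host. The quoting matters because an unquoted ';' would split the command in two.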

qemu64 with mitigation flags removed and +aes: unstable. It crashes in under 20 h as usual, the same way as with CPU passthrough or host-model.

I tried a few other CPU models without success. Since qemu64, which is a very migration-safe CPU model, has also crashed, I won't bother trying to cherry-pick which flag is causing the issue.

@yourfriendarmando: I believe this is less a hardware issue than a matter of poor BIOS development. I'm sure the work put in by name-brand solutions is far more refined. This CPU simplification workaround is a nice find, especially for users in third-world countries where importing is extremely expensive and Chinese solutions such as these are a decent bang for the buck, although they require some work.

@Nearly9892 and other repliers: given this find, I don't think I'll bother with Proxmox. It would be worth it if I were clustering, but for this single homelab deployment it looks like overkill.

Next step: I'll go back to the Ubuntu SSD and replicate the VM settings there to check whether it remains as stable as in Proxmox. It should, since the underlying hypervisor is the same, and so far the crashing behaviour has been identical between Proxmox and Ubuntu + KVM.
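When replicating the settings on plain Ubuntu with QEMU/KVM, the Proxmox flag list has to be written in QEMU's own -cpu syntax. A rough sketch of that translation, assuming QEMU's feature=on/off form and that the feature names carry over unchanged (check qemu-system-x86_64 -cpu help on your build); the helper name proxmox_flags_to_qemu is hypothetical.

```shell
#!/usr/bin/env bash
# proxmox_flags_to_qemu MODEL FLAGS (hypothetical helper)
# Turns a Proxmox-style flag list such as "-pcid;+aes" into a QEMU -cpu value
# such as "kvm64,pcid=off,aes=on". Assumes feature names are identical in both.
proxmox_flags_to_qemu() {
  local model="$1" flags="$2" out flag
  local -a list
  out="$model"
  # Split the flag list on ';' into an array.
  IFS=';' read -r -a list <<< "$flags"
  for flag in "${list[@]}"; do
    case "$flag" in
      +*) out+=",${flag#+}=on" ;;   # +name -> name=on
      -*) out+=",${flag#-}=off" ;;  # -name -> name=off
      "") ;;                        # tolerate empty entries, e.g. a trailing ';'
      *) echo "unexpected flag: $flag" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}
```

The result would be passed as qemu-system-x86_64 -enable-kvm -cpu "$(proxmox_flags_to_qemu kvm64 '-pcid;+aes')" plus the rest of your VM options. With libvirt, the same flags would instead go into feature elements inside the domain's cpu definition.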


Sorry to resurrect this thread, but how did you make out?

I'm having the same issue with nearly the same model (the N6005 is the only difference). Did you try a more dependable power supply?

Sadly, this is my solution to the stability issues on the N5105/N6005 (see attached picture).
Only the J4125 units do not require a fan over them.

CPU set to host
Machine set to q35
Memory allocated 8 GB (Ballooning = Off)
PCI Device passthrough for LAN1 ~ LAN4 (6 port units)
PCI Device passthrough for LAN1 ~ LAN3 (4 port units)
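The VM settings listed above map onto standard qm set options. A sketch that prints one command for a given VMID; the helper name host_vm_cmd is hypothetical, and the VMID and PCI addresses of the LAN ports are placeholders (lspci on the host shows the real ones).

```shell
#!/usr/bin/env bash
# host_vm_cmd VMID PCI_ADDR... (hypothetical helper)
# Prints one "qm set" command matching the settings above: host CPU, q35,
# 8 GB (8192 MiB) RAM with ballooning off, and one hostpciN entry per NIC port.
host_vm_cmd() {
  local vmid="$1"; shift
  local cmd="qm set $vmid --cpu host --machine q35 --memory 8192 --balloon 0"
  local i=0 addr
  for addr in "$@"; do
    cmd+=" --hostpci$i $addr"   # hostpci0, hostpci1, ... in argument order
    i=$((i + 1))
  done
  printf '%s\n' "$cmd"
}
```

For a 4-port unit with LAN1 to LAN3 passed through, that would be something like host_vm_cmd 101 followed by three PCI addresses.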

The units with 6 ports use Intel i226.
The units with 4 ports use either Intel i226 or Realtek RTL8125.

It ran for weeks on 22.7.8; then I upgraded.

OPNsense 22.7.10_2 seems to have issues on all my units (J4125, N5105 and N6005 tested), but only as a VM in Proxmox; I have not tested bare metal. The N5105/N6005 also have stability issues outside OPNsense when stress testing with Prime95, even though Memtest86 passes multiple times. With a USB 140 mm fan placed over the unit, the stress-test stability issues go away. OPNsense 22.7.10_2 (VM in Proxmox) still crashes, although it takes days to do so. No issues yet after 2 weeks with pfSense 2.7, but that is a development version.
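To see whether the thermal problem described above shows up during a Prime95 run, the host's thermal zones can be polled from sysfs. A minimal sketch, assuming a Linux host that exposes /sys/class/thermal (zone names vary by board); the helper name print_temps is hypothetical, and the base directory is a parameter only so it can be pointed elsewhere for testing.

```shell
#!/usr/bin/env bash
# print_temps [BASE] (hypothetical helper)
# Prints "<zone type> <degrees C>" for every readable thermal zone under BASE.
print_temps() {
  local base="${1:-/sys/class/thermal}" zone ztype milli
  for zone in "$base"/thermal_zone*; do
    [ -r "$zone/temp" ] || continue               # also skips the literal glob if nothing matched
    ztype=$(cat "$zone/type" 2>/dev/null || echo unknown)
    milli=$(cat "$zone/temp" 2>/dev/null) || continue  # some zones refuse reads
    # sysfs reports millidegrees Celsius; integer division gives whole degrees.
    printf '%s %d\n' "$ztype" "$((milli / 1000))"
  done
}
```

Running something like while sleep 5; do print_temps; done alongside the stress test makes it easy to compare readings with and without the fan.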

Sorry to bring up pfSense if that annoys some people. I don't really care which software solution is used, as long as it is suitable for the use case.

I went ahead and purchased the AMD 5825U version with four i226 ports. It has been perfectly stable with Proxmox alongside other VMs. I had issues with the Intel N5105 regardless of tweaks. I recommend paying extra for the AMD version, even if just for peace of mind.

I've been running OPNSense on the N5105 version you have on bare metal for 5 months now with zero issues at all.

The device's temperature is a steady 43 °C.

I know this doesn't help directly, but I wanted to share my experience running on bare metal.

You will find the most help here: https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/
or here: https://forums.servethehome.com/index.php?threads/topton-jasper-lake-quad-i225v-mini-pc-report.36699/page-111

My system is stable after upgrading to kernel 6.1 and updating the microcode. C-states are still enabled.

After some Linux VM crashes (running k3s and microk8s without any pods or configuration), I decided that Proxmox was the issue and switched to VMware.

My plan is to try that, then Hyper-V, then give up and run bare metal. Something should eventually be stable ;)

I got my CWWK N6005 working in Proxmox for 6 days without any crashing so far.

Opt-In Kernel 6.1

apt update
apt install pve-kernel-6.1
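After rebooting, it is worth confirming that the host is actually running the new kernel. A small version check using GNU sort's version ordering (sort -V); the helper name kernel_at_least is hypothetical.

```shell
#!/usr/bin/env bash
# kernel_at_least MIN VERSION (hypothetical helper)
# Succeeds if VERSION (e.g. the output of uname -r) is MIN or newer.
kernel_at_least() {
  local min="$1" ver="$2"
  # The smaller of the two versions sorts first; if that is MIN, VERSION >= MIN.
  [ "$(printf '%s\n' "$min" "$ver" | sort -V | head -n 1)" = "$min" ]
}
```

Usage: kernel_at_least 6.1 "$(uname -r)" && echo "running 6.1 or newer".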


Update Intel Processor Microcode
Edit /etc/apt/sources.list and add non-free to the following lines:

deb http://ftp.debian.org/debian bullseye main contrib non-free
deb http://ftp.debian.org/debian bullseye-updates main contrib non-free
deb http://security.debian.org bullseye-security main contrib non-free
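If you prefer not to edit the file by hand, the same change can be scripted with GNU sed. A sketch that appends non-free to active deb lines for debian.org that lack it and keeps a .bak copy first; the helper name add_non_free is hypothetical. It only touches lines mentioning debian.org, so a Proxmox repository line in the same file, if any, is left alone.

```shell
#!/usr/bin/env bash
# add_non_free FILE (hypothetical helper)
# Appends " non-free" to each active "deb" line for debian.org that lacks it.
# A backup is written to FILE.bak first. Requires GNU sed (-i, -E).
add_non_free() {
  local file="$1"
  cp -- "$file" "$file.bak"
  sed -i -E '/^deb[[:space:]].*debian\.org/{/[[:space:]]non-free([[:space:]]|$)/!s/[[:space:]]*$/ non-free/;}' "$file"
}
```

Usage: add_non_free /etc/apt/sources.list (as root), then continue with the apt commands.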

then

apt update
apt install intel-microcode

You can choose to remove non-free from the sources.list file afterwards; it doesn't really matter.
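To check whether the microcode update took effect, compare the revision the kernel reports before installing intel-microcode and after a reboot. A minimal sketch reading the microcode field from /proc/cpuinfo; the helper name microcode_rev is hypothetical and the file argument exists only for testing.

```shell
#!/usr/bin/env bash
# microcode_rev [CPUINFO] (hypothetical helper)
# Prints the "microcode" revision of the first CPU listed in /proc/cpuinfo.
microcode_rev() {
  awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}
```

A higher revision after the reboot suggests the package's microcode was loaded; an unchanged value may simply mean the firmware already ships that revision.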

Disable deep C-states (limit to C1) in GRUB (I am using UEFI with PCIe passthrough); adapt this to your install:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt intel_idle.max_cstate=1 processor.max_cstate=1"
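The GRUB line above lives in /etc/default/grub. A sketch of adding the parameters idempotently instead of editing by hand; it assumes a single GRUB_CMDLINE_LINUX_DEFAULT="..." line and GNU sed -i, and the helper name add_grub_params is hypothetical.

```shell
#!/usr/bin/env bash
# add_grub_params FILE PARAM... (hypothetical helper)
# Appends each PARAM to GRUB_CMDLINE_LINUX_DEFAULT="..." in FILE unless it is
# already there. Assumes one such line in FILE; writes FILE.bak first.
add_grub_params() {
  local file="$1"; shift
  local p line
  cp -- "$file" "$file.bak"
  for p in "$@"; do
    line=$(grep -m1 '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file")
    line=${line#*=\"}    # drop the key and the opening quote
    line=${line%\"}      # drop the closing quote
    case " $line " in
      *" $p "*) ;;       # already present: leave the file alone
      *) sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $p\"|" "$file" ;;
    esac
  done
}
```

Usage: add_grub_params /etc/default/grub intel_iommu=on iommu=pt intel_idle.max_cstate=1 processor.max_cstate=1, then run update-grub.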

then

update-grub
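After update-grub and a reboot, /proc/cmdline shows what the running kernel was actually booted with, so it confirms whether the parameters took effect on your install. A sketch that lists any missing parameters and fails if there are some; the helper name missing_params is hypothetical.

```shell
#!/usr/bin/env bash
# missing_params CMDLINE_FILE PARAM... (hypothetical helper)
# Prints "missing: PARAM" for each PARAM not on the kernel command line and
# returns 1 if any were missing, 0 otherwise.
missing_params() {
  local file="$1"; shift
  local cmdline p missing=0
  cmdline=" $(cat "$file") "   # pad with spaces so whole-word matching works at both ends
  for p in "$@"; do
    case "$cmdline" in
      *" $p "*) ;;
      *) printf 'missing: %s\n' "$p"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

Usage: missing_params /proc/cmdline intel_iommu=on iommu=pt intel_idle.max_cstate=1 processor.max_cstate=1.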


I don't know which combination of the above is helping. When I have time, I will re-test it and see whether the VM survives at least 10 days without crashing.