OPNsense Forum

Archive => 22.7 Legacy Series => Topic started by: rafaelreisr on September 08, 2022, 03:40:07 pm

Title: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: rafaelreisr on September 08, 2022, 03:40:07 pm
Hey everyone.

I've been wrestling with a fresh install of OPNSense in a KVM / QEMU env. Host is Ubuntu 22.04 Jammy.
Device is a Topton N5105 Celeron with 4 Intel igc 2.5gbit nics. 2x4GB DDR4
1 NIC is reserved for the Ubuntu host. 3 NICs are passed through to the VM with IOMMU.

This has been going on for weeks. I get random Kernel Panics + VM reboots every 15 to 20hs. Host is rock solid with over a week of uptime.

What have I tried so far:

1 - BIOS mode instead of UEFI (no change)
2 - adding nopti to kernel opts on host since it looked like acpi / mitigations issue. No change
3 - installing qemu-guest-tools-vm to opnsense. No change.

It is similar to https://forum.opnsense.org/index.php?topic=28302.0 and https://forum.opnsense.org/index.php?topic=28422.0

I have found similar reports in bare metal installations, other Hypervisors, so it seems a bit widespread.

The usual replies are that it is either a HW issue (unlikely due to it being so common), or it should be solved on 13.1 / 22.7, which is exactly my fresh installation.

Edit: Crash report https://forum.opnsense.org/index.php?action=post;quote=145955;topic=30230.0;last_msg=145955

Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: bartjsmit on September 08, 2022, 04:49:59 pm
The usual replies are that it is either a HW issue (unlikely due to it being so common)

Loads of bad RAM out there. Have you tested yours? https://www.memtest86.com/
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: Vesalius on September 08, 2022, 05:11:38 pm
Also similar to https://forum.opnsense.org/index.php?topic=29845.0?

Are you virtualizing the VM CPU as KVM/Qemu or using host? Have you tried not passing through the network adapter and using VirtIO instead, which should handle 2.5g fine? Either of those could narrow down the issue.

Starting to suspect something in this hardware combo is giving the underlying FreeBSD base fits. If virtualizing the 2.5g NIC or the CPU (or both in combination) stops the FreeBSD kernel panics, that should point in the general direction of an answer. It seems as though RAM issues would affect both the host and the VM.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: rafaelreisr on September 08, 2022, 07:42:40 pm
As expected, it crashed again. I have the crash report:

System Information:
Code: [Select]
User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Safari/605.1.15
FreeBSD 13.1-RELEASE-p1 stable/22.7-n250224-b668033f066 SMP amd64
OPNsense 22.7.2 412c0b79c
Plugins os-dmidecode-1.1_1 os-qemu-guest-agent-1.1 os-telegraf-1.12.5 os-upnp-1.4_2 os-wireguard-1.11
Time Thu, 08 Sep 2022 14:36:24 -0300
OpenSSL 1.1.1q  5 Jul 2022
Python 3.9.13
PHP 8.0.22

dmesg.boot:
Code: [Select]
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.1-RELEASE-p1 stable/22.7-n250224-b668033f066 SMP amd64
FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
VT(efifb): resolution 1024x768
CPU: Intel(R) Celeron(R) N5105 @ 2.00GHz (1996.78-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x906c0  Family=0x6  Model=0x9c  Stepping=0
  Features=0x1f83fbff
  Features2=0xcff8a223
  AMD Features=0x28100800
  AMD Features2=0x101
  Structured Extended Features=0x21940283
  Structured Extended Features2=0x18400124
  Structured Extended Features3=0xac000400
  XSAVE Features=0xf
  IA32_ARCH_CAPS=0x6b
  AMD Extended Feature Extensions ID EBX=0x100d000
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 2147483648 (2048 MB)
avail memory = 2032087040 (1937 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0  irqs 0-23
Launching APs: 1
random: entropy device external interface
wlan: mac acl policy registered
kbd1 at kbdmux0
WARNING: Device "spkr" is Giant locked and may be deleted before FreeBSD 14.0.
kvmclock0:
Timecounter "kvmclock" frequency 1000000000 Hz quality 975
kvmclock0: registered as a time-of-day clock, resolution 0.000001s
efirtc0:
efirtc0: registered as a time-of-day clock, resolution 1.000000s
smbios0:  at iomem 0x7f922000-0x7f92201e
smbios0: Version: 2.8, BCD Revision: 2.8
aesni0:
acpi0:
acpi0: Power Button (fixed)
cpu0:  on acpi0
atrtc0:  port 0x70-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
vgapci0:  mem 0xc0000000-0xc0ffffff,0xc2119000-0xc2119fff at device 1.0 on pci0
vgapci0: Boot video device
pcib1:  mem 0xc2118000-0xc2118fff irq 22 at device 2.0 on pci0
pci1:  on pcib1
pcib2:  mem 0xc2000000-0xc20000ff irq 22 at device 0.0 on pci1
pci2:  on pcib2
hdac0:  mem 0xc1e00000-0xc1e03fff irq 23 at device 1.0 on pci2
pcib3:  mem 0xc2117000-0xc2117fff irq 22 at device 2.1 on pci0
pci3:  on pcib3
virtio_pci0:  mem 0xc1c00000-0xc1c00fff,0x800000000-0x800003fff irq 22 at device 0.0 on pci3
pcib4:  mem 0xc2116000-0xc2116fff irq 22 at device 2.2 on pci0
pci4:  on pcib4
virtio_pci1:  mem 0xc1a00000-0xc1a00fff,0x800100000-0x800103fff irq 22 at device 0.0 on pci4
vtblk0:  on virtio_pci1
vtblk0: 40960MB (83886080 512 byte sectors)
pcib5:  mem 0xc2115000-0xc2115fff irq 22 at device 2.3 on pci0
pci5:  on pcib5
igc0:  mem 0xc1800000-0xc18fffff,0xc1900000-0xc1903fff irq 22 at device 0.0 on pci5
igc0: Using 1024 TX descriptors and 1024 RX descriptors
igc0: Using 2 RX queues 2 TX queues
igc0: Using MSI-X interrupts with 3 vectors
igc0: Ethernet address: 7c:2b:e1:13:00:5a
igc0: netmap queues/slots: TX 2/1024, RX 2/1024
pcib6:  mem 0xc2114000-0xc2114fff irq 22 at device 2.4 on pci0
pci6:  on pcib6
igc1:  mem 0xc1600000-0xc16fffff,0xc1700000-0xc1703fff irq 22 at device 0.0 on pci6
igc1: Using 1024 TX descriptors and 1024 RX descriptors
igc1: Using 2 RX queues 2 TX queues
igc1: Using MSI-X interrupts with 3 vectors
igc1: Ethernet address: 7c:2b:e1:13:00:5b
igc1: netmap queues/slots: TX 2/1024, RX 2/1024
pcib7:  mem 0xc2113000-0xc2113fff irq 22 at device 2.5 on pci0
pci7:  on pcib7
igc2:  mem 0xc1400000-0xc14fffff,0xc1500000-0xc1503fff irq 22 at device 0.0 on pci7
igc2: Using 1024 TX descriptors and 1024 RX descriptors
igc2: Using 2 RX queues 2 TX queues
igc2: Using MSI-X interrupts with 3 vectors
igc2: Ethernet address: 7c:2b:e1:13:00:5c
igc2: netmap queues/slots: TX 2/1024, RX 2/1024
pcib8:  mem 0xc2112000-0xc2112fff irq 22 at device 2.6 on pci0
pci8:  on pcib8
virtio_pci2:  mem 0x800200000-0x800203fff irq 22 at device 0.0 on pci8
vtballoon0:  on virtio_pci2
pcib9:  mem 0xc2111000-0xc2111fff irq 22 at device 2.7 on pci0
pci9:  on pcib9
xhci0:  mem 0xc1000000-0xc1003fff irq 22 at device 0.0 on pci9
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
isab0:  at device 31.0 on pci0
isa0:  on isab0
ahci0:  port 0xe040-0xe05f mem 0xc2110000-0xc2110fff irq 16 at device 31.2 on pci0
ahci0: AHCI v1.00 with 6 1.5Gbps ports, Port Multiplier not supported
ahcich0:  at channel 0 on ahci0
ahcich1:  at channel 1 on ahci0
ahcich2:  at channel 2 on ahci0
ahcich3:  at channel 3 on ahci0
ahcich4:  at channel 4 on ahci0
ahcich5:  at channel 5 on ahci0
acpi_syscontainer0:  on acpi0
acpi_syscontainer1:  port 0xcd8-0xce3 on acpi0
acpi_syscontainer2:  port 0x620-0x62f on acpi0
acpi_syscontainer3:  port 0xcc0-0xcd7 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
atkbdc0:  port 0x60,0x64 irq 1 on acpi0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0:  irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 14.0.
psm0: model IntelliMouse Explorer, device ID 4
attimer0:  at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounters tick every 10.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
hdacc0:  at cad 0 on hdac0
hdaa0:  at nid 1 on hdacc0
pcm0:  at nid 3 and 5 on hdaa0
ugen0.1: <(0x1b36) XHCI root HUB> at usbus0
uhub0 on usbus0
uhub0: <(0x1b36) XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
Trying to mount root from zfs:zroot/ROOT/default []...
Root mount waiting for: usbus0
uhub0: 30 ports with 30 removable, self powered
Dual Console: Video Primary, Serial Secondary

/var/crash/info.0:
Code: [Select]
Dump header from device: /dev/vtbd0p3
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 72192
  Blocksize: 512
  Compression: none
  Dumptime: 2022-09-08 12:51:50 -0300
  Hostname: OPNsense.localdomain
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 13.1-RELEASE-p1 stable/22.7-n250224-b668033f066 SMP
  Panic String: page fault
  Dump Parity: 3509545778
  Bounds: 0
  Dump Status: good

/var/crash/textdump.tar.0: attached due to size restriction. Left the interesting bit:

Code: [Select]

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xfffffc009bed98de
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff812267c0
stack pointer         = 0x28:0xfffffe0096da9b28
frame pointer         = 0x28:0xfffffe0096da9c20
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 75225 (telegraf)
trap number = 12
panic: page fault
cpuid = 1
time = 1662652310
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0096da98e0
vpanic() at vpanic+0x17f/frame 0xfffffe0096da9930
panic() at panic+0x43/frame 0xfffffe0096da9990
trap_fatal() at trap_fatal+0x385/frame 0xfffffe0096da99f0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0096da9a50
calltrap() at calltrap+0x8/frame 0xfffffe0096da9a50
--- trap 0xc, rip = 0xffffffff812267c0, rsp = 0xfffffe0096da9b28, rbp = 0xfffffe0096da9c20 ---
lapic_handle_timer() at lapic_handle_timer/frame 0xfffffe0096da9c20
pmap_copy() at pmap_copy+0x561/frame 0xfffffe0096da9cc0
vmspace_fork() at vmspace_fork+0xc8a/frame 0xfffffe0096da9d40
fork1() at fork1+0x42a/frame 0xfffffe0096da9da0
sys_fork() at sys_fork+0x54/frame 0xfffffe0096da9e00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe0096da9f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0096da9f30
--- syscall (2, FreeBSD ELF64, sys_fork), rip = 0x485ad6, rsp = 0xc0003251c8, rbp = 0xc0003252b0 ---
KDB: enter: panic
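For anyone comparing reports, the lines that matter most in a trap like the one above are the fault address, the fault code, the instruction pointer and the current process. Below is a minimal sketch for pulling them out of a textdump. The function name trap_summary is made up for illustration, and it assumes the archive contains a msgbuf.txt member as described in textdump(4).

```shell
#!/bin/sh
# Print the key lines of a FreeBSD "Fatal trap" report read from stdin.
# trap_summary is a hypothetical helper name, not an OPNsense or FreeBSD tool.
trap_summary() {
    grep -E '^(Fatal trap|fault virtual address|fault code|instruction pointer|current process|panic:)'
}

# Usage on the firewall (path taken from the crash report; msgbuf.txt is an
# assumption based on textdump(4)):
#   tar -xOf /var/crash/textdump.tar.0 msgbuf.txt | trap_summary
```

Running it over several dumps makes it easier to see whether the faulting process or address repeats between crashes.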
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: rafaelreisr on September 08, 2022, 09:22:11 pm
Loads of bad RAM out there. Have you tested yours? https://www.memtest86.com/

It's on the 3rd pass of memtest86 and no errors yet. I highly doubt it is RAM. I have brand new Micron modules from a reputable brand, not from China. Also, the host is rock solid.

(https://snipboard.io/CNWhBH.jpg)

Also, similar to https://forum.opnsense.org/index.php?topic=29845.0?

Are you virtualizing the VM CPU as KVM/Qemu or using host? Have you tried not passing through the network adapter and using VirtIO instead, which should handle 2.5g fine? Either of those could narrow down the issue.


VM settings xml is attached in the original post. CPU is host-passthrough. I could try emulating CPU and NICs, although I feel it defeats the purpose of leveraging the hardware.
My best suspicion is a poorly coded BIOS. These Chinese boards could have a poor microcode implementation, although that would also show on Linux. It seems to be giving problems only on BSD.

Finally, I can also try a bare metal install just for testing. That is not my intended deployment, since it would waste a lot of the hardware potential, but I'd be happy to contribute to the experts for a potential fix.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: Vesalius on September 08, 2022, 10:48:37 pm
You would lose little to nothing by virtualizing the CPU and NICs, and it might only be temporary, to troubleshoot whether the host NIC or CPU interacting directly with FreeBSD is the issue. It’s more about systematically checking off the boxes of what might be the cause.

VirtIO on many hosts can do 10-20g of throughput and should have no issues with 2.5g.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: rafaelreisr on September 09, 2022, 12:55:08 am
I just installed it bare metal. Imaged and saved previous Ubuntu / KVM installation as a backup.

Did the recommended setup steps, and I'll now leave it running for a few days and report back.

If it runs fine, we will know for sure it is virtualization related. Then I'll move into the suggested VM troubleshooting.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: bartjsmit on September 09, 2022, 07:50:37 am
Might be worth looking at Proxmox and VMware ESXi as alternative hypervisors.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: rafaelreisr on September 11, 2022, 05:14:03 pm
Update the thread with current troubleshooting status:


Next steps:

Process is slow since the crashes only occur after 15-20hs. I'll be posting the results as I go.

Might be worth looking at Proxmox and VMware ESXi as alternative hypervisors.

I will give Proxmox a shot afterwards, just to see if KVM / QEMU is more stable there. But I don't feel it is the best option for me. I need a Linux install to run Docker services and OPNSense. Proxmox does not support Docker natively, so I'd have to run 2 VMs (OPNSense + Ubuntu or some other distro) + Hypervisor. That will be a lot for this machine. I'd rather have the Hypervisor be a Linux distro with Docker support (like the Ubuntu host I'm currently running, which is very stable).


Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: Vesalius on September 11, 2022, 08:42:01 pm
Proxmox is just some binaries on top of a slightly modified Debian install. In fact, you can install Debian and then install proxmox to that.

Regardless of how you chose to install initially, you can have Docker running directly on the Debian/Proxmox host as easily as getting it running on Debian. Most people don't, as installing Docker in a lightweight Debian/Ubuntu/Alpine LXC on Proxmox takes so few additional resources, but you can.

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_install_proxmox_ve_on_debian
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: rafaelreisr on September 14, 2022, 05:40:56 pm
Troubleshooting updates:

Installed Proxmox. Created the VM with the exact same settings as I had in Ubuntu (see if I replicate the problem on another Hypervisor).

Recap: Host CPU passthrough, NIC passthrough, no memory ballooning, scsi virtio disks.

It crashed as well, around 20hs in. The VM crashes and is not rebooted automatically, which is worse behavior than on Ubuntu.

syslog from proxmox:
Code: [Select]
Sep 14 12:25:23 pve QEMU[1016]: extra data[0]: 0x0000000080000b0e
Sep 14 12:25:23 pve QEMU[1016]: extra data[1]: 0x0000000000000031
Sep 14 12:25:23 pve QEMU[1016]: extra data[2]: 0x0000000000000083
Sep 14 12:25:23 pve QEMU[1016]: extra data[3]: 0x0000000830917ff8
Sep 14 12:25:23 pve QEMU[1016]: extra data[4]: 0x0000000000000002
Sep 14 12:25:23 pve QEMU[1016]: RAX=0000000830917eb6 RBX=ffffffff81f5f0c0 RCX=00000000c0000101 RDX=00000000ffffffff
Sep 14 12:25:23 pve QEMU[1016]: RSI=0000000000000000 RDI=ffffffff81f5f0c0 RBP=ffffffff81f5f0b0 RSP=ffffffff81f5efe0
Sep 14 12:25:23 pve QEMU[1016]: R8 =000000c000c6e900 R9 =0000000000000000 R10=0000000000000000 R11=000000c000c6e900
Sep 14 12:25:23 pve QEMU[1016]: R12=ffffffffffffff99 R13=ffffffffffffff9f R14=000000c0010d5380 R15=0000000830917eb6
Sep 14 12:25:23 pve QEMU[1016]: RIP=ffffffff81133841 RFL=00010082 [--S----] CPL=0 II=0 A20=1 SMM=0 HLT=0
Sep 14 12:25:23 pve QEMU[1016]: ES =003b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Sep 14 12:25:23 pve QEMU[1016]: CS =0020 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
Sep 14 12:25:23 pve QEMU[1016]: SS =0000 0000000000000000 ffffffff 00c00000
Sep 14 12:25:23 pve QEMU[1016]: DS =003b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Sep 14 12:25:23 pve QEMU[1016]: FS =0013 0000000830c40130 ffffffff 00c0f300 DPL=3 DS   [-WA]
Sep 14 12:25:23 pve QEMU[1016]: GS =001b ffffffff82c10000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Sep 14 12:25:23 pve QEMU[1016]: LDT=0000 0000000000000000 ffffffff 00c00000
Sep 14 12:25:23 pve QEMU[1016]: TR =0048 ffffffff82c10384 00002068 00008b00 DPL=0 TSS64-busy
Sep 14 12:25:23 pve QEMU[1016]: GDT=     ffffffff82c103ec 00000067
Sep 14 12:25:23 pve QEMU[1016]: IDT=     ffffffff81f5d690 00000fff
Sep 14 12:25:23 pve QEMU[1016]: CR0=80050033 CR2=ffffffff81133841 CR3=0000000830917eb6 CR4=003506e8
Sep 14 12:25:23 pve QEMU[1016]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Sep 14 12:25:23 pve QEMU[1016]: DR6=00000000ffff0ff0 DR7=0000000000000400
Sep 14 12:25:23 pve QEMU[1016]: EFER=0000000000000d01
Sep 14 12:25:23 pve QEMU[1016]: Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

There are no crash logs recorded on the OPNSense side. Proxmox is looking worse than Ubuntu so far.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: bartjsmit on September 14, 2022, 06:30:43 pm
I know it rules out containers (at least in the free version) but my ESXi VM with OPNsense has never crashed in its many years of use.

Bart...
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: Vesalius on September 15, 2022, 04:21:09 am
@rafaelreisr have you tried to run the OPNsense VM in either Ubuntu KVM/QEMU or Proxmox without NIC passthrough yet? Using a virtualized CPU and paravirtualized NICs (VirtIO) seems to be about the only combo left to try.

I’ve also run OPNsense VM on Proxmox for years now without any sort of crashes like this, as have many others on the Proxmox forum I frequent, so no inherent generalized compatibility issues there on the software front.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: Nearly9892 on September 16, 2022, 02:19:50 pm
This is a common issue with those units. There isn't any hardware swapping or configuration tweaking you can do to fix it.

My suggestion right now is to upgrade the host kernel to the latest version 5.19 and report back your results. Maybe even try using proxmox just to work alongside the efforts of others.

Kernel upgrade thread: https://forum.proxmox.com/threads/opt-in-linux-5-19-kernel-for-proxmox-ve-7-x-available.115090/

Main thread tracking this issue: https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: yourfriendarmando on September 17, 2022, 05:07:09 am
I do not have experience with Topton brand, but these look similar to those of Kettop and Qotom. I built one for a client and felt like I installed a time bomb after the complaints. Anything I tried could not make the OS happy on the Qotom.

I took the config and plugged it into my own smaller appliance PC, assigned interfaces, took it right over, and it has worked ever since. Saved my posterior and bought a replacement appliance, both of those from Protectli.

These appliance and NUC-size PCs are an amazing level of overkill for a firewall, with room to stretch and scale, running on an external DC power supply.

On an appliance PC I highly recommend running one thing on bare metal. Anything going virtual should go on a server-class machine built for throughput.

This lesson I learned was definitely a case of getting what one pays for. I only state as objectively as possible my own experience.

For reference used a Kettop Home Router Mi3455P4 Intel Celeron J3455 from Amazon. Replaced with either Protectli FW4B or FW6D. The price difference is shadowed only by a great difference in quality.

Other miscellaneous reference: I use a maxed out Dell R710 to stage my VMs. It is running Ubuntu server 20.04. At any given time I can be running Windows, xBuntu, or OpnSense VMs running in Qemu with libvirtd
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: rafaelreisr on September 18, 2022, 05:06:43 pm
Troubleshooting update:

TLDR: Potential solution found - use kvm64 cpu with aes flag enabled.


I have tried different CPU settings on Proxmox:

kvm64: -mitigation flags +aes = stable. Only setup with more than 20hs uptime; I reached 48hs.
Code: [Select]
2 (1 sockets, 2 cores) [kvm64,flags=-pcid;-spec-ctrl;-ssbd;-ibpb;-virt-ssbd;-amd-ssbd;-amd-no-ssb;+aes]
Note: kvm64 is a very old legacy Pentium-based CPU model with very few flags. Adding AES helps a lot. I didn't see any appreciable CPU performance loss on 2.5gbit loads compared to host-passthrough. NICs are still passed through and also very stable. Mitigations are enabled on the host kernel.

qemu64: -mitigation flags +aes = unstable. Crashes in under 20hs as usual.
It crashes the same way as CPU passthrough or host-model.

Other CPU models: I tried a few without success. Considering that qemu64, which is a very migration-safe CPU, has crashed, I won't bother trying to cherry-pick which flag is causing the issue.
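For those who prefer the Proxmox shell to the GUI, here is a sketch of how a CPU setting like the stable kvm64 one could be applied with qm. The VM ID 100 and the helper name cpu_arg are placeholders for illustration, not taken from my setup. The script only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Build the value for "qm set --cpu": a base model plus optional flags,
# joined with semicolons. cpu_arg is a hypothetical helper name.
cpu_arg() {
    model=$1
    shift
    # Join the remaining arguments with ';' inside a subshell so the
    # caller's IFS is left untouched.
    flags=$(IFS=';'; printf '%s' "$*")
    if [ -n "$flags" ]; then
        printf '%s,flags=%s\n' "$model" "$flags"
    else
        printf '%s\n' "$model"
    fi
}

VMID=100   # placeholder VM ID, replace with your own

# Dry run: print the command instead of executing it. The value is quoted
# because semicolons in a multi-flag list would otherwise split the command.
printf "qm set %s --cpu '%s'\n" "$VMID" "$(cpu_arg kvm64 +aes)"
```

Extra flags are passed as further arguments, for example cpu_arg kvm64 -pcid +aes.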

@yourfriendarmando: I believe it is less a hardware issue than poor BIOS development. I'm sure the work put in by name-brand solutions is far more refined. This CPU simplification workaround is a nice find, especially for users in 3rd world countries where importing is extremely expensive and Chinese solutions such as these are a decent bang for the buck, although they require work.

@Nearly9892 and other repliers, considering this find, I don't think I'll bother with Proxmox. It would be worth it if I was clusterizing. But for this homelab single deployment it looks overkill.

Next step: I'll go back to the ubuntu ssd and replicate vm settings there, check if it remains as stable as in proxmox - it should, considering the underlying hypervisor is the same, and so far crashing behaviour between proxmox and ubuntu +kvm has been identical.
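On the Ubuntu / libvirt side, the equivalent of the kvm64 + aes setting would be a custom CPU element in the domain XML. The following is only a sketch under that assumption: the helper name cpu_xml is made up, and the printed fragment would replace the existing cpu element when editing the domain with virsh edit.

```shell
#!/bin/sh
# Print a libvirt <cpu> element for a named model with one required feature.
# cpu_xml is a hypothetical helper; $1 is the CPU model, $2 the feature name.
cpu_xml() {
    cat <<EOF
<cpu mode='custom' match='exact'>
  <model fallback='allow'>$1</model>
  <feature policy='require' name='$2'/>
</cpu>
EOF
}

cpu_xml kvm64 aes
```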

Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: sdf_iain on January 07, 2023, 03:48:13 pm
Sorry to resurrect this thread, but how did you make out?

I’m having the same issue with nearly the same model (n6005 is the only difference). Did you try a more dependable power supply?
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: LiFE1688 on January 09, 2023, 01:58:51 pm
Sadly, this is my solution for a N5105/N6005 regarding stability issues. (See attached picture)
Only the J4125 units do not require a fan over them.

CPU set to host
Machine set to q35
Memory allocated 8gb (Ballooning = Off)
PCI Device passthrough for LAN1 ~ LAN4 (6 port units)
PCI Device passthrough for LAN1 ~ LAN3 (4 port units)

The units for 6 ports are using Intel i226
The units for 4 ports are either using Intel i226 or RealTEK RTL8125

It was running for weeks on 22.7.8 then I upgraded.

OPNsense 22.7.10_2 seems to have issues with all my units (J4125 / N5105 / N6005 tested), but only as a VM in Proxmox. I have not tested bare metal. The N5105 / N6005 have stability issues outside OPNsense when stress testing with Prime95. Memtest86 passes multiple times. I put a USB 140mm fan over the unit, and the stability issues under stress testing go away. OPNsense 22.7.10_2 still has issues crashing (VM in Proxmox), although it takes days to crash. No issues yet after 2 weeks with pfSense 2.7, but that is a development version.

Sorry to bring up pfSense if that annoys some people. I don't really care which software solution is being used, as long as it is suitable for the use case.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: Nearly9892 on January 09, 2023, 02:40:01 pm
I went ahead and purchased the AMD 5825U version with 4 i226. It has been perfectly stable with Proxmox alongside other VMs. I had issues with the Intel N5105 regardless of tweaks. I recommend paying extra for the AMD even if just for peace of mind.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: ProximusAl on January 09, 2023, 02:44:33 pm
I've been running OPNSense on the N5105 version you have on bare metal for 5 months now with zero issues at all.

Temperature of the device is a steady 43 degrees C.

I know this doesn't help, but wanted to give some input running on Bare Metal.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: misery on January 10, 2023, 08:29:53 am
You will find the most help here: https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/
or here: https://forums.servethehome.com/index.php?threads/topton-jasper-lake-quad-i225v-mini-pc-report.36699/page-111

My system is stable after upgrade to Kernel 6.1 and Microcode-Update. C-States are still enabled.
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: sdf_iain on January 15, 2023, 03:22:48 pm
After some Linux VM crashes (running k3s and microk8s without any pods or configuration) I decided that Proxmox was the issue and switched to VMware.

My plan is to try that, then Hyper-V, then give up and run bare metal. Something should eventually be stable  ;)
Title: Re: Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode
Post by: LiFE1688 on February 17, 2023, 03:04:19 am
I got my CWWK N6005 working in Proxmox for 6 days without any crashing so far.

Opt-In Kernel 6.1
Code: [Select]
apt update
apt install pve-kernel-6.1

Update Intel Processor Microcode
Edit /etc/apt/sources.list and add non-free to the following lines:
Code: [Select]
deb http://ftp.debian.org/debian bullseye main contrib non-free
deb http://ftp.debian.org/debian bullseye-updates main contrib non-free
deb http://security.debian.org bullseye-security main contrib non-free
then
Code: [Select]
apt update
apt install intel-microcode
You can choose to remove non-free from the sources.list file, it doesn't really matter.

Disable CSTATE in GRUB (I am using UEFI with PCIe passthrough); adjust it according to your install
Code: [Select]
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt intel_idle.max_cstate=1 processor.max_cstate=1"
then
Code: [Select]
update-grub

I do not know what combination of the above is helping, I will see when I have time to re-test it if the VM doesn't crash after at least 10 days.
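After rebooting, it may help to confirm that the three changes actually took effect before starting a multi-day test. A minimal sketch follows, assuming a Linux host where /proc/cmdline and /proc/cpuinfo are readable. The helper name cstate_capped is made up for illustration.

```shell
#!/bin/sh
# Return 0 if the given kernel command line sets intel_idle.max_cstate
# to a value less than or equal to the given limit, 1 otherwise.
# cstate_capped is a hypothetical helper name.
cstate_capped() {
    cmdline=$1
    limit=$2
    for word in $cmdline; do
        case $word in
            intel_idle.max_cstate=*)
                # Non-numeric values make the test fail quietly and are skipped.
                [ "${word#*=}" -le "$limit" ] 2>/dev/null && return 0
                ;;
        esac
    done
    return 1
}

if [ -r /proc/cmdline ]; then
    if cstate_capped "$(cat /proc/cmdline)" 1; then
        echo "C-states capped at 1 or lower"
    else
        echo "C-state cap not found on the kernel command line"
    fi
    # Running kernel version (expecting a 6.1 series kernel after the opt-in).
    uname -r
    # Loaded microcode revision as reported by the first CPU.
    grep -m1 '^microcode' /proc/cpuinfo || echo "no microcode line found"
fi
```

Noting the kernel version and microcode revision alongside the uptime makes later re-tests easier to compare.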