Fatal trap 12: page fault while in kernel mode - supervisor read data

Started by klosz007, February 20, 2024, 01:48:22 PM

Hi,

Has anyone ever found a definitive solution to these periodic OPNsense crashes? They are still present in 24.1.

I first saw these when I migrated my OPNsense VM (running on ESXi) from a regular generic PC to a Chinese mini PC with an N5105 and I226-V NICs.
Then I read that Chinese PCs with the N5105 were a possible culprit.

So I replaced this mini PC with another one with a Pentium 7505: same story.

You could blame the Chinese mini PCs. But recently I have seen the same crashes on OPNsense running on KVM on a Synology device with an AMD V1500 CPU.

So it's not caused by a specific CPU or hypervisor.

Moreover, all my other VMs are perfectly stable, including those running plain (non-OPNsense) FreeBSD 13.2. So something is wrong with OPNsense, but it's a corner case specific to some configurations, since not all OPNsense VMs I manage are experiencing this.


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8238d57f
stack pointer           = 0x0:0xfffffe000b9af6c0
frame pointer           = 0x0:0xfffffe000b9af720
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_3)
trap number             = 12
panic: page fault
cpuid = 3
time = 1706361743
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000b9af480
vpanic() at vpanic+0x151/frame 0xfffffe000b9af4d0
panic() at panic+0x43/frame 0xfffffe000b9af530
trap_fatal() at trap_fatal+0x387/frame 0xfffffe000b9af590
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe000b9af5f0
calltrap() at calltrap+0x8/frame 0xfffffe000b9af5f0
--- trap 0xc, rip = 0xffffffff8238d57f, rsp = 0xfffffe000b9af6c0, rbp = 0xfffffe000b9af720 ---
pf_test_state_udp() at pf_test_state_udp+0x28f/frame 0xfffffe000b9af720
pf_test() at pf_test+0xc57/frame 0xfffffe000b9af890
pf_check_in() at pf_check_in+0x25/frame 0xfffffe000b9af8b0
pfil_run_hooks() at pfil_run_hooks+0x97/frame 0xfffffe000b9af8f0
ip_input() at ip_input+0x799/frame 0xfffffe000b9af980
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe000b9af9d0
ether_demux() at ether_demux+0x159/frame 0xfffffe000b9afa00
ng_ether_rcv_upper() at ng_ether_rcv_upper+0x8c/frame 0xfffffe000b9afa20
ng_apply_item() at ng_apply_item+0x2bf/frame 0xfffffe000b9afab0
ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe000b9afaf0
ng_apply_item() at ng_apply_item+0x2bf/frame 0xfffffe000b9afb80
ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe000b9afbc0
ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe000b9afbf0
ether_nh_input() at ether_nh_input+0x1f2/frame 0xfffffe000b9afc50
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe000b9afca0
ether_input() at ether_input+0x69/frame 0xfffffe000b9afd00
iflib_rxeof() at iflib_rxeof+0xbcb/frame 0xfffffe000b9afe00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe000b9afe40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe000b9afec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc3/frame 0xfffffe000b9afef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000b9aff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000b9aff30
--- trap 0x6c617470, rip = 0x5ac8b830975c0, rsp = 0x8b8d4820000005a8, rbp = 0x30646870 ---
KDB: enter: panic

Thanks.

Is really no one here experiencing this?

On various forums there are multiple reports of the same recurring crash (Fatal trap 12: page fault while in kernel mode - supervisor read data, page not present) over the last few years, and there is still no solution; it is still not fixed.
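The "Fatal trap 12" headline alone is printed for any kernel page fault, so whether two reports really are the same crash is better judged by the frames right after the "--- trap" marker in the KDB backtrace (in the trace above: pf_test_state_udp, pf_test, pf_check_in). A minimal Python sketch for comparing pasted backtraces, assuming they are plain text in the KDB format shown above; crash_signature and same_crash are hypothetical helper names, not part of OPNsense or FreeBSD:

```python
import re

# Hypothetical triage helper, not an OPNsense/FreeBSD tool.
# A KDB frame line looks like:
#   "pf_test() at pf_test+0xc57/frame 0xfffffe000b9af890"
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame")


def crash_signature(backtrace: str, depth: int = 3) -> tuple:
    """Return the names of the first `depth` frames after the first
    '--- trap' marker, i.e. the faulting function and its callers.
    Offsets are ignored because they change between kernel builds."""
    names = []
    in_trap = False
    for raw in backtrace.splitlines():
        line = raw.strip()
        if line.startswith("--- trap"):
            if in_trap:
                break  # a second marker ends the relevant part of the trace
            in_trap = True
            continue
        if in_trap:
            m = FRAME_RE.match(line)
            if m:
                names.append(m.group(1))
                if len(names) == depth:
                    break
    return tuple(names)


def same_crash(a: str, b: str, depth: int = 3) -> bool:
    """Heuristic: two panics match if their top frames after the trap agree."""
    sig_a = crash_signature(a, depth)
    return bool(sig_a) and sig_a == crash_signature(b, depth)


if __name__ == "__main__":
    sample = """\
calltrap() at calltrap+0x8/frame 0xfffffe000b9af5f0
--- trap 0xc, rip = 0xffffffff8238d57f, rsp = 0xfffffe000b9af6c0, rbp = 0xfffffe000b9af720 ---
pf_test_state_udp() at pf_test_state_udp+0x28f/frame 0xfffffe000b9af720
pf_test() at pf_test+0xc57/frame 0xfffffe000b9af890
pf_check_in() at pf_check_in+0x25/frame 0xfffffe000b9af8b0
"""
    print(crash_signature(sample))
```

Offsets such as +0x28f are deliberately ignored because they shift between builds (24.1 vs 24.1.2_1, for example). Matching on function names only is a heuristic: two unrelated bugs in the same function would still compare as equal.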

Both OPNsense and pfSense are affected when running as virtual machines (on both KVM, e.g. Proxmox, and ESXi). Only running the firewall on bare metal is completely trouble-free.

It's a corner case, though: running the firewall in a VM is not sufficient on its own for this to happen. My primary instance was not affected when it was running on ESXi on an i3-10100, i7-10700 or i5-12500. It only started to happen when I moved the VM to Chinese mini PCs with an N5105 or Pentium 7505. But I cannot blame these Chinese PCs, since the same thing recently started to happen on my colleague's instance (OPNsense on KVM on Synology with an AMD CPU).

Does this happen during a PPPoE reconnect? If so, you might want to check this thread.
Unfortunately, it was never resolved, as 23.7 is now legacy and 24.1 is still on the FreeBSD 13.2 kernel. :(

No, none of the installations where I experience these periodic reboots use PPPoE.
One correlation I have noticed is that they happen more frequently during heavy data transfers, especially when downloading torrents.

I'm experiencing the same crashes on my DEC840 since the upgrade/reinstall to 24.1 (most recently yesterday, on 24.1.2_1). It crashes with that page fault about once per week.
There is also an issue on GitHub regarding this: https://github.com/opnsense/core/issues/7280

Unfortunately, I have not found any more info for now.