Kernel Panics Reboot

Started by furfix, August 29, 2024, 01:02:40 AM

August 29, 2024, 01:02:40 AM Last Edit: August 29, 2024, 01:04:29 AM by furfix
Hi,
For some time now I've been getting kernel panic reboots, and I can't really tell what is causing them: I changed the box on the same day 24.7 was released, so it's hard for me to narrow down the issue. It most probably has nothing to do with OPNsense, but I'm sharing it in case it helps other people.

I've submitted the full reports from the GUI, but here are some fragments that I believe show what's going on:

System Information:
User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36
FreeBSD 14.1-RELEASE-p3 ixl_revert-n267779-6ca05616b9e9 SMP amd64
OPNsense 24.7.2 8ffbc6387
Plugins os-acme-client-4.5 os-cpu-microcode-intel-1.0 os-igmp-proxy-1.5_2 os-mdns-repeater-1.1_1 os-sensei-1.17.6 os-sensei-agent-1.17.5 os-sensei-updater-1.17 os-sunnyvalley-1.4_3 os-theme-vicuna-1.47
Time Wed, 28 Aug 2024 23:42:55 +0200
OpenSSL 3.0.14
Python 3.11.9
PHP 8.2.22



Fatal trap 9: general protection fault while in kernel mode
cpuid = 15; apic id = 2e
instruction pointer = 0x20:0xffffffff810924ee
stack pointer         = 0x28:0xfffffe014b0a4bd0
frame pointer         = 0x28:0xfffffe014b0a4c00
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 36814 (python3.11)
rdi: fffffe001ea22b00 rsi: 000000000000000f rdx: 00000000000000ed
rcx: 2d8be74f1d661a99  r8: 000007fffffff000  r9: fffff800019c6868
rax: fffff801a04247b0 rbx: fffffe00070c2a08 rbp: fffffe014b0a4c00
r10: 0000000115915425 r11: fffff80000000000 r12: fffffe001ea22b00
r13: 0000000000000000 r14: fffff801a04247a8 r15: fffffe014b0a4c60
trap number = 9
panic: general protection fault
cpuid = 15
time = 1724877840
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014b0a4910
vpanic() at vpanic+0x131/frame 0xfffffe014b0a4a40
panic() at panic+0x43/frame 0xfffffe014b0a4aa0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe014b0a4b00
calltrap() at calltrap+0x8/frame 0xfffffe014b0a4b00
--- trap 0x9, rip = 0xffffffff810924ee, rsp = 0xfffffe014b0a4bd0, rbp = 0xfffffe014b0a4c00 ---
pmap_try_insert_pv_entry() at pmap_try_insert_pv_entry+0xbe/frame 0xfffffe014b0a4c00
pmap_copy() at pmap_copy+0x549/frame 0xfffffe014b0a4cb0
vmspace_fork() at vmspace_fork+0xc90/frame 0xfffffe014b0a4d30
fork1() at fork1+0x52e/frame 0xfffffe014b0a4da0
sys_fork() at sys_fork+0x54/frame 0xfffffe014b0a4e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe014b0a4f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe014b0a4f30
--- syscall (0, FreeBSD ELF64, syscall), rip = 0x826ce51fa, rsp = 0x8204b87d8, rbp = 0x8204b8830 ---
KDB: enter: panic
panic.txt: general protection fault
version.txt: FreeBSD 14.1-RELEASE-p3 ixl_revert-n267779-6ca05616b9e9 SMP


<7>cannot forward src x:x::xx:x:x:x, dst x:x::xx:x:x:x, nxt 17, rcvif vlan0.30, outif pppoe0
<7>cannot forward src x:x::xx:x:x:x, dst x:x::xx:x:x:x, nxt 17, rcvif vlan0.30, outif pppoe0



nda0: nvme version 1.4
nda0: 476940MB (976773168 512 byte sectors)
Trying to mount root from zfs:zroot/ROOT/default []...
uhub0: 4 ports with 4 removable, self powered
uhub1: 16 ports with 16 removable, self powered
Root mount waiting for: usbus1
ugen1.2: <MediaTek Inc. WirelessDevice> at usbus1
Dual Console: Serial Primary, Video Secondary
pid 31 (zpool) is attempting to use unsafe AIO requests - not logging anymore
/var/crash/info.0:
Dump header from device: /dev/nda0p3
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 103424
  Blocksize: 512
  Compression: none
  Dumptime: 2024-08-28 22:44:00 +0200
  Hostname: opnsense.local
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 14.1-RELEASE-p3 ixl_revert-n267779-6ca05616b9e9 SMP
  Panic String: general protection fault
  Dump Parity: 1066394931
  Bounds: 0
  Dump Status: good
/var/crash/textdump.tar.0:


If you need any other log, just let me know.
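
As a quick sanity check that the panic message and the dump header describe the same event, the `time = 1724877840` value from the panic output can be compared with the `Dumptime` line from /var/crash/info.0. This is only a minimal sketch in Python; both values are copied from the logs above, and the variable names are made up for illustration.

```python
from datetime import datetime, timezone

# Values copied from the panic message and the info.0 dump header above.
panic_epoch = 1724877840
dumptime = "2024-08-28 22:44:00 +0200"

# The kernel prints seconds since the Unix epoch; interpret them as UTC.
panic_utc = datetime.fromtimestamp(panic_epoch, tz=timezone.utc)

# The dump header carries a UTC offset, so %z yields a timezone-aware value.
dump_local = datetime.strptime(dumptime, "%Y-%m-%d %H:%M:%S %z")

print(panic_utc.isoformat())    # 2024-08-28T20:44:00+00:00
print(dump_local.isoformat())   # 2024-08-28T22:44:00+02:00
print(panic_utc == dump_local)  # True: both describe the same instant
```

Both timestamps refer to the same instant, so the text dump in /var/crash belongs to the panic shown above.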

The stack trace is rather generic. A hint may be "pid 31 (zpool) is attempting to use unsafe AIO requests - not logging anymore", which could point to a damaged zpool on the disk. My first guess is that this is due to 24.7.2 now importing all zpools correctly:

https://github.com/opnsense/core/commit/701dff45b2

You can revert using:

# opnsense-patch 701dff45b2


Cheers,
Franco

All patches have been applied successfully.  Have a nice day.

I will keep you posted on whether the crashes go away :) Thanks F.

@Franco, just FYI: after upgrading to 24.7.3 I didn't re-apply this patch, and so far the uptime is 4 days, 10:45:51 without a single reboot/crash. It seems whatever was causing the crash is gone :)

Happy to hear. Did you reinstall, or is the old system back to normal?


Cheers,
Franco

I didn't reinstall or change anything. Just installed the update  ;D


Quote from: franco on September 03, 2024, 12:16:16 PM
Ok fair enough :)

Hi @franco! After applying the 24.7.4_1 update, the kernel panic reboots came back :(

24.7.4 ran for 4 or 5 days in a row without a single reboot. If you need me to share any specific log, let me know. I'm running WAN on a PPPoE connection.


Fatal trap 9: general protection fault while in kernel mode
cpuid = 11; apic id = 26
instruction pointer = 0x20:0xffffffff80bfd5ad
stack pointer         = 0x28:0xfffffe01d6227930
frame pointer         = 0x28:0xfffffe01d6227940
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 84083 (dpinger)
rdi: 8cafc2d4f55ff2a0 rsi: fffff800075aa078 rdx: 0000000000000001
rcx: 0000000000000078  r8: 0000000000000002  r9: 0000000000000580
rax: fffff800075aa000 rbx: 8cafc2d4f55ff2a0 rbp: fffffe01d6227940
r10: fffff8018ce43000 r11: fffff80206fd7500 r12: fffff802b8b52958
r13: 0000000000000000 r14: fffff800075aa078 r15: fffff8001365fde0
trap number = 9
panic: general protection fault
cpuid = 11
time = 1726434089
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d6227670
vpanic() at vpanic+0x131/frame 0xfffffe01d62277a0
panic() at panic+0x43/frame 0xfffffe01d6227800
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe01d6227860
calltrap() at calltrap+0x8/frame 0xfffffe01d6227860
--- trap 0x9, rip = 0xffffffff80bfd5ad, rsp = 0xfffffe01d6227930, rbp = 0xfffffe01d6227940 ---
grouptaskqueue_enqueue() at grouptaskqueue_enqueue+0xd/frame 0xfffffe01d6227940
wg_peer_send_staged() at wg_peer_send_staged+0x1a7/frame 0xfffffe01d62279b0
wg_xmit() at wg_xmit+0x198/frame 0xfffffe01d6227a50
ip_output() at ip_output+0x129c/frame 0xfffffe01d6227b40
rip_send() at rip_send+0x40b/frame 0xfffffe01d6227bb0
sosend_generic() at sosend_generic+0x643/frame 0xfffffe01d6227c70
sousrsend() at sousrsend+0x5f/frame 0xfffffe01d6227cd0
kern_sendit() at kern_sendit+0x1be/frame 0xfffffe01d6227d60
sendit() at sendit+0x181/frame 0xfffffe01d6227db0
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe01d6227e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe01d6227f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01d6227f30
--- syscall (133, FreeBSD ELF64, sendto), rip = 0x3e541904157a, rsp = 0x3e541cc45f28, rbp = 0x3e541cc45f70 ---
KDB: enter: panic
panic.txt: general protection fault
version.txt: FreeBSD 14.1-RELEASE-p4 stable/24.7-n267825-d0d18dbbaba SMP


***GOT REQUEST TO AUDIT HEALTH***
Currently running OPNsense 24.7.4_1 at Mon Sep 16 08:50:38 CEST 2024
>>> Root file system: zroot/ROOT/default
>>> Check installed kernel version
Version 24.7.4 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 24.7.4 is correct.
>>> Check for missing or altered base files
Error 2 occurred.
etc/sysctl.conf:
size (299, 333)
sha256digest (0x45f469e7a9b4eef887bab7b55397305043fe101e1d6ce6f7e23d758e72f56dc6, 0x8cc5c942d7c5827a96087f872ae6ef860d1dd42172f226acf925b98181eda850)
>>> Check installed repositories
OPNsense
SunnyValley
>>> Check installed plugins
os-acme-client 4.5
os-cpu-microcode-intel 1.0
os-igmp-proxy 1.5_3
os-mdns-repeater 1.1_1
os-sensei 1.17.6
os-sensei-agent 1.17.5
os-sensei-updater 1.17
os-sunnyvalley 1.4_3
os-theme-advanced 1.0
os-theme-vicuna 1.48
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: .......... done
>>> Check for missing or altered package files
Checking all packages: .......... done
>>> Check for core packages consistency
Core package "opnsense" has 68 dependencies to check.
Checking packages: ..................................................................... done
***DONE***
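
The only finding in the audit above is etc/sysctl.conf, reported with two sizes and two SHA-256 digests. A minimal sketch for checking which of the two reported pairs the file on disk currently matches could look like this. The helper names are hypothetical, and which pair is the packaged original and which is the local copy is not assumed here; the code only compares against both.

```python
import hashlib
from pathlib import Path

# The two (size, sha256) pairs reported by the audit for etc/sysctl.conf.
REPORTED = [
    (299, "45f469e7a9b4eef887bab7b55397305043fe101e1d6ce6f7e23d758e72f56dc6"),
    (333, "8cc5c942d7c5827a96087f872ae6ef860d1dd42172f226acf925b98181eda850"),
]

def fingerprint(path):
    """Return (size in bytes, hex SHA-256 digest) of the file at path."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()

def match_reported(path):
    """Return the index of the reported pair the file matches, or None."""
    fp = fingerprint(path)
    for i, pair in enumerate(REPORTED):
        if fp == pair:
            return i
    return None

if __name__ == "__main__":
    target = Path("/etc/sysctl.conf")
    if target.exists():
        print(fingerprint(target), match_reported(target))
```

A mismatch on this file alone says that a configuration file differs from the base set; it does not indicate a damaged kernel, which the audit reports as correct.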

Ok, a wireguard panic...

Can you get a vmcore.X file for me from the 24.7.4 debug kernel?

# opnsense-update -zkr dbg-24.7.4

(reboot to activate)


Thanks,
Franco
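
For reference, the "wireguard panic" reading comes from the frames that follow the `--- trap` marker in the second backtrace: the faulting function is called from `wg_`-prefixed functions. The sketch below pulls those frames out of the pasted text. It is a rough illustration only; the regular expression is written for the backtrace format shown in this thread, and the function name `frames_after_trap` is hypothetical.

```python
import re

# Frames copied from the second panic above (trimmed to the relevant part).
BACKTRACE = """\
calltrap() at calltrap+0x8/frame 0xfffffe01d6227860
--- trap 0x9, rip = 0xffffffff80bfd5ad, rsp = 0xfffffe01d6227930, rbp = 0xfffffe01d6227940 ---
grouptaskqueue_enqueue() at grouptaskqueue_enqueue+0xd/frame 0xfffffe01d6227940
wg_peer_send_staged() at wg_peer_send_staged+0x1a7/frame 0xfffffe01d62279b0
wg_xmit() at wg_xmit+0x198/frame 0xfffffe01d6227a50
ip_output() at ip_output+0x129c/frame 0xfffffe01d6227b40
"""

# Matches lines such as "wg_xmit() at wg_xmit+0x198/frame 0xfffffe01d6227a50".
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")

def frames_after_trap(text):
    """Return function names that follow the '--- trap' marker, innermost first."""
    names = []
    seen_trap = False
    for line in text.splitlines():
        if line.startswith("--- trap"):
            seen_trap = True
            continue
        if line.startswith("--- syscall"):
            break  # everything below this marker is userland context
        match = FRAME_RE.match(line)
        if seen_trap and match:
            names.append(match.group(1))
    return names

frames = frames_after_trap(BACKTRACE)
print(frames[0])                                    # grouptaskqueue_enqueue
print([f for f in frames if f.startswith("wg_")])   # ['wg_peer_send_staged', 'wg_xmit']
```

Applied to the first panic in this thread, the same approach would yield `pmap_try_insert_pv_entry` under a `fork` path, which is the kind of generic trace that does not point at a specific driver.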


I've applied opnsense-update -zkr dbg-24.7.4 and rebooted, but I will need some help capturing the vmcore. Would you mind pointing me to what I should run to get it?

1. Wait for panic
2. Grab /var/crash/vmcore.0 after panic
3. Share it with me privately (it's a bigger file)


Cheers,
Franco
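
To see whether a dump has been saved after the next panic, the steps above can be sketched as a small Python check that lists the `vmcore.N` files in /var/crash with their sizes. This is a hedged illustration, not an OPNsense tool: the function name `find_vmcores` is made up, and it assumes dumps are numbered `vmcore.0`, `vmcore.1`, and so on, as in the path mentioned above.

```python
from pathlib import Path

def find_vmcores(crash_dir="/var/crash"):
    """Return (path, size_in_bytes) for each vmcore.N file, highest N first."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    found = []
    for path in directory.glob("vmcore.*"):
        suffix = path.name.split(".", 1)[1]
        # Keep only numbered dumps; skip entries such as "vmcore.last".
        if suffix.isdigit() and path.is_file():
            found.append((int(suffix), path, path.stat().st_size))
    found.sort(key=lambda item: item[0], reverse=True)
    return [(path, size) for _, path, size in found]

if __name__ == "__main__":
    for path, size in find_vmcores():
        print(f"{path}  {size / (1024 * 1024):.1f} MiB")
```

An empty result after a panic would suggest that no full dump was written, for example because the debug kernel was not yet active.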

Hi @franco! I've sent you the file. Anything you need, just let me know.

Regards,
ff

@Franco: I have a similar problem and several crashes today. Should I install the debug kernel and send you a vmcore, too?

@cgone can you post the panic backtrace too as a reference point?

@furfix thanks, taking a look now


Cheers,
Franco