Hi,
For some time now I've been getting kernel panic reboots, and I can't really tell what is causing them: the same day 24.7 was released I also changed the box, so it's hard for me to narrow down the issue. It most probably has nothing to do with OPNsense, but I'm sharing it in case it helps more people.
I've submitted the full reports from the GUI, but here are some fragments that I believe show what's going on:
System Information:
User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36
FreeBSD 14.1-RELEASE-p3 ixl_revert-n267779-6ca05616b9e9 SMP amd64
OPNsense 24.7.2 8ffbc6387
Plugins os-acme-client-4.5 os-cpu-microcode-intel-1.0 os-igmp-proxy-1.5_2 os-mdns-repeater-1.1_1 os-sensei-1.17.6 os-sensei-agent-1.17.5 os-sensei-updater-1.17 os-sunnyvalley-1.4_3 os-theme-vicuna-1.47
Time Wed, 28 Aug 2024 23:42:55 +0200
OpenSSL 3.0.14
Python 3.11.9
PHP 8.2.22
Fatal trap 9: general protection fault while in kernel mode
cpuid = 15; apic id = 2e
instruction pointer = 0x20:0xffffffff810924ee
stack pointer = 0x28:0xfffffe014b0a4bd0
frame pointer = 0x28:0xfffffe014b0a4c00
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 36814 (python3.11)
rdi: fffffe001ea22b00 rsi: 000000000000000f rdx: 00000000000000ed
rcx: 2d8be74f1d661a99 r8: 000007fffffff000 r9: fffff800019c6868
rax: fffff801a04247b0 rbx: fffffe00070c2a08 rbp: fffffe014b0a4c00
r10: 0000000115915425 r11: fffff80000000000 r12: fffffe001ea22b00
r13: 0000000000000000 r14: fffff801a04247a8 r15: fffffe014b0a4c60
trap number = 9
panic: general protection fault
cpuid = 15
time = 1724877840
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014b0a4910
vpanic() at vpanic+0x131/frame 0xfffffe014b0a4a40
panic() at panic+0x43/frame 0xfffffe014b0a4aa0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe014b0a4b00
calltrap() at calltrap+0x8/frame 0xfffffe014b0a4b00
--- trap 0x9, rip = 0xffffffff810924ee, rsp = 0xfffffe014b0a4bd0, rbp = 0xfffffe014b0a4c00 ---
pmap_try_insert_pv_entry() at pmap_try_insert_pv_entry+0xbe/frame 0xfffffe014b0a4c00
pmap_copy() at pmap_copy+0x549/frame 0xfffffe014b0a4cb0
vmspace_fork() at vmspace_fork+0xc90/frame 0xfffffe014b0a4d30
fork1() at fork1+0x52e/frame 0xfffffe014b0a4da0
sys_fork() at sys_fork+0x54/frame 0xfffffe014b0a4e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe014b0a4f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe014b0a4f30
--- syscall (0, FreeBSD ELF64, syscall), rip = 0x826ce51fa, rsp = 0x8204b87d8, rbp = 0x8204b8830 ---
KDB: enter: panic
panic.txt: general protection fault
version.txt: FreeBSD 14.1-RELEASE-p3 ixl_revert-n267779-6ca05616b9e9 SMP
<7>cannot forward src x:x::xx:x:x:x, dst x:x::xx:x:x:x, nxt 17, rcvif vlan0.30, outif pppoe0
<7>cannot forward src x:x::xx:x:x:x, dst x:x::xx:x:x:x, nxt 17, rcvif vlan0.30, outif pppoe0
nda0: nvme version 1.4
nda0: 476940MB (976773168 512 byte sectors)
Trying to mount root from zfs:zroot/ROOT/default []...
uhub0: 4 ports with 4 removable, self powered
uhub1: 16 ports with 16 removable, self powered
Root mount waiting for: usbus1
ugen1.2: <MediaTek Inc. WirelessDevice> at usbus1
Dual Console: Serial Primary, Video Secondary
pid 31 (zpool) is attempting to use unsafe AIO requests - not logging anymore
/var/crash/info.0:
Dump header from device: /dev/nda0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 103424
Blocksize: 512
Compression: none
Dumptime: 2024-08-28 22:44:00 +0200
Hostname: opnsense.local
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.1-RELEASE-p3 ixl_revert-n267779-6ca05616b9e9 SMP
Panic String: general protection fault
Dump Parity: 1066394931
Bounds: 0
Dump Status: good
/var/crash/textdump.tar.0:
If you need any other log, just let me know.
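A side note for anyone correlating these dumps with other logs: the "time =" field in the panic output is a Unix epoch timestamp. The following is a minimal Python sketch (the helper name is made up for illustration and is not part of any OPNsense tooling) that converts it, assuming the +0200 offset shown in the dump header:

```python
from datetime import datetime, timedelta, timezone


def panic_time_to_local(epoch_seconds: int, utc_offset_hours: int = 2) -> str:
    """Convert the 'time = ...' value of a panic message to a readable local time.

    utc_offset_hours defaults to +2 only because the dump header in this
    post reports +0200; adjust it for your own time zone.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M:%S %z")


# 'time = 1724877840' is taken from the panic output above.
print(panic_time_to_local(1724877840))  # 2024-08-28 22:44:00 +0200
```

The result matches the Dumptime line of /var/crash/info.0, which is a quick way to confirm that a given info file belongs to a given panic.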
The stack trace is rather generic. A hint may be "pid 31 (zpool) is attempting to use unsafe AIO requests - not logging anymore", which could point to a damaged zpool on the disk. My first guess is that this is due to 24.7.2 now importing all zpools correctly:
https://github.com/opnsense/core/commit/701dff45b2
You can revert using:
# opnsense-patch 701dff45b2
Cheers,
Franco
All patches have been applied successfully. Have a nice day.
I will keep you posted on whether the crashes go away :) Thanks, F.
@Franco, just FYI: after upgrading to 24.7.3 I didn't re-apply this patch, and so far: Uptime 4 days, 10:45:51, without a single reboot/crash. It seems that whatever was causing the crash is gone :)
Happy to hear. Did you reinstall, or is the old system back to normal?
Cheers,
Franco
I didn't reinstall or change anything. Just installed the update ;D
Ok fair enough :)
Quote from: franco on September 03, 2024, 12:16:16 PM
Ok fair enough :)
Hi @franco! After applying the update 24.7.4_1, the kernel panic reboots came back :(
24.7.4 ran for 4 or 5 days in a row without a single reboot. If you need me to share any specific log, let me know. I'm running WAN on a PPPoE connection.
Fatal trap 9: general protection fault while in kernel mode
cpuid = 11; apic id = 26
instruction pointer = 0x20:0xffffffff80bfd5ad
stack pointer = 0x28:0xfffffe01d6227930
frame pointer = 0x28:0xfffffe01d6227940
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 84083 (dpinger)
rdi: 8cafc2d4f55ff2a0 rsi: fffff800075aa078 rdx: 0000000000000001
rcx: 0000000000000078 r8: 0000000000000002 r9: 0000000000000580
rax: fffff800075aa000 rbx: 8cafc2d4f55ff2a0 rbp: fffffe01d6227940
r10: fffff8018ce43000 r11: fffff80206fd7500 r12: fffff802b8b52958
r13: 0000000000000000 r14: fffff800075aa078 r15: fffff8001365fde0
trap number = 9
panic: general protection fault
cpuid = 11
time = 1726434089
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d6227670
vpanic() at vpanic+0x131/frame 0xfffffe01d62277a0
panic() at panic+0x43/frame 0xfffffe01d6227800
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe01d6227860
calltrap() at calltrap+0x8/frame 0xfffffe01d6227860
--- trap 0x9, rip = 0xffffffff80bfd5ad, rsp = 0xfffffe01d6227930, rbp = 0xfffffe01d6227940 ---
grouptaskqueue_enqueue() at grouptaskqueue_enqueue+0xd/frame 0xfffffe01d6227940
wg_peer_send_staged() at wg_peer_send_staged+0x1a7/frame 0xfffffe01d62279b0
wg_xmit() at wg_xmit+0x198/frame 0xfffffe01d6227a50
ip_output() at ip_output+0x129c/frame 0xfffffe01d6227b40
rip_send() at rip_send+0x40b/frame 0xfffffe01d6227bb0
sosend_generic() at sosend_generic+0x643/frame 0xfffffe01d6227c70
sousrsend() at sousrsend+0x5f/frame 0xfffffe01d6227cd0
kern_sendit() at kern_sendit+0x1be/frame 0xfffffe01d6227d60
sendit() at sendit+0x181/frame 0xfffffe01d6227db0
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe01d6227e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe01d6227f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01d6227f30
--- syscall (133, FreeBSD ELF64, sendto), rip = 0x3e541904157a, rsp = 0x3e541cc45f28, rbp = 0x3e541cc45f70 ---
KDB: enter: panic
panic.txt: general protection fault
version.txt: FreeBSD 14.1-RELEASE-p4 stable/24.7-n267825-d0d18dbbaba SMP
***GOT REQUEST TO AUDIT HEALTH***
Currently running OPNsense 24.7.4_1 at Mon Sep 16 08:50:38 CEST 2024
>>> Root file system: zroot/ROOT/default
>>> Check installed kernel version
Version 24.7.4 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 24.7.4 is correct.
>>> Check for missing or altered base files
Error 2 occurred.
etc/sysctl.conf:
size (299, 333)
sha256digest (0x45f469e7a9b4eef887bab7b55397305043fe101e1d6ce6f7e23d758e72f56dc6, 0x8cc5c942d7c5827a96087f872ae6ef860d1dd42172f226acf925b98181eda850)
>>> Check installed repositories
OPNsense
SunnyValley
>>> Check installed plugins
os-acme-client 4.5
os-cpu-microcode-intel 1.0
os-igmp-proxy 1.5_3
os-mdns-repeater 1.1_1
os-sensei 1.17.6
os-sensei-agent 1.17.5
os-sensei-updater 1.17
os-sunnyvalley 1.4_3
os-theme-advanced 1.0
os-theme-vicuna 1.48
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: .......... done
>>> Check for missing or altered package files
Checking all packages: .......... done
>>> Check for core packages consistency
Core package "opnsense" has 68 dependencies to check.
Checking packages: ..................................................................... done
***DONE***
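One detail that can be read off the register dump in the general protection fault above: rdi and rbx both hold 8cafc2d4f55ff2a0. On amd64, a memory access through a non-canonical virtual address raises a general protection fault rather than a page fault. The Python sketch below checks whether a value is canonical, assuming 48-bit virtual addresses (4-level paging); the function name is made up for illustration. It only shows that the register value does not look like a valid pointer, which would be consistent with a corrupted pointer; it does not say where the corruption came from.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical amd64 virtual address.

    An address is canonical when bits 63..(va_bits - 1) are all zeros or
    all ones. va_bits=48 is an assumption (4-level paging).
    """
    top = addr >> (va_bits - 1)                # bits 63..(va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1   # same number of bits, all set
    return top == 0 or top == all_ones


# Values taken from the register dump in the post above.
print(is_canonical(0x8CAFC2D4F55FF2A0))  # rdi / rbx -> False
print(is_canonical(0xFFFFF800075AA000))  # rax, a kernel address -> True
```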
Ok, a wireguard panic...
Can you get a vmcore.X file for me from the 24.7.4 debug kernel?
# opnsense-update -zkr dbg-24.7.4
(reboot to activate)
Thanks,
Franco
FWIW, the problem appears to be in this code or above:
https://github.com/opnsense/src/blob/02d5bd6ddffe8811f6bba4098f7b216ca51a5901/sys/dev/wg/if_wg.c#L1610-L1621
Cheers,
Franco
Quote from: franco on September 16, 2024, 09:05:51 AM
Ok, a wireguard panic...
Can you get a vmcore.X file for me from the 24.7.4 debug kernel?
# opnsense-update -zkr dbg-24.7.4
(reboot to activate)
Thanks,
Franco
I've applied opnsense-update -zkr dbg-24.7.4 and rebooted, but I will need some help to capture the vmcore. Would you mind pointing me to what I should run to get it?
1. Wait for panic
2. Grab /var/crash/vmcore.0 after panic
3. Share it with me privately (it's a bigger file)
Cheers,
Franco
Quote from: franco on September 16, 2024, 10:19:51 AM
1. Wait for panic
2. Grab /var/crash/vmcore.0 after panic
3. Share it with me privately (it's a bigger file)
Cheers,
Franco
Hi @franco! I've sent you the file. Anything you need, just let me know.
Regards,
ff
@Franco: I have a similar problem and several crashes today. Should I install the debug kernel and send you a vmcore, too?
@cgone can you post the panic backtrace too as a reference point?
@furfix thanks, taking a look now
Cheers,
Franco
Looks like different panic?
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0149b4b8b0
vpanic() at vpanic+0x131/frame 0xfffffe0149b4b9e0
panic() at panic+0x43/frame 0xfffffe0149b4ba40
trash_ctor() at trash_ctor+0x53/frame 0xfffffe0149b4ba50
mb_ctor_pack() at mb_ctor_pack+0x3e/frame 0xfffffe0149b4ba90
item_ctor() at item_ctor+0x117/frame 0xfffffe0149b4bae0
m_getm2() at m_getm2+0x1aa/frame 0xfffffe0149b4bb50
m_uiotombuf() at m_uiotombuf+0x6f/frame 0xfffffe0149b4bbe0
uipc_sosend_dgram() at uipc_sosend_dgram+0x170/frame 0xfffffe0149b4bc70
sousrsend() at sousrsend+0x79/frame 0xfffffe0149b4bcd0
kern_sendit() at kern_sendit+0x1bc/frame 0xfffffe0149b4bd60
sendit() at sendit+0x184/frame 0xfffffe0149b4bdb0
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe0149b4be00
amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe0149b4bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0149b4bf30
> panic("Memory modified after free %p(%d) val=%lx @ %p\n", mem, size, *p, p);
Yes, well, this seems to point to memory corruption that's going on for whatever reason, apparently in UDP (which would also point to the WireGuard kernel module).
It could still be the same panic: the debug kernel has more checks like this one and tries to catch errors earlier, but here too the damage was already done.
The question is whether this is caused inherently by the hardware, which would then need to be replaced, or whether the errors go away without using WireGuard. This doesn't seem to be a prevalent issue, but it could still be a code problem.
Cheers,
Franco
Quote from: franco on September 19, 2024, 05:40:07 PM
@cgone can you post the panic backtrace too as a reference point?
Here is the backtrace of the last crash. The crashes do not always produce a crash dump.
ddb.txt:
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 3
dynamic pcpu = 0xfffffe009e97b080
curthread = 0xfffff8002d0bc740: pid 23521 tid 102231 critnest 1 "Eastpect Main Event"
curpcb = 0xfffff8002d0bcc60
fpcurthread = 0xfffff8002d0bc740: pid 23521 "Eastpect Main Event"
idlethread = 0xfffff80001974000: tid 100006 "idle: cpu3"
self = 0xffffffff83a13000
curpmap = 0xfffff801c96ad600
tssp = 0xffffffff83a13384
rsp0 = 0xfffffe0102b8c000
kcr3 = 0x80000003ae08d4b0
ucr3 = 0x80000003ae08ccb0
scr3 = 0x3ae08ccb0
gs32p = 0xffffffff83a13404
ldt = 0xffffffff83a13444
tss = 0xffffffff83a13434
curvnet = 0
db:0:kdb.enter.default> bt
Tracing pid 23521 tid 102231 td 0xfffff8002d0bc740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0102b8b9e0
panic() at panic+0x43/frame 0xfffffe0102b8ba40
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0102b8baa0
calltrap() at calltrap+0x8/frame 0xfffffe0102b8baa0
--- trap 0x9, rip = 0xffffffff8108cf63, rsp = 0xfffffe0102b8bb70, rbp = 0xfffffe0102b8bb70 ---
pmap_pvh_remove() at pmap_pvh_remove+0x23/frame 0xfffffe0102b8bb70
pmap_enter() at pmap_enter+0xd1e/frame 0xfffffe0102b8bc50
vm_fault() at vm_fault+0xbb7/frame 0xfffffe0102b8bd70
vm_fault_trap() at vm_fault_trap+0x4d/frame 0xfffffe0102b8bdc0
trap_pfault() at trap_pfault+0x1be/frame 0xfffffe0102b8be10
trap() at trap+0x4ab/frame 0xfffffe0102b8bf30
calltrap() at calltrap+0x8/frame 0xfffffe0102b8bf30
--- trap 0xc, rip = 0x827eed850, rsp = 0x8414947a8, rbp = 0x841494860 ---
My guess is that it is more likely a hardware fault, since the backtrace is often different and in a different thread.
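To compare panics like the ones in this thread, it can help to pull out just the function that faulted, i.e. the first frame after the "--- trap" marker in each backtrace. The following is a rough Python sketch, not an official tool; the regular expression and the function name are assumptions based only on the backtrace format shown in these posts.

```python
import re

# Matches frames such as:
#   pmap_pvh_remove() at pmap_pvh_remove+0x23/frame 0xfffffe0102b8bb70
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")


def faulting_function(backtrace: str):
    """Return the function of the first frame after the first '--- trap' line.

    Returns None if the text has no trap marker or no frame after it.
    """
    seen_trap = False
    for raw in backtrace.splitlines():
        line = raw.strip()
        if line.startswith("--- trap"):
            seen_trap = True
            continue
        if seen_trap:
            match = FRAME_RE.match(line)
            if match:
                return match.group(1)
    return None


# Excerpts copied from three of the panics posted in this thread.
panics = [
    """calltrap() at calltrap+0x8/frame 0xfffffe014b0a4b00
--- trap 0x9, rip = 0xffffffff810924ee, rsp = 0xfffffe014b0a4bd0, rbp = 0xfffffe014b0a4c00 ---
pmap_try_insert_pv_entry() at pmap_try_insert_pv_entry+0xbe/frame 0xfffffe014b0a4c00
pmap_copy() at pmap_copy+0x549/frame 0xfffffe014b0a4cb0""",
    """calltrap() at calltrap+0x8/frame 0xfffffe01d6227860
--- trap 0x9, rip = 0xffffffff80bfd5ad, rsp = 0xfffffe01d6227930, rbp = 0xfffffe01d6227940 ---
grouptaskqueue_enqueue() at grouptaskqueue_enqueue+0xd/frame 0xfffffe01d6227940
wg_peer_send_staged() at wg_peer_send_staged+0x1a7/frame 0xfffffe01d62279b0""",
    """calltrap() at calltrap+0x8/frame 0xfffffe0102b8baa0
--- trap 0x9, rip = 0xffffffff8108cf63, rsp = 0xfffffe0102b8bb70, rbp = 0xfffffe0102b8bb70 ---
pmap_pvh_remove() at pmap_pvh_remove+0x23/frame 0xfffffe0102b8bb70
pmap_enter() at pmap_enter+0xd1e/frame 0xfffffe0102b8bc50""",
]

for text in panics:
    print(faulting_function(text))
```

For these three excerpts it prints pmap_try_insert_pv_entry, grouptaskqueue_enqueue and pmap_pvh_remove. Two of the three faults are in the pmap (virtual memory) code and one is in the WireGuard send path, which illustrates the point that the faulting location keeps changing.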
Quote from: franco on September 19, 2024, 06:00:23 PM
Looks like different panic?
> panic("Memory modified after free %p(%d) val=%lx @ %p\n", mem, size, *p, p);
Yes, well, this seems to point to memory corruption that's going on for whatever reason, apparently in UDP (which would also point to the WireGuard kernel module).
It could still be the same panic: the debug kernel has more checks like this one and tries to catch errors earlier, but here too the damage was already done.
The question is whether this is caused inherently by the hardware, which would then need to be replaced, or whether the errors go away without using WireGuard. This doesn't seem to be a prevalent issue, but it could still be a code problem.
Cheers,
Franco
Should I try reinstalling, maybe? One of the first panics was about zpool and it never happened again, but from what you are saying, it looks like it's never the same panic twice :(
At the end of the log I still see it though:
Timecounter "TSC-low" frequency 1593603556 Hz quality 1000
Timecounters tick every 1.000 msec
ugen0.1: <Intel XHCI root HUB> at usbus0
ixl1: Link is up, 1 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: True, Flow Control: None
ixl1: link state changed to UP
debugnet_any_ifnet_update: Bad dn_init result from ixl1 (ifp 0xfffff8000326e000), ignoring.
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
uhub0 on usbus0
uhub0: <Intel XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
ugen1.1: <Intel XHCI root HUB> at usbus1
uhub1 on usbus1
uhub1: <Intel XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
nvme0: Allocated 64MB host memory buffer
nda0 at nvme0 bus 0 scbus0 target 0 lun 1
nda0: <CT500P3PSSD8 P9CR413 2417487F0AA6>
nda0: Serial Number X
nda0: nvme version 1.4
nda0: 476940MB (976773168 512 byte sectors)
Trying to mount root from zfs:zroot/ROOT/default []...
uhub0: 4 ports with 4 removable, self powered
uhub1: 16 ports with 16 removable, self powered
ugen1.2: <MediaTek Inc. WirelessDevice> at usbus1
pid 31 (zpool) is attempting to use unsafe AIO requests - not logging anymore
Another panic today, while heavily using a WG tunnel for a long period of time:
panic: Memory modified after free 0xfffff8015ea40800(2048) val=1ce4029760df7eac @ 0xfffff8015ea40dc0
cpuid = 3
time = 1726747522
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0149b4b8b0
vpanic() at vpanic+0x131/frame 0xfffffe0149b4b9e0
panic() at panic+0x43/frame 0xfffffe0149b4ba40
trash_ctor() at trash_ctor+0x53/frame 0xfffffe0149b4ba50
mb_ctor_pack() at mb_ctor_pack+0x3e/frame 0xfffffe0149b4ba90
item_ctor() at item_ctor+0x117/frame 0xfffffe0149b4bae0
m_getm2() at m_getm2+0x1aa/frame 0xfffffe0149b4bb50
m_uiotombuf() at m_uiotombuf+0x6f/frame 0xfffffe0149b4bbe0
uipc_sosend_dgram() at uipc_sosend_dgram+0x170/frame 0xfffffe0149b4bc70
sousrsend() at sousrsend+0x79/frame 0xfffffe0149b4bcd0
kern_sendit() at kern_sendit+0x1bc/frame 0xfffffe0149b4bd60
sendit() at sendit+0x184/frame 0xfffffe0149b4bdb0
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe0149b4be00
amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe0149b4bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0149b4bf30
--- syscall (133, FreeBSD ELF64, sendto), rip = 0x25520240c57a, rsp = 0x2551ff2eeea8, rbp = 0x2551ff2f30c0 ---
KDB: enter: panic
I'm going to back up the config, do a clean reinstall, and remove the PCIe Wi-Fi card, which has no use on this box (just in case).
Otherwise kids and wife will open a sev1 and escalate it :D
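The "Memory modified after free" message in the post above already says where in the freed buffer the unexpected write landed: it prints the buffer address, its size, the unexpected value, and the address where that value was found. The following is a small Python sketch that parses the panic string and computes the offset; the regular expression and function name are made up for illustration and follow the format string quoted earlier in the thread.

```python
import re

# Follows the kernel format string quoted earlier in the thread:
#   "Memory modified after free %p(%d) val=%lx @ %p"
PANIC_RE = re.compile(
    r"Memory modified after free (0x[0-9a-f]+)\((\d+)\) val=([0-9a-f]+) @ (0x[0-9a-f]+)"
)


def modified_offset(panic_line: str):
    """Return (offset, size, inside_buffer) for a 'modified after free' panic.

    offset is the distance in bytes from the start of the freed item to the
    modified word. Returns None if the line does not match the format.
    """
    match = PANIC_RE.search(panic_line)
    if match is None:
        return None
    base = int(match.group(1), 16)
    size = int(match.group(2))
    addr = int(match.group(4), 16)
    offset = addr - base
    return offset, size, 0 <= offset < size


line = ("panic: Memory modified after free 0xfffff8015ea40800(2048) "
        "val=1ce4029760df7eac @ 0xfffff8015ea40dc0")
print(modified_offset(line))  # (1472, 2048, True)
```

So the modified word sits 1472 bytes (0x5c0) into a 2048-byte item. On its own this does not tell hardware and software causes apart, but comparing the offsets and values across several such panics may show whether there is a pattern.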
Reinstalled. Wish me luck :) I'll keep you posted if another panic comes my way.
The happiness was short-lived :( 10 hours after upgrading to 24.7.5, I got another panic:
Should I completely disable WireGuard? Do you think it's a hardware issue?
Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 18
fault virtual address = 0x30
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff810909f0
stack pointer = 0x28:0xfffffe01549b0710
frame pointer = 0x28:0xfffffe01549b0860
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 77999 (python3.11)
rdi: fffffe001ea27940 rsi: fffff80215ba0740 rdx: 0000000200000000
rcx: 0000000000000001 r8: 000007fffffff000 r9: 0000000000000063
rax: c6083eb6eceb6cea rbx: fffffffc00000000 rbp: fffffe01549b0860
r10: fffff80039fb8ce0 r11: fffff801dace1000 r12: 0000000000000021
r13: fffff80000000000 r14: 0000000000000000 r15: 39f7c14913149315
trap number = 12
panic: page fault
cpuid = 6
time = 1727378741
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01549b0400
vpanic() at vpanic+0x131/frame 0xfffffe01549b0530
panic() at panic+0x43/frame 0xfffffe01549b0590
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe01549b05f0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe01549b0640
calltrap() at calltrap+0x8/frame 0xfffffe01549b0640
--- trap 0xc, rip = 0xffffffff810909f0, rsp = 0xfffffe01549b0710, rbp = 0xfffffe01549b0860 ---
pmap_remove_pages() at pmap_remove_pages+0x5f0/frame 0xfffffe01549b0860
vmspace_exit() at vmspace_exit+0x80/frame 0xfffffe01549b0890
exit1() at exit1+0x53a/frame 0xfffffe01549b08f0
sigexit() at sigexit+0x13d/frame 0xfffffe01549b0d60
postsig() at postsig+0x23a/frame 0xfffffe01549b0e20
ast_sig() at ast_sig+0x1d7/frame 0xfffffe01549b0ed0
ast_handler() at ast_handler+0x88/frame 0xfffffe01549b0f10
ast() at ast+0x20/frame 0xfffffe01549b0f30
doreti_ast() at doreti_ast+0x1c/frame 0x87e205ef0
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 14.1-RELEASE-p5 stable/24.7-n267840-e62d514886a SMP
It's beginning to look more and more like a hardware issue.
Sorry,
Franco
Quote from: franco on September 27, 2024, 11:23:53 AM
It's beginning to look more and more like a hardware issue.
Sorry,
Franco
I found this:
<6>pid 77267 (python3.11), jid 0, uid 0: exited on signal 10 (no core dump - bad address)
<6>pid 77999 (python3.11), jid 0, uid 0: exited on signal 10 (no core dump - bad address)
Also in System > Log Files > General:
2024-09-27T13:43:31 Error opnsense /usr/local/etc/rc.newwanipv6: The command '/bin/kill -'TERM' '73423''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 73423: No such process'
2024-09-27T13:40:38 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '72153''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 72153: No such process'
Once the family is asleep, I will boot the machine with memtest86 and run it. Otherwise, I don't know what else to do...
IMHO memtest is a waste of time: frequent false negatives, and it takes LOTS of time. Order fresh RAM and see what happens...
Another panic :(
Fatal trap 9: general protection fault while in kernel mode
cpuid = 9; apic id = 22
instruction pointer = 0x20:0xffffffff8108f4ee
stack pointer = 0x28:0xfffffe0158e9bbd0
frame pointer = 0x28:0xfffffe0158e9bc00
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 26738 (python3.11)
rdi: fffffe001ea8a480 rsi: 000000000000000c rdx: 0000000000000024
rcx: 46ff382abf19b8e7 r8: 000007fffffff000 r9: fffff8001ac55600
rax: fffff80188830168 rbx: fffffe0017e62a28 rbp: fffffe0158e9bc00
r10: 80000003ad429425 r11: fffff80000000000 r12: ffffffff81807940
r13: 0000000000000000 r14: fffff80188830160 r15: fffffe0158e9bc60
trap number = 9
panic: general protection fault
cpuid = 9
time = 1727453132
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0158e9b910
vpanic() at vpanic+0x131/frame 0xfffffe0158e9ba40
panic() at panic+0x43/frame 0xfffffe0158e9baa0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0158e9bb00
calltrap() at calltrap+0x8/frame 0xfffffe0158e9bb00
--- trap 0x9, rip = 0xffffffff8108f4ee, rsp = 0xfffffe0158e9bbd0, rbp = 0xfffffe0158e9bc00 ---
pmap_try_insert_pv_entry() at pmap_try_insert_pv_entry+0xbe/frame 0xfffffe0158e9bc00
pmap_copy() at pmap_copy+0x549/frame 0xfffffe0158e9bcb0
vmspace_fork() at vmspace_fork+0xc90/frame 0xfffffe0158e9bd30
fork1() at fork1+0x52e/frame 0xfffffe0158e9bda0
sys_fork() at sys_fork+0x54/frame 0xfffffe0158e9be00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe0158e9bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0158e9bf30
--- syscall (0, FreeBSD ELF64, syscall), rip = 0x8262491fa, rsp = 0x838afb3f8, rbp = 0x838afb450 ---
KDB: enter: panic
panic.txt: general protection fault
version.txt: FreeBSD 14.1-RELEASE-p5 stable/24.7-n267840-e62d514886a SMP
Hi, I'm going through the same thing. I'm running OPNsense on Proxmox and was very happy with it for 2+ months, but then it started crashing. I tried both 24.1 and 24.7, but in my case it appeared that Proxmox was "gracefully" shutting down and rebooting OPNsense. The OPNsense debug output didn't indicate a kernel panic.
To fix it, I installed 24.7 and removed all additional NICs / virtual bridges, and am presently running OPNsense as a basic, simple home router: no Suricata, Zenarmor, CrowdSec, VLANs, port forwarding, WireGuard, or Proton VPN. I managed to get it running for 25 hours, and then it crashed last night. This time it was a kernel panic.
I've now installed 24.7.5 and os-cpu-microcode-amd; if it crashes after this, I'll remove the 2 memory DIMMs that I installed on July 31st. I doubt that this is a memory issue for me, though, because the host system runs AdGuard Home and a few basic containers, and they are all operating fine.
If even after that I can't achieve any form of stability, I'm disheartened to say I'm going to give pfSense CE a try.
If pfSense also crashes, then I have no option but to treat this as a hardware issue.
I've spent 4 months on my home lab, media server, and web hosting, and to now see it all crash... is disheartening.
My current ISP doesn't have a 24h disconnect, but this may change in the future.
To check how well OPNsense can deal with it, I enabled the 'Periodic reset interface' cron on the dial-in connection.
What can I say: it panics roughly 50% of the time.