Multiple crashes with 24.7.10_2 and 24.7.8

Started by martin87, December 08, 2024, 01:45:41 PM

Since the last update I have had multiple kernel panics with the latest stable kernel ("stable/24.7-n267981-8375762712f"):

db:0:kdb.enter.default>  run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo>  show alllocks
No such command; use "help" to list available commands
db:1:lockinfo>  show lockedvnods
Locked vnodes
db:0:kdb.enter.default>  show pcpu
cpuid        = 0
dynamic pcpu = 0x124b080
curthread    = 0xfffff801c075e740: pid 13298 tid 101433 critnest 1 "pfctl"
curpcb       = 0xfffff801c075ec60
fpcurthread  = 0xfffff801c075e740: pid 13298 "pfctl"
idlethread   = 0xfffff800016c1740: tid 100003 "idle: cpu0"
self         = 0xffffffff82c10000
curpmap      = 0xfffff8004d906398
tssp         = 0xffffffff82c10384
rsp0         = 0xfffffe00c060f000
kcr3         = 0x58246000
ucr3         = 0x1e6c7a000
scr3         = 0x1e6c7a000
gs32p        = 0xffffffff82c10404
ldt          = 0xffffffff82c10444
tss          = 0xffffffff82c10434
curvnet      = 0xfffff800011a8b80
db:0:kdb.enter.default>  bt
Tracing pid 13298 tid 101433 td 0xfffff801c075e740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe00c060e4b0
panic() at panic+0x43/frame 0xfffffe00c060e510
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00c060e570
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00c060e5c0
calltrap() at calltrap+0x8/frame 0xfffffe00c060e5c0
--- trap 0xc, rip = 0xffffffff80d053f7, rsp = 0xfffffe00c060e690, rbp = 0xfffffe00c060e6b0 ---
rn_walktree() at rn_walktree+0x77/frame 0xfffffe00c060e6b0
pfr_get_addrs() at pfr_get_addrs+0x122/frame 0xfffffe00c060e710
pfioctl() at pfioctl+0x221e/frame 0xfffffe00c060ebf0
devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfffffe00c060ec40
vn_ioctl() at vn_ioctl+0xce/frame 0xfffffe00c060ecb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00c060ecd0
kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe00c060ed40
sys_ioctl() at sys_ioctl+0xff/frame 0xfffffe00c060ee00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe00c060ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c060ef30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x1cd74181d5fa, rsp = 0x1cd73c4fac58, rbp = 0x1cd73c4fb0f0 ---


After that I reverted with "opnsense-update -kr 24.7.8" to "stable/24.7-n267939-fd5bc7f34el". With this kernel I also get multiple kernel panics:

db:0:kdb.enter.default>  run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo>  show alllocks
No such command; use "help" to list available commands
db:1:lockinfo>  show lockedvnods
Locked vnodes
db:0:kdb.enter.default>  show pcpu
cpuid        = 2
dynamic pcpu = 0xfffffe008e461080
curthread    = 0xfffff80193b42740: pid 51194 tid 101692 critnest 1 "python3.11"
curpcb       = 0xfffff80193b42c60
fpcurthread  = 0xfffff80193b42740: pid 51194 "python3.11"
idlethread   = 0xfffff800016c2740: tid 100005 "idle: cpu2"
self         = 0xffffffff82c12000
curpmap      = 0xffffffff81b81670
tssp         = 0xffffffff82c12384
rsp0         = 0xfffffe00bce3a000
kcr3         = 0xae3d1000
ucr3         = 0xffffffffffffffff
scr3         = 0x59533000
gs32p        = 0xffffffff82c12404
ldt          = 0xffffffff82c12444
tss          = 0xffffffff82c12434
curvnet      = 0
db:0:kdb.enter.default>  bt
Tracing pid 51194 tid 101692 td 0xfffff80193b42740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe00bce39a00
panic() at panic+0x43/frame 0xfffffe00bce39a60
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00bce39ac0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00bce39b10
calltrap() at calltrap+0x8/frame 0xfffffe00bce39b10
--- trap 0xc, rip = 0xffffffff80baf29b, rsp = 0xfffffe00bce39be0, rbp = 0xfffffe00bce39be0 ---
unlock_rw() at unlock_rw+0xb/frame 0xfffffe00bce39be0
_vm_page_busy_sleep() at _vm_page_busy_sleep+0xc3/frame 0xfffffe00bce39c20
vm_object_page_remove() at vm_object_page_remove+0x141/frame 0xfffffe00bce39c80
vm_map_entry_delete() at vm_map_entry_delete+0xf5/frame 0xfffffe00bce39cc0
vm_map_delete() at vm_map_delete+0x7b/frame 0xfffffe00bce39d30
vm_map_remove() at vm_map_remove+0x96/frame 0xfffffe00bce39d60
vmspace_exit() at vmspace_exit+0xab/frame 0xfffffe00bce39d90
exit1() at exit1+0x53a/frame 0xfffffe00bce39df0
sys_exit() at sys_exit+0xd/frame 0xfffffe00bce39e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe00bce39f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00bce39f30
--- syscall (1, FreeBSD ELF64, exit), rip = 0x8265650da, rsp = 0x820d6b048, rbp = 0x820d6b060 ---
db:0:kdb.enter.default>  ps


I did a complete reinstall, but it didn't change anything. Could this be a hardware problem, and should I contact Deciso? My hardware is a new DEC2752, bought in July 2024, so it is still under warranty.
At the moment my old spare hardware is running 24.7.8, and that system is stable.

Update:

I did another complete reinstall, but this time I updated manually from 24.7.0 to 24.7.8. It has now been running stable for two days.
When I update to 24.7.10_2 and then revert to 24.7.8, it crashes with kernel panics.

Does anyone have an idea?

I've seen the rn_walktree() one before, but the other one looks like a form of memory corruption.


Cheers,
Franco

I ran memtest and it found no errors. Or does that not tell us anything? Should I contact Deciso, since I still have warranty?


If it works after a reinstall, it may have been the disk, but it's hard to tell. If the issue comes back, you could help by running a debug kernel. I'm interested in the pfctl / rn_walktree() one in particular. It may be a genuine bug.


Cheers,
Franco

I copied the entire crash report. Does that help you?

No. If this is a code problem, we need a core dump to inspect the issue (e.g. the exact line of code causing it), and if it's memory corruption it will be rather random anyway.


Cheers,
Franco

Ok, my option would be to swap the SSD and RAM as a test, but unfortunately that's not possible because of the warranty seal.

I would like to test the debug kernel and help to find the error.

How can I install it and how do I get to the core dump in the event of a crash?

opnsense-update -zkr dbg-24.7.10

opnsense-shell reboot



Once it crashes look in /var/crash
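
For anyone following along, here is a minimal sketch of how the crash directory could be checked after a panic. It assumes a stock FreeBSD-style layout, where the files are typically named info.N, vmcore.N, core.txt.N or textdump.tar.N. The function name list_crash_files is made up for illustration and is not part of OPNsense.

```shell
#!/bin/sh
# Hypothetical helper (not an OPNsense tool): list crash artifacts, newest first.
list_crash_files() {
    dir="${1:-/var/crash}"
    if [ ! -d "$dir" ]; then
        echo "no crash directory: $dir" >&2
        return 1
    fi
    # -t sorts by modification time, so the most recent dump is listed first.
    ls -lt "$dir"
}

# Default location mentioned above; prints a note if the directory is missing.
list_crash_files /var/crash || echo "nothing to inspect yet"
```

The file with the highest number is normally the most recent crash, so that is probably the one worth attaching to a report.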

Can we establish if it still crashes with the same issues first before suggesting a debug kernel which could surface another issue? :)

Plus the 24.7.10 debug kernel has the bad pf state double-free behaviour...


Cheers,
Franco

Quote from: franco on December 11, 2024, 01:31:58 PM
Can we establish if it still crashes with the same issues first before suggesting a debug kernel which could surface another issue? :)

Plus the 24.7.10 debug kernel has the bad pf state double-free behaviour...


Cheers,
Franco

Ok, I'll update to 24.7.10_2 tomorrow and see if it crashes again. I will report...