[SOLVED] Kernel Panic - box restarts every few hours

Started by mem7192, January 04, 2025, 02:59:30 PM

January 04, 2025, 02:59:30 PM Last Edit: January 10, 2025, 01:54:44 PM by mem7192 Reason: Solved problem
Hey, I'm running:
OPNsense 24.7.11_2-amd64
FreeBSD 14.1-RELEASE-p6
OpenSSL 3.0.15

on a Dell OptiPlex 3080 with an i5 CPU and 8 GB of RAM.

I'm getting random kernel panics and the box restarts every 8-24 hours. I had originally been playing around with the RAM, so I changed the RAM configuration a few times; the only difference is that I now seem to get 24-ish hours between reboots instead of 4-6. Before the kernel panics, I had increased the RAM and it ran stable for 14 days, and then out of nowhere this problem started. Hopefully this log dump comes through OK; I currently don't have a desktop PC to work from, so these were taken from an SSH app on an iPhone and I don't know how the formatting will look. Thank you for any help possible!

Btw: the RAM is currently back to the same configuration that had run stable for months.


<11>1 2025-01-02T04:02:00-05:00 OPNsense.lan configctl 62410 - [meta sequenceId="1"] error in configd communication  Traceback (most recent call last):   File "/usr/local/sbin/configctl", line 65, in exec_config_cmd     line = sock.recv(65536).decode()            ^^^^^^^^^^^^^^^^ TimeoutError: timed out
<45>1 2025-01-02T04:08:35-05:00 OPNsense.lan syslog-ng 34164 - [meta sequenceId="1"] syslog-ng starting up; version='4.8.1'
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="2"] Fatal trap 12: page fault while in kernel mode
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="3"] cpuid = 0; apic id = 00
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="4"] fault virtual address    = 0xaca52425
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="5"] fault code               = supervisor read data, page not present
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="6"] instruction pointer      = 0x20:0xffffffff8109fa60
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="7"] stack pointer            = 0x28:0xffffffff82e1d750
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="8"] frame pointer            = 0x28:0xffffffff82e1d750
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="9"] code segment             = base 0x0, limit 0xfffff, type 0x1b
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="10"]                         = DPL 0, pres 1, long 1, def32 0, gran 1
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="11"] processor eflags        = interrupt enabled, resume, IOPL = 0
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="12"] current process         = 0 (re0 taskq)
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="13"] rdi: 00000000aca52425 rsi: ffffffff82e1d7f8 rdx: 0000000000000028
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="14"] rcx: 00000000000568f1  r8: 0000000000000026  r9: 00000000c02d0426
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="15"] rax: 0000000000000000 rbx: fffff80001f43740 rbp: ffffffff82e1d750
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="16"] r10: 00000000e6b08a34 r11: 0000000081463854 r12: ffffffff82e1d7f8
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="17"] r13: 00000000aca52425 r14: fffff80001e00500 r15: fffffe00dc59c000
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="18"] trap number             = 12
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="19"] panic: page fault

<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="20"] cpuid = 0
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="21"] time = 1735808760
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="22"] KDB: stack backtrace:
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="23"] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82e1d440
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="24"] vpanic() at vpanic+0x131/frame 0xffffffff82e1d570
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="25"] panic() at panic+0x43/frame 0xffffffff82e1d5d0
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="26"] trap_fatal() at trap_fatal+0x40b/frame 0xffffffff82e1d630
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="27"] trap_pfault() at trap_pfault+0x46/frame 0xffffffff82e1d680
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="28"] calltrap() at calltrap+0x8/frame 0xffffffff82e1d680
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="29"] --- trap 0xc, rip = 0xffffffff8109fa60, rsp = 0xffffffff82e1d750, rbp = 0xffffffff82e1d750 ---
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="30"] memcmp() at memcmp+0x110/frame 0xffffffff82e1d750

<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="31"] pf_find_state() at pf_find_state+0xc0/frame 0xffffffff82e1d7a0
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="32"] pf_test_state_tcp() at pf_test_state_tcp+0x1c4/frame 0xffffffff82e1d910
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="33"] pf_test6() at pf_test6+0x13ce/frame 0xffffffff82e1dac0
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="34"] pf_check6_in() at pf_check6_in+0x5e/frame 0xffffffff82e1daf0
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="35"] pfil_mbuf_in() at pfil_mbuf_in+0x38/frame 0xffffffff82e1db20
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="36"] ip6_input() at ip6_input+0x607/frame 0xffffffff82e1dc00
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="37"] netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xffffffff82e1dc50
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="38"] ether_demux() at ether_demux+0x149/frame 0xffffffff82e1dc80
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="39"] ether_nh_input() at ether_nh_input+0x36a/frame 0xffffffff82e1dce0
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="40"] netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xffffffff82e1dd30

<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="41"] ether_input() at ether_input+0x56/frame 0xffffffff82e1dd80
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="42"] re_rxeof() at re_rxeof+0x344/frame 0xffffffff82e1de00
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="43"] re_int_task() at re_int_task+0xbd/frame 0xffffffff82e1de40
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="44"] taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xffffffff82e1dec0
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="45"] taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xffffffff82e1def0
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="46"] fork_exit() at fork_exit+0x7f/frame 0xffffffff82e1df30
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="47"] fork_trampoline() at fork_trampoline+0xe/frame 0xffffffff82e1df30
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="48"] --- trap 0x7ae80af6, rip = 0xfed9d7f271182d56, rsp = 0x82612ed2a3d8c8d7, rbp = 0xc102a8878f76d1ca ---
<13>1 2025-01-02T04:08:35-05:00 OPNsense.lan kernel - - [meta sequenceId="49"] KDB: enter: panic
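A side note on reading the dump above: the `time = 1735808760` line is the kernel's clock at the moment of the panic, in seconds since the Unix epoch, so it can be compared with the syslog timestamps. A minimal sketch of the conversion follows. The function name `epoch_to_utc` is made up for illustration; it tries the BSD form of `date` first and falls back to the GNU form.

```shell
# Convert a Unix epoch value (as printed in the panic's "time = ..." line)
# to an ISO-8601 UTC timestamp.
# epoch_to_utc is a hypothetical helper name, not part of OPNsense or FreeBSD.
epoch_to_utc() {
    # BSD/FreeBSD date: -r SECONDS. GNU date treats -r as a reference file,
    # so that attempt fails there and the GNU form (-d @SECONDS) is used instead.
    date -u -r "$1" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
        date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ'
}

# Example:
#   epoch_to_utc 1735808760
```

For the value above this gives 2025-01-02T09:06:00Z, i.e. 04:06:00 at the -05:00 offset used in the log, a couple of minutes before the 04:08:35 stamp that every kernel line carries. That would be consistent with the panic text being written to the log during the next boot rather than at the moment of the crash, though the log alone does not prove it.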

Hey.
I don't know if I'm hijacking the thread or if I have the same problem.

For the past 3 weeks, my box has been crashing and rebooting every few days (every 1-4 days).

The box is an APU2.

OPNsense 24.7.11_2-amd64
FreeBSD 14.1-RELEASE-p6
OpenSSL 3.0.15

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x0
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff82185d9c
stack pointer           = 0x28:0xfffffe0062dd8e00
frame pointer           = 0x28:0xfffffe0062dd8e30
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6 (pf purge)
rdi: fffff80019f23160 rsi: fffff80019f23160 rdx: 0000000092fdd10c
rcx: 0000000000000000  r8: 0000000022884788  r9: 0000000000000000
rax: 0000000000000000 rbx: fffff80019e9e420 rbp: fffffe0062dd8e30
r10: 0000000000000000 r11: 00000000853e1d38 r12: fffffe006ac79000
r13: 000000000000a816 r14: fffff80019f23160 r15: fffff80010b86000
trap number             = 12
panic: page fault
cpuid = 1
time = 1736110480
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0062dd8af0
vpanic() at vpanic+0x131/frame 0xfffffe0062dd8c20
panic() at panic+0x43/frame 0xfffffe0062dd8c80
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0062dd8ce0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe0062dd8d30
calltrap() at calltrap+0x8/frame 0xfffffe0062dd8d30
--- trap 0xc, rip = 0xffffffff82185d9c, rsp = 0xfffffe0062dd8e00, rbp = 0xfffffe0062dd8e30 ---
pf_detach_state() at pf_detach_state+0x5fc/frame 0xfffffe0062dd8e30
pf_unlink_state() at pf_unlink_state+0x290/frame 0xfffffe0062dd8e70
pf_purge_expired_states() at pf_purge_expired_states+0x188/frame 0xfffffe0062dd8ec0
pf_purge_thread() at pf_purge_thread+0x13b/frame 0xfffffe0062dd8ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe0062dd8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0062dd8f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 14.1-RELEASE-p6 stable/24.7-n267979-0d692990122 SMP

The "current process" is different every time (so I added a few textdumps).

Please tell me if I should move this to my own thread.
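When comparing several dumps like the ones in this thread, it can help to reduce each panic to two facts: the "current process" and the function that was executing when the fault hit (the first frame after the `--- trap 0xc` marker). The following is a rough sketch, not an official tool; `summarize_panics` is a hypothetical name, and it assumes the dumps look like the ones posted here, either raw or with a syslog prefix.

```shell
# Print "<current process> -> <faulting function>" for every page-fault panic
# found in the given files (or on stdin if no files are given).
# summarize_panics is a hypothetical helper name used for illustration.
summarize_panics() {
    awk '
        # Remember the most recent "current process = ..." value.
        /current process/ {
            sub(/.*current process[ \t]*=[ \t]*/, "")
            proc = $0
        }
        # First stack frame after the trap marker: this is where the fault hit.
        want && /\(\) at / {
            line = $0
            sub(/\(\) at .*/, "", line)   # drop "() at name+0x.../frame ..."
            sub(/.*[ \t]/, "", line)      # drop any syslog prefix before the name
            print proc " -> " line
            want = 0
        }
        # Trap 0xc is trap number 12, the page fault seen in these dumps.
        /--- trap 0xc,/ { want = 1 }
    ' "$@"
}

# Example (file names are placeholders):
#   summarize_panics dump1.txt dump2.txt
```

Run against the dumps in this thread, it would report `memcmp`, `pf_detach_state` and `zone_release` as the faulting functions, which makes it easier to see whether unrelated-looking panics share a subsystem.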


January 05, 2025, 11:00:43 PM #2 Last Edit: January 05, 2025, 11:02:43 PM by mem7192
I haven't had a chance to read through the rest of your log dumps, but I went back through several days of mine and found that the page fault happens right when syslog-ng seems to reload. I'm not sure about some of this stuff, so I don't know whether something is wrong with syslog-ng or whether it is supposed to reload regularly. After playing with my RAM configurations, I came to the conclusion that it's not a RAM problem. I don't have an extra SSD to test whether it is a drive problem.

Have you noticed what comes right before the page fault?

I traced mine back to the evening of Dec. 31st, after my 2-year-old "accidentally" hit the power button, since I stupidly leave the router sitting out in the living room LOL! Ever since then, every time syslog-ng seems to reload, I have a problem. I have disabled the syslog service and, despite now having no logging, I have yet to have another page fault and kernel panic.

Edit to add: my next step is to try a fresh install. I just haven't had the time to do so yet.
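For anyone wanting to check what sits right before the page fault in their own log, a small sketch using `grep -B` is below. `lines_before_panic` is a made-up helper name, and the log path in the usage comment is an assumption about where OPNsense keeps the system log, so adjust it to your install.

```shell
# Show each "Fatal trap 12" line together with the N lines logged before it.
# lines_before_panic is a hypothetical helper name used for illustration.
#   $1 = log file to search
#   $2 = number of preceding lines to show (defaults to 3)
lines_before_panic() {
    grep -B "${2:-3}" 'Fatal trap 12' "$1"
}

# Example (the path is an assumption, not confirmed in this thread):
#   lines_before_panic /var/log/system/latest.log 5
```

Keep in mind that a line appearing directly before the panic in the file only shows the order in which entries were written, so it is a hint about what to look at next rather than proof of the cause.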

If this is a storage problem, I would suspect that you installed on UFS instead of ZFS, which is what you should be using.

I didn't see the hotfix for kernel-24.7.10, which a) needed to be reinstalled manually and b) was never shown to me, because I installed that version before the hotfix came out, so the notes I saw naturally didn't mention it. This really should have been included in the .11 patch notes.

Let's see if that fixes the problem.

Quote: Please note we had to hotfix the kernel which will not reinstall automatically if you caught the bad version. If you experience panics on 24.7.10 relating to pf(4) please reinstall from the GUI (which includes an automatic reboot) or run "opnsense-update -fk" from the shell followed by a manual reboot. The correct kernel identifies itself as "stable/24.7-n267981-8375762712f" using "uname -v".
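To check a box against that note, the output of `uname -v` can be compared with the build string quoted above. This is only a sketch: `kernel_is_hotfixed` and `FIXED_KERNEL` are names invented here, and the check is specific to this one hotfix, so a later kernel release would carry a different string.

```shell
# Build identifier of the hotfixed 24.7.10 kernel, as quoted in the release note.
FIXED_KERNEL='stable/24.7-n267981-8375762712f'

# Succeeds if the given `uname -v` string names the hotfixed kernel.
# kernel_is_hotfixed is a hypothetical helper name used for illustration.
#   $1 = output of `uname -v`
kernel_is_hotfixed() {
    case "$1" in
        *"$FIXED_KERNEL"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, run on the firewall itself:
#   kernel_is_hotfixed "$(uname -v)" ||
#       echo 'kernel differs: see the note about "opnsense-update -fk" and a reboot'
```

The version string from the APU2 textdump earlier in the thread (`stable/24.7-n267979-0d692990122`) would fail this check, which fits the description of the bad kernel build.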

@meyergru - it is installed on ZFS and ran stable for months like that with no problems. Appreciate the input. I was just thinking maybe SSD issue because it seems to be syslog that's messing everything up. No kernel panic for 2 days now after stopping the syslog service.

@dedi - thanks for that info. Looks like my best bet is to export my config and grab the latest version from the website and do a fresh install. Thanks

If reinstalling the kernel fixes it for you there's no need to reinstall.

If you are still having issues after reinstalling the kernel, you could try reinstalling syslog-ng:

pkg install -f syslog-ng

@newsense - I updated the kernel to "stable/24.7-n267981-8375762712f" and rebooted. Then I reinstalled syslog-ng and restarted the service. So far the log shows that syslog-ng has started, which is the point where the page fault was occurring. Thank you everyone for the help and I will report back. Fingers crossed!

Just reporting back... after 3 days of uptime, we are still running stable. Looks like all is good, and thank you so much for your help. Updating the kernel and reinstalling syslog-ng fixed everything. Interestingly enough, my RAM use also seems to have gone down. Same number of services and interfaces (I didn't change anything else), but I'm now only using half the memory that I was before. I wonder if that had something to do with it. Anyway, all seems good now. Thank you!

Good to hear, please mark the thread as [Solved]

I'm experiencing exactly the same issue:
<45>1 2025-01-10T02:33:17+01:00 opnsense2 syslog-ng 28239 - [meta sequenceId="1"] syslog-ng starting up; version='4.8.1'
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="2"] Fatal trap 12: page fault while in kernel mode
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="3"] cpuid = 3; apic id = 03
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="4"] fault virtual address     = 0x0
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="5"] fault code                = supervisor write data, page not present
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="6"] instruction pointer       = 0x20:0xffffffff80f3c00f
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="7"] stack pointer             = 0x28:0xfffffe000edf1d10
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="8"] frame pointer             = 0x28:0xfffffe000edf1d50
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="9"] code segment              = base 0x0, limit 0xfffff, type 0x1b
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="10"]                  = DPL 0, pres 1, long 1, def32 0, gran 1
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="11"] processor eflags = interrupt enabled, resume, IOPL = 0
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="12"] current process          = 0 (thread taskq)
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="13"] rdi: fffffe008ea60400 rsi: 0000000000000000 rdx: 000000000000002e
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="14"] rcx: 0000000000000000  r8: 0000000000000000  r9: fffff80005c2f480
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="15"] rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe000edf1d50
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="16"] r10: fffff80005c2f480 r11: 00000000802e6e20 r12: fffff801c6694fe0
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="17"] r13: fffffe008ea60400 r14: fffff801c6694318 r15: fffff80005c2f540
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="18"] trap number              = 12
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="19"] panic: page fault
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="20"] cpuid = 3
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="21"] time = 1736472729
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="22"] KDB: stack backtrace:
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="23"] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000edf1a00
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="24"] vpanic() at vpanic+0x131/frame 0xfffffe000edf1b30
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="25"] panic() at panic+0x43/frame 0xfffffe000edf1b90
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="26"] trap_fatal() at trap_fatal+0x40b/frame 0xfffffe000edf1bf0
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="27"] trap_pfault() at trap_pfault+0x46/frame 0xfffffe000edf1c40
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="28"] calltrap() at calltrap+0x8/frame 0xfffffe000edf1c40
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="29"] --- trap 0xc, rip = 0xffffffff80f3c00f, rsp = 0xfffffe000edf1d10, rbp = 0xfffffe000edf1d50 ---
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="30"] zone_release() at zone_release+0x1df/frame 0xfffffe000edf1d50
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="31"] bucket_drain() at bucket_drain+0xb9/frame 0xfffffe000edf1d80
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="32"] bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x2ff/frame 0xfffffe000edf1de0
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="33"] zone_timeout() at zone_timeout+0x2eb/frame 0xfffffe000edf1e20
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="34"] uma_timeout() at uma_timeout+0x58/frame 0xfffffe000edf1e40
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="35"] taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe000edf1ec0
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="36"] taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe000edf1ef0
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="37"] fork_exit() at fork_exit+0x7f/frame 0xfffffe000edf1f30
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="38"] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000edf1f30
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="39"] --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="40"] KDB: enter: panic
<13>1 2025-01-10T02:33:17+01:00 opnsense2 kernel - - [meta sequenceId="41"] ---<<BOOT>>---
I just performed a manual kernel update (unfortunately, I didn't check what the previous version was :/), and we'll see if the situation improves for me as well.

Regards
Borys

@Borys - your log looks the same as mine did. Do what I did a couple posts up and I would imagine you will be good to go. Check the kernel version now that you've updated and then pkg install -f syslog-ng

Quote from: mem7192 on January 10, 2025, 06:00:32 PM
@Borys - your log looks the same as mine did. Do what I did a couple posts up and I would imagine you will be good to go. Check the kernel version now that you've updated and then pkg install -f syslog-ng

I've already done that. Now I'm waiting and seeing if the kernel panic happens again.