Hello,
After updating today from 24.1.10 to 24.7.r1, I had some kernel panics:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80c1dfd0
stack pointer = 0x28:0xffffffff82841df0
frame pointer = 0x28:0xffffffff82841e00
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 7 (pf purge)
rdi: 0000000000000000 rsi: 0000000000000000 rdx: fffff80001d15740
rcx: fffff80001d15740 r8: 0000000000003000 r9: 000000000000000f
rax: 0000000000000000 rbx: 0000000000000000 rbp: ffffffff82841e00
r10: fffff801f0ef8000 r11: 000000008083bf61 r12: 0000000000000000
r13: fffff80001d15740 r14: 0000000000000000 r15: 000000000001432c
trap number = 12
panic: page fault
cpuid = 3
time = 1721152911
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82841ae0
vpanic() at vpanic+0x131/frame 0xffffffff82841c10
panic() at panic+0x43/frame 0xffffffff82841c70
trap_fatal() at trap_fatal+0x40b/frame 0xffffffff82841cd0
trap_pfault() at trap_pfault+0x46/frame 0xffffffff82841d20
calltrap() at calltrap+0x8/frame 0xffffffff82841d20
--- trap 0xc, rip = 0xffffffff80c1dfd0, rsp = 0xffffffff82841df0, rbp = 0xffffffff82841e00 ---
turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xffffffff82841e00
__mtx_unlock_sleep() at __mtx_unlock_sleep+0x73/frame 0xffffffff82841e30
pf_unlink_state() at pf_unlink_state+0x338/frame 0xffffffff82841e70
pf_purge_expired_states() at pf_purge_expired_states+0x178/frame 0xffffffff82841ec0
pf_purge_thread() at pf_purge_thread+0x13b/frame 0xffffffff82841ef0
fork_exit() at fork_exit+0x7f/frame 0xffffffff82841f30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffff82841f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
At first it seemed to happen only directly after a reboot, but now, after some hours without any issue, it happened without any interaction from my side.
I will try to disable some tunables from 24.1 that are currently not required; for example, the microcode update is still active (and it seems the boot process tries to apply it...):
CPU microcode: updated from 0xe to 0x17
CPU: Intel(R) N100 (806.40-MHz K8-class CPU)
Origin="GenuineIntel" Id=0xb06e0 Family=0x6 Model=0xbe Stepping=0
I reported the last two panics via the issue reporter; hopefully this helps find the issue.
Thanks,
Alex
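For anyone comparing their own panic against the one above, the header lines can be pulled apart mechanically. This is only a minimal sketch: the function names are made up for illustration, the sample is an excerpt of the message quoted above, and it assumes the `time` field is seconds since the Unix epoch.

```python
from datetime import datetime, timezone

# Excerpt of the panic message quoted in the post above.
SAMPLE = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x20
fault code = supervisor read data, page not present
current process = 7 (pf purge)
trap number = 12
panic: page fault
cpuid = 3
time = 1721152911
"""

def parse_trap_header(text):
    """Collect 'key = value' lines of a trap message into a dict (first value wins)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        # Skip lines without '=' and continuation lines that start with '='.
        if sep and key:
            fields.setdefault(key, value.strip())
    return fields

def panic_time_utc(fields):
    """Interpret the 'time' field as Unix epoch seconds (assumption) in UTC."""
    return datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)

if __name__ == "__main__":
    fields = parse_trap_header(SAMPLE)
    print(fields["current process"])           # 7 (pf purge)
    print(panic_time_utc(fields).isoformat())  # 2024-07-16T18:01:51+00:00
```

The fields that usually matter when comparing two reports are the fault address, the current process and the time.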
I can also confirm this issue and have also submitted a report.
Probably https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279899 and sadly the usual behaviour from the usual suspects at this point.
Cheers,
Franco
Thanks Franco for the update and reaching out to FreeBSD.
Is it correct that there is no way to disable pfsync completely? (I checked the man pages and didn't find any tunable etc.)
We were wondering: does this also crash with the beta kernel? Because it sort of indicates that it didn't before.
# opnsense-update -kr 24.7.b
Cheers,
Franco
Quote from: franco on July 16, 2024, 08:44:49 PM
Probably https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279899 and sadly the usual behaviour from the usual suspects at this point.
Cheers,
Franco
...not again please....
Quote from: franco on July 16, 2024, 09:19:11 PM
We were wondering: does this also crash with the beta kernel? Because it sort of indicates that it didn't before.
# opnsense-update -kr 24.7.b
Cheers,
Franco
Let's try it out ;-)
Already downloaded, reboot is happening in a sec.
Quote from: computeralex92 on July 16, 2024, 09:32:59 PM
Let's try it out ;-)
Already downloaded, reboot is happening in a sec.
I'm now running the following kernel:
FreeBSD OPNsense.localdomain 14.1-RELEASE FreeBSD 14.1-RELEASE stable/24.7-n267717-cf61c67cb34 SMP amd64
I will keep you updated, but directly after the reboot no panic happened.
I've been running the beta on a VM and on a Protectli bare metal since it was released and experienced no crashes.
Both are now on the R1 kernel, will report if anything comes up (uptime is ~2 hours running strong)
I've installed 24.7 beta from the ISO into Proxmox VM and updated to RC1... Now I'm testing... no crash for now.
Quote from: computeralex92 on July 16, 2024, 09:37:57 PM
I'm now running the following kernel:
FreeBSD OPNsense.localdomain 14.1-RELEASE FreeBSD 14.1-RELEASE stable/24.7-n267717-cf61c67cb34 SMP amd64
I will keep you updated, but directly after the reboot no panic happened.
did the same and also no panic after reboot
Pardon me for asking, when I lookup pfsync, it deals with high availability. Do you have this setup? Reason I ask is that I don't have it setup and I'm trying to determine if I should upgrade and test. I'd rather wait if it's impacting those without HA too.
Have been using it for an hour so far and no crash.
This is Proxmox virtualized... not bare metal
Mkay...quick update.
I had reboots on the physical FWs, the virtualized one is stable.
Moved the physical ones on 24.7.b for now - where one of them ran just fine for a month, and keeping an eye on it.
No HA here either, just to make it clear.
Quote from: franco on July 16, 2024, 09:19:11 PM
We were wondering: does this also crash with the beta kernel? Because it sort of indicates that it didn't before.
# opnsense-update -kr 24.7.b
Cheers,
Franco
Same here. I kept getting crashes every couple of minutes with RC1, so I switched to 24.7.b. It's been an hour and no crash. Love the dashboard, but widgets are not resizable?
>>> widgets are not resizable ?
They are if you unlock the dashboard in the upper right corner
I tried that; it's not working vertically. I want to see the full service list, so I extend it down a bit, hit refresh, and it's back to how it was before.
Can we keep this thread to the core of the subject?
Let's bisect this then if BETA is good. I'll have a new kernel in a bit.
Cheers,
Franco
Quote from: franco on July 17, 2024, 07:03:49 AM
Can we keep this thread to the core of the subject?
Let's bisect this then if BETA is good. I'll have a new kernel in a bit.
Cheers,
Franco
Until now no panic with the beta kernel...
> Until now no panic with the beta kernel...
Good, here is the next one:
# opnsense-update -zkr 24.7.b_15
Cheers,
Franco
Quote from: franco on July 17, 2024, 07:30:53 AM
> Until now no panic with the beta kernel...
Good, here is the next one:
# opnsense-update -zkr 24.7.b_15
Cheers,
Franco
So far no panic after reboot:
FreeBSD OPNsense.localdomain 14.1-RELEASE-p1 FreeBSD 14.1-RELEASE-p1 n267732-007d9fa5c015 SMP amd64
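Since several test kernels are in play, it helps to confirm which build a box is actually running before reporting "crash" or "no crash". A rough sketch that pulls the `nNNNNNN-hash` part out of a `uname` line such as the ones posted in this thread; the function name is hypothetical, and reading the number as a commit counter that grows with newer builds is an assumption about the FreeBSD version string, not something stated here.

```python
import re

# Version strings copied from posts in this thread.
BETA = ("FreeBSD OPNsense.localdomain 14.1-RELEASE FreeBSD 14.1-RELEASE "
        "stable/24.7-n267717-cf61c67cb34 SMP amd64")
B15 = ("FreeBSD OPNsense.localdomain 14.1-RELEASE-p1 FreeBSD 14.1-RELEASE-p1 "
       "n267732-007d9fa5c015 SMP amd64")

def kernel_build(uname_line):
    """Return (counter, short_hash) from an 'nNNNNNN-hash' token, or None."""
    m = re.search(r"\bn(\d+)-([0-9a-f]+)\b", uname_line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)

if __name__ == "__main__":
    print(kernel_build(BETA))  # (267717, 'cf61c67cb34')
    print(kernel_build(B15))   # (267732, '007d9fa5c015')
```

Including the short hash in a report removes any doubt about which test kernel the result belongs to.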
Ok, second confirmation would be nice. This is going to be a weird one if it's in the later commits leading up to RC1.
Cheers,
Franco
b15 crashed immediately for me
After reboot, 24.7.b_15 crashed twice for me, but it has been working fine since. Submitted the problem; not sure if it was sent because of the 2nd crash.
Ok guys I really have conflicting crash reports with different panics. If we screw up the bisect because our goal is "crash" we just produce heat and waste time. If you can send your crash reports on _15 so I can check...
Managed to get this for now...
<118>Root file system: zroot/ROOT/24.7.r1-b15Kernel
<118>Wed Jul 17 05:44:21 GMT 2024
<118>
<118>*** OPNsense.localdomain: OPNsense 24.7.r1 ***
<118>
...........................
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80c1e520
stack pointer = 0x28:0xfffffe0109632df0
frame pointer = 0x28:0xfffffe0109632e00
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 7 (pf purge)
rdi: 0000000000000000 rsi: 0000000000000000 rdx: fffff8000906a000
rcx: fffff8000906a000 r8: ffffffff827e0490 r9: 0000000000000014
rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe0109632e00
r10: fffff801c8cae840 r11: 000000007ffc94e4 r12: 0000000000000000
r13: fffff8000906a000 r14: 0000000000000000 r15: 0000000000016d25
trap number = 12
panic: page fault
cpuid = 1
time = 1721195385
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0109632ae0
vpanic() at vpanic+0x131/frame 0xfffffe0109632c10
panic() at panic+0x43/frame 0xfffffe0109632c70
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0109632cd0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe0109632d20
calltrap() at calltrap+0x8/frame 0xfffffe0109632d20
--- trap 0xc, rip = 0xffffffff80c1e520, rsp = 0xfffffe0109632df0, rbp = 0xfffffe0109632e00 ---
turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0109632e00
__mtx_unlock_sleep() at __mtx_unlock_sleep+0x73/frame 0xfffffe0109632e30
pf_unlink_state() at pf_unlink_state+0x338/frame 0xfffffe0109632e70
pf_purge_expired_states() at pf_purge_expired_states+0x178/frame 0xfffffe0109632ec0
pf_purge_thread() at pf_purge_thread+0x13b/frame 0xfffffe0109632ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe0109632f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0109632f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
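To tell whether two reports show the same panic, one option is to compare the function names below the first `--- trap` line of the backtrace. The following is a minimal sketch with a made-up function name, using an excerpt of the trace above as sample input; it is not how the OPNsense crash reporter processes reports.

```python
import re

# Excerpt of the backtrace quoted above.
SAMPLE = """\
KDB: stack backtrace:
vpanic() at vpanic+0x131/frame 0xfffffe0109632c10
panic() at panic+0x43/frame 0xfffffe0109632c70
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0109632cd0
calltrap() at calltrap+0x8/frame 0xfffffe0109632d20
--- trap 0xc, rip = 0xffffffff80c1e520, rsp = 0xfffffe0109632df0, rbp = 0xfffffe0109632e00 ---
turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0109632e00
__mtx_unlock_sleep() at __mtx_unlock_sleep+0x73/frame 0xfffffe0109632e30
pf_unlink_state() at pf_unlink_state+0x338/frame 0xfffffe0109632e70
pf_purge_expired_states() at pf_purge_expired_states+0x178/frame 0xfffffe0109632ec0
pf_purge_thread() at pf_purge_thread+0x13b/frame 0xfffffe0109632ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe0109632f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0109632f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
"""

FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")

def panic_signature(text):
    """Return the function names between the first and second '--- trap' lines."""
    frames = []
    in_fault = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("--- trap"):
            if in_fault:
                break  # second marker: end of the faulting thread's frames
            in_fault = True
            continue
        m = FRAME_RE.match(line)
        if in_fault and m:
            frames.append(m.group(1))
    return frames

if __name__ == "__main__":
    print(panic_signature(SAMPLE)[:3])
    # ['turnstile_broadcast', '__mtx_unlock_sleep', 'pf_unlink_state']
```

Two reports whose signatures start with the same few frames are likely the same panic; a trace that starts in a different function should be treated as a separate issue during the bisect.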
Ok great, next one is:
# opnsense-update -zkr 24.7.b_7
I'm joining the fun. Just had two of those crashes in a row after working for at least 20 hours straight. Can't say for _7 right now, but installed it too.
Cheers,
Franco
Testing b_7, uptime 20 minutes on one FW, but as I mentioned this crash seems random and not always immediately after boot.
Yep, the worst part is actually knowing it's "good" because the bug might just be hiding ;)
Cheers,
Franco
_15 was really broken on the 3rd FW, couldn't curl the kernel, opnsense-patch would timeout eventually complaining it cannot verify the sig - which was bonkers.
Managed to winscp the kernel and sig file and then I installed it with -zkr 24.7.b_7 -l /foldername and will see what happens.
The other two are happy for now on _7, with 60' and 80' uptime respectively.
Let's just try this one:
# opnsense-update -zkr 24.7.r1_2
I placed a bet...
Cheers,
Franco
Quote from: franco on July 17, 2024, 12:11:20 PM
Let's just try this one:
# opnsense-update -zkr 24.7.r1_2
I placed a bet...
Cheers,
Franco
Just installed it, until now no problems or panic.
Quote from: franco on July 17, 2024, 12:11:20 PM
Let's just try this one:
# opnsense-update -zkr 24.7.r1_2
I placed a bet...
Cheers,
Franco
Sorry for partially missing the tests. Submitted a crash report after boot with this one but didn't see a "panic" in dmesg.
I've replaced the original kernel by including this https://github.com/opnsense/src/commit/de60ffe06fd6
It may or may not be the right one, but it looks promising and I want to avoid people catching the bad one as best we can.
Cheers,
Franco
Just finished moving all 3 boxes to r1_2.
b_7 was there on all 3 with an ~8 hour uptime.
Just to say I've not had any kernel panics with the initial 24.7.r1 - uptime 14 hours
n100 miniPC running Proxmox 8.2.4 16GB ram
opnsense 24.7.r1 in an 8GB VM with 1xintel i226v passthrough (wan) and 1 proxmox/linux bridge
Connection is pppoe, dual stack ipv4/v6
Simple config - using unbound, suricata (lan), crowdsec
All is 'just working'. Not seeing any unexpected kernel issues.
Nice job :-)
The kernel panics only happened on bare metal, virtualized worked ok.
Just a heads up for the other kernel testers, if you're on r1_2 from snapshots and check for updates the 24.7.r1 kernel will be installed and cause a reboot just because of the name change, otherwise it is the same kernel.
14.1-RELEASE-p2 FreeBSD 14.1-RELEASE-p2 stable/24.7-n267750-de60ffe06fd6 SMP amd64
de60ffe06fd6 is the relevant part of the hotfixed kernel, yep
Since _15 was bad and _7 was good it was just a matter of an educated guess and it looks like we found it. Thanks all for the help!
Cheers,
Franco
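The approach described here (beta good, _15 bad, _7 good, then a guess) is a bisect that was cut short. A generic sketch of the full procedure follows; the build numbering and the position of the first bad build are hypothetical, since the thread does not say how many commits lie between the beta and RC1 kernels.

```python
def bisect_first_bad(builds, is_bad):
    """Find the first bad build in a list ordered oldest to newest.

    Assumes builds[0] is known good and builds[-1] is known bad.
    Returns (first_bad_build, list_of_builds_that_were_tested).
    """
    probes = []
    lo, hi = 0, len(builds) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes.append(builds[mid])
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return builds[hi], probes

if __name__ == "__main__":
    # Hypothetical: 31 builds numbered 0..30, regression introduced in build 12.
    first_bad, probes = bisect_first_bad(list(range(31)), lambda n: n >= 12)
    print(first_bad)  # 12
    print(probes)     # [15, 7, 11, 13, 12]
```

With an intermittent panic like this one, a "good" verdict is only as reliable as the uptime behind it, so every good result needs a long enough soak before the search moves on.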
The new kernel is working for me without any issue.
Thanks all for the testing and debugging this problem!
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279899#c15
Of course, what looks like a proper fix found its way to FreeBSD's stable/14 branch yesterday afternoon ;)
Cheers,
Franco
I love your comment on there Franco.......
A man after my own heart!!!
Excellent job!
Thanks, but it's safe to assume the people that matter in this won't appreciate the candidness. Still how does the old saying go? "Do good things and talk about it" is what I'd like to see.
Here's an amended kernel with the proper fix. I also have it on my box so fingers crossed.
# opnsense-update -zkr 24.7.r1_5
Cheers,
Franco
Moved the fleet to the 5th amendment
With regard to the bug, and after reading the thread on bugs.freebsd.org, I still can't say I understand why it appeared to work just fine on multiple virtual environments but triggered relatively quickly on bare metal... Given its nature, I would have expected a similar and consistent crash regardless of where it was running.
Quote from: franco on July 18, 2024, 08:44:27 AM
Thanks, but it's safe to assume the people that matter in this won't appreciate the candidness. Still how does the old saying go? "Do good things and talk about it" is what I'd like to see.
Here's an amended kernel with the proper fix. I also have it on my box so fingers crossed.
# opnsense-update -zkr 24.7.r1_5
Cheers,
Franco
There is this one as well:
Quote
"Karma is extremely efficient, if one is extremely patient"
Many thanks Franco for taking care of this!
Regards,
S.
There is room for locking-related issues in pf state handling, especially since it's actively being worked on (and I've seen a number of fixes that confirm this). A mildly related change just exposed this one by allowing a certain path previously not taken to break it, but it could also mean there are more of these issues in other places still. Whether they manifest only on hardware, or due to specific traffic patterns or configuration, or as plain race conditions between the state cleanup kernel thread and active state handling, is unclear.
Cheers,
Franco
Probably the ones with Intel Ethernet adapters reported no crashes. I have Realtek; I installed kernel 24.7.r1_7 and it crashed the moment I started a computer on the LAN side. Maybe it does not like Zenarmor blocking some website.
I have Intel NICs and it's still crashing every few hours with the _5 kernel (sent crash reports).
Yup, I sent two crash reports, one with _5 and the other _7. Or so I think, since I had bectl-ed beforehand to 24.1 stable before sending the crash reports.
I haven't seen any crash report with the particular stack trace today matching any of _2, _5 or _7 so far. Also no crash on my main production box.
Cheers,
Franco
Yup, 24.7 did not notice the crash. But bectl-ing to 24.1 and rebooting did see a crash (twice). I don't know if it can see the crash from another bectl.
Quote from: franco on July 18, 2024, 04:55:45 PM
I haven't seen any crash report with the particular stack trace today matching any of _2, _5 or _7 so far. Also no crash on my main production box.
Cheers,
Franco
I've sent two, one on _5 and one on _7. Have no idea if they made it to you since there is no feedback after sending. I did wait until the wan was up before submitting (since 24.7 the pppoe connection takes a few minutes to come up after reboot)
Edit: just realised you are meaning there is nothing matching this specific crash.
Quote from: almodovaris on July 18, 2024, 05:12:59 PM
Yup, 24.7 did not notice the crash. But bectl-ing to 24.1 and rebooting did see a crash (twice). I don't know if it can see the crash from another bectl.
Not sure about 24.1? We were trying to find the regression between the 24.7.b and 24.7.r1 kernels, so 24.1.x kernels are very far away from this (FreeBSD 13 vs. 14).
Cheers,
Franco
Quote from: csutcliff on July 18, 2024, 06:19:57 PM
Edit: just realised you are meaning there is nothing matching this specific crash.
Yes, just keep sending if you see one and I'll recheck later. The latest test kernel is
# opnsense-update -zkr 24.7.r1_7
Which may help with two other panics seen before on the 24.7.b kernels.
Cheers,
Franco
Sorry that I was not able to test the kernels today, but now I'm back with kernel 24.7.r1_7...
No panic after reboot; let's see how it is performing.
Regarding the NIC topic:
I'm running on an Intel N100 with Intel I226 NICs.
Quote from: franco on July 18, 2024, 07:22:08 PM
Yes, just keep sending if you see one and I'll recheck later. The latest test kernel is
If 24.1 can see the crash from 24.7, then both crashes are from 24.7. But, again, I don't know if it can report the crashes from another bectl.
Hmm, ok but that makes searching for these hard because I'm pre-filtering for 24.7 user agent string.
Only noticed r1_7 about 75 minutes ago, applied on the 3 FWs and working fine so far from a crashing perspective
Reported by icnl at home dot nl.
The bectl with 24.7 crashed twice. The bectl with 24.1 filled the crash reports. AFAIK 24.1 did not crash, ever. It's a fairly new installation (two days old).
But, okay, it can have misleading data about the installed software.
just sent another crash report for 24.7.r1_7
And, yup, if the bectl with 24.1 cannot see the crash from another bectl, I have no idea why it prompted me to send the crash reports.
Quote from: csutcliff on July 18, 2024, 11:34:19 PM
just sent another crash report for 24.7.r1_7
Just to make sure we're on the same page here, there can be crashes in programs that you can report from the GUI, restart said program and everything else on the FW continues working normally. The OPNsense team receives the crash reports and the issue is fixed one way or another and available shortly in an update.
This thread is about kernel panics on 24.7.r1 and the OS being rebooted automatically -- of which I had none for the last few kernels I tested on 3 FWs.
Uptime on 24.7.r1_7 is now over 10 hours.
Yes there were a few PHP crash reports as well and we fixed them where we could.
It looks like the pf state unlink stuff works fine now but there's still a strange panic so we will go ahead with RC2 and I've also uploaded a debug kernel to work on the remaining panic because there's nothing that sticks out in the code about this one (ip6_input() related).
Cheers,
Franco
Installed RC2. Lobby: Dashboard is blank no info at all.
Here is a screenshot.
Other than that, all the rest is functioning properly.
I wouldn't call it blank but I also wouldn't want to start discussing this in a kernel panic thread. Thanks.
So now we're at RC2. I'm hoping someone running into the ip6_input() panic will try the associated debug kernel to be able to share a core dump.
Only use the command if you are sure about your panic:
# opnsense-update -kr dbg-24.7.r2
The debug kernel will be detected by the reboot and configure itself to produce a core dump instead of a text dump. The dump files will not submit due to their size, so putting them on a file share would be the best option for us to grab it.
Thanks,
Franco
Quote from: newsense on July 19, 2024, 07:44:31 AM
Quote from: csutcliff on July 18, 2024, 11:34:19 PM
just sent another crash report for 24.7.r1_7
Just to make sure we're on the same page here, there can be crashes in programs that you can report from the GUI, restart said program and everything else on the FW continues working normally. The OPNsense team receives the crash reports and the issue is fixed one way or another and available shortly in an update.
This thread is about kernel panics on 24.7.r1 and the OS being rebooted automatically -- of which I had none for the last few kernels I tested on 3 FWs.
Uptime on 24.7.r1_7 is now over 10 hours.
Yes, I'm talking about kernel crashes where it dumps page after page of debug output onto the screen and reboots. I'm submitting the reports it generates after reboot, which do include kernel dump info etc. I had 2 kernel crashes with _7 yesterday but none so far today.
csutcliff, you are our only hope!
You're a prime candidate for the RC2 debug kernel since you have that non-obvious ip6_input() crash.
Cheers,
Franco
Help us, Obi-Franco Kenobi :)
How about that weekend ear worm? https://www.youtube.com/watch?v=AYMlad4e3Q4
I have a bad feeling about this ...
Quote from: franco on July 19, 2024, 02:35:33 PM
csutcliff, you are our only hope!
You're a prime candidate for the RC2 debug kernel since you have that non-obvious ip6_input() crash.
Cheers,
Franco
Thank you, I wasn't sure if I was "the one" since I didn't save the output from the crashes, only submitted it to you
I'm already on the rc2 but I'll install that debug kernel now.
Thanks. Don't forget to reboot before waiting for the crash. :)
Cheers,
Franco
Just happened on RC2. Fresh install via image and restored config.
Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 05
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ddaf27
stack pointer = 0x28:0xfffffe00e334fbe0
frame pointer = 0x28:0xfffffe00e334fd10
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi1: netisr 5)
rdi: fffff801c527a300 rsi: fffff80236ff5b00 rdx: fffff8042e3e2800
Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ddaf27
stack pointer = 0x28:0xfffffe00e334abe0
frame pointer = 0x28:0xfffffe00e334ad10
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi1: netisr 6)
rdi: fffff801c527a300 rsi: fffff8023b6af040 rdx: fffff803b289a000
rcx: fffffe00b5d0f240 r8: 000000000000006b r9: 3232395231a4ebd5
rax: 0000000000000000 rbx: fffff80001a73740 rbp: fffffe00e334ad10
r10: fffff80001a73740 r11: fffffe00e334a570 r12: fffff801c5621782
r13: fffff801c562179a r14: fffffe00e334abfc r15: fffff80017874800
trap number = 12
panic: page fault
cpuid = 6
time = 1721416041
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e334a8d0
vpanic() at vpanic+0x131/frame 0xfffffe00e334aa00
panic() at panic+0x43/frame 0xfffffe00e334aa60
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00e334aac0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00e334ab10
calltrap() at calltrap+0x8/frame 0xfffffe00e334ab10
--- trap 0xc, rip = 0xffffffff80ddaf27, rsp = 0xfffffe00e334abe0, rbp = 0xfffffe00e334ad10 ---
ip6_forward() at ip6_forward+0x2a7/frame 0xfffffe00e334ad10
ip6_input() at ip6_input+0x11f/frame 0xfffffe00e334adf0
swi_net() at swi_net+0x138/frame 0xfffffe00e334ae60
ithread_loop() at ithread_loop+0x257/frame 0xfffffe00e334aef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00e334af30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e334af30
--- trap 0x4d3efdb8, rip = 0xaba227f72fb510cb, rsp = 0x53c53af5127eeb0e, rbp = 0xec1658a0e86e8d54 ---
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 14.1-RELEASE-p2 stable/24.7-n267755-f257b8d7e144 SMP
I have an IPv6 rule that was forwarding to a remote IPv6 address; as soon as I disabled that rule, it seems to have stopped crashing.
@danderson Looking for a core dump using the debug kernel for this if you can help out as well.
Thanks,
Franco
@franco
OK, I installed the debug kernel and rebooted, then enabled my IPv6 forward rule and made it crash. In the crash report I see /var/crash/kernel.0:
File too big to process. It will not be submitted automatically.
What file(s) do you want me to grab?
I'll put them on my OneDrive and shoot you a link at franco@opnsense.org, if I remember correctly.
Yes, email is correct. Only need the kernel.0, splendid! :)
email with link sent.
had my first crash with rc2 (debug kernel), submitted the report and put a link to the kernel.0 in the notes.
10+ hours uptime on r2 here, on all FWs.
It was pretty late, kernel.0 is the wrong file... need the vmcore.0 instead. Sorry.
Cheers,
Franco
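To avoid grabbing the wrong file again, the numbered core dumps can be picked out of the crash directory by name. A small sketch, assuming the dumps are named `vmcore.N` as in this thread and sit in `/var/crash`; the function name is hypothetical.

```python
import re
from pathlib import Path

def latest_vmcore(crash_dir="/var/crash"):
    """Return the vmcore.N file with the highest N in crash_dir, or None."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        return None
    best = None
    for path in directory.glob("vmcore.*"):
        # Only accept purely numeric suffixes; skips names like 'vmcore.last'.
        m = re.fullmatch(r"vmcore\.(\d+)", path.name)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), path)
    return None if best is None else best[1]

if __name__ == "__main__":
    core = latest_vmcore()
    if core is None:
        print("no vmcore.N found")
    else:
        print(core, core.stat().st_size, "bytes")
```

The size printed at the end gives an idea of how large the upload to a file share will be.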
@franco
vmcore.0 file shared via link in your inbox now.
So danderson's report was about https://github.com/opnsense/src/commit/9cb6d71f6a
There may be one more, but it would be easier to base further work on a new kernel build on Monday that incorporates the above commit.
Cheers,
Franco
@danderson @csutcliff and anybody else who would like to help:
# opnsense-update -zkr 24.7.r2_2
Cheers,
Franco
Quote from: franco on July 22, 2024, 09:11:53 AM
@danderson @csutcliff and anybody else who would like to help:
# opnsense-update -zkr 24.7.r2_2
Cheers,
Franco
Even if this is not worth much (as the crashes so far were on bare metal only): I am running this on a VM OPNsense. So far all good.
I have it running on 2 FWs, so far so good.
Can't test on the others as I lost access there due to a Zerotier issue that seems to have been introduced in RC1/RC2 - sent you an email about it.
@franco
updated kernel and rebooted, did the same steps previously done to cause a crash and no crash this time. I'll keep running this kernel for the day unless you want us to try the 24.7.r2_3 kernel.
Quote from: franco on July 22, 2024, 09:11:53 AM
@danderson @csutcliff and anybody else who would like to help:
# opnsense-update -zkr 24.7.r2_2
Cheers,
Franco
Yay, thanks. The _3 is just for OpenVPN DCO.
Cheers,
Franco