PSA: PF regression in 24.7.10 kernel and fix

Started by newsense, December 04, 2024, 12:51:12 AM

FYI: there is a pf regression in the 24.7.10 kernel that causes OPNsense to panic and reboot. It happened twice for me on two different firewalls in the first hour.


I've been working on this with Franco, and there is a test kernel that has been stable for me for 3.5+ hours and counting.

https://github.com/opnsense/src/commit/83757627



# opnsense-update -zkr 24.7.10-state

# opnsense-shell reboot

December 04, 2024, 02:00:21 AM #1 Last Edit: December 04, 2024, 04:33:38 AM by craig_
I have had the same issues. Three panics in one hour. I have implemented the fix as suggested.

Update: 2.5 hours later, all is well in the world again.

Potentially the same issue here as well. My system crashed twice today after upgrading to 24.7.10_1, and it is normally very stable.

Running the test kernel here as well to see if that clears up my crashes too.

For reference here's one of the stack traces courtesy of newsense:

db:0:kdb.enter.default>  show pcpu
cpuid        = 2
dynamic pcpu = 0xfffffe0090458080
curthread    = 0xfffff800019ad000: pid 7 tid 100104 critnest 1 "pf purge"
curpcb       = 0xfffff800019ad520
fpcurthread  = none
idlethread   = 0xfffff800016d0740: tid 100005 "idle: cpu2"
self         = 0xffffffff83a12000
curpmap      = 0xffffffff81b81670
tssp         = 0xffffffff83a12384
rsp0         = 0xfffffe00853e9000
kcr3         = 0x74480000
ucr3         = 0xffffffffffffffff
scr3         = 0x2286f0000
gs32p        = 0xffffffff83a12404
ldt          = 0xffffffff83a12444
tss          = 0xffffffff83a12434
curvnet      = 0xfffff800011afd80
db:0:kdb.enter.default>  bt
Tracing pid 7 tid 100104 td 0xfffff800019ad000
kdb_enter() at kdb_enter+0x33/frame 0xfffffe00853e8c20
panic() at panic+0x43/frame 0xfffffe00853e8c80
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00853e8ce0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00853e8d30
calltrap() at calltrap+0x8/frame 0xfffffe00853e8d30
--- trap 0xc, rip = 0xffffffff82197d9c, rsp = 0xfffffe00853e8e00, rbp = 0xfffffe00853e8e30 ---
pf_detach_state() at pf_detach_state+0x5fc/frame 0xfffffe00853e8e30
pf_unlink_state() at pf_unlink_state+0x290/frame 0xfffffe00853e8e70
pf_purge_expired_states() at pf_purge_expired_states+0x188/frame 0xfffffe00853e8ec0
pf_purge_thread() at pf_purge_thread+0x13b/frame 0xfffffe00853e8ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00853e8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00853e8f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

This'll explain why I didn't have any network this morning. I've applied the patch, will reboot once the chance presents itself (or the system crashes, whichever comes first).

Make sure to check the kernel version and act accordingly:

Please note we had to hotfix the kernel which will not reinstall
automatically if you caught the bad version.  If you experience
panics on 24.7.10 relating to pf(4) please reinstall from the GUI
(which includes an automatic reboot) or run "opnsense-update -fk"
from the shell followed by a manual reboot.  The correct kernel
identifies itself as "stable/24.7-n267981-8375762712f" using
"uname -v".

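The check described above can be scripted. This is only a minimal sketch: the helper name kernel_is_fixed and the variable FIXED_HASH are made up for illustration, and it matches on the commit hash alone rather than the full "stable/24.7-n267981-8375762712f" string.

```shell
#!/bin/sh
# Commit hash of the hotfixed kernel, taken from the note above.
FIXED_HASH="8375762712f"

# kernel_is_fixed VERSION_STRING
# Hypothetical helper: succeeds (exit status 0) if the given version
# string contains the hotfix commit hash, fails otherwise.
kernel_is_fixed() {
    case "$1" in
        *"$FIXED_HASH"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the kernel that is currently running. A freshly installed
# kernel only shows up here after a reboot.
if kernel_is_fixed "$(uname -v)"; then
    echo "Running kernel contains the fix."
else
    echo "Running kernel does NOT contain the fix: run 'opnsense-update -fk' and reboot."
fi
```

Because it only looks for the hash, it accepts any build whose "uname -v" output contains 8375762712f, whatever branch name precedes it.
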
Is anyone brave enough to help get a vmcore for debugging purposes and for reporting upstream?

# opnsense-update -zkr dbg-24.7.10

Then just reboot and let it crash. The vmcore will be under /var/crash/vmcore.X. Afterwards, move back to the safe kernel:

# opnsense-update -k
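
To see whether a dump was actually written after the crash, something like the following could be used. It is a rough sketch: the function name list_vmcores is hypothetical, and the only thing taken from this thread is the /var/crash/vmcore.X location.

```shell
#!/bin/sh
# list_vmcores [DIR]
# Hypothetical helper: prints every vmcore.* file found in DIR
# (default /var/crash) and fails if none exist.
list_vmcores() {
    dir="${1:-/var/crash}"
    found=1
    for f in "$dir"/vmcore.*; do
        # If the glob matched nothing it stays literal, so skip it.
        [ -e "$f" ] || continue
        echo "$f"
        found=0
    done
    return "$found"
}

if list_vmcores; then
    echo "vmcore found, ready to hand over for debugging."
else
    echo "No vmcore in /var/crash yet."
fi
```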

I'm making an effort to report this in an acceptable upstream manner.


Thanks,
Franco

Thanks a lot for the fix!

I thought I was going crazy this morning when I got no Internet anymore... :)

I have the same issues as well. Running "opnsense-update -zkr 24.7.10-state" in the shell fixed it, but afterwards I saw the available hotfix 24.7.10_2. After installing that update from the GUI I had 2 panics again. After running "opnsense-update -zkr 24.7.10-state" again, it seems stable.

uname -v shows FreeBSD 14.1-RELEASE-p6 route_del_fix-n267981-8375762712f SMP

Quote from: franco on December 04, 2024, 12:12:56 PM
Make sure to check the kernel version and act accordingly:

Please note we had to hotfix the kernel which will not reinstall
automatically if you caught the bad version.  If you experience
panics on 24.7.10 relating to pf(4) please reinstall from the GUI
(which includes an automatic reboot) or run "opnsense-update -fk"
from the shell followed by a manual reboot.  The correct kernel
identifies itself as "stable/24.7-n267981-8375762712f" using
"uname -v".

martin87?? I mean, I said it here and everywhere else. Your kernel availability depends on the mirror you are using. I cannot speak for anything but the default mirror.

December 04, 2024, 04:35:28 PM #10 Last Edit: December 04, 2024, 04:44:40 PM by FullyBorked
I had a hard crash and reboot this morning too. How do we know this is the cause? Is there a particular log or detail to reference?

Check your kernel with "uname -v" and/or post the crash report here if your box offers it. Only the part enclosed by the "--- trap" lines is needed.
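
For anyone who has the report saved as text, the "--- trap" enclosed part can be cut out with awk. This is a sketch only: extract_trap_block is a made-up name, it reads the report from stdin, and the sample input is shortened from the backtrace posted earlier in this thread.

```shell
#!/bin/sh
# extract_trap_block
# Hypothetical helper: reads a crash report on stdin and prints the
# lines from the first "--- trap" marker through the second one.
extract_trap_block() {
    awk '
        /^--- trap/ { n++; print; if (n == 2) exit; next }
        n == 1      { print }
    '
}

# Demo with a shortened copy of the trace from this thread.
extract_trap_block <<'EOF'
panic() at panic+0x43/frame 0xfffffe00853e8c80
calltrap() at calltrap+0x8/frame 0xfffffe00853e8d30
--- trap 0xc, rip = 0xffffffff82197d9c, rsp = 0xfffffe00853e8e00, rbp = 0xfffffe00853e8e30 ---
pf_detach_state() at pf_detach_state+0x5fc/frame 0xfffffe00853e8e30
pf_unlink_state() at pf_unlink_state+0x290/frame 0xfffffe00853e8e70
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
EOF
```

With a saved report the same function can be fed through a redirect, e.g. "extract_trap_block < report.txt", where report.txt is whatever file you saved the report to.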


Cheers,
Franco

--- trap 0x9, rip = 0xffffffff8277de0d, rsp = 0xfffffe00c498be00, rbp = 0xfffffe00c498be30 ---
pf_detach_state() at pf_detach_state+0x66d/frame 0xfffffe00c498be30
pf_unlink_state() at pf_unlink_state+0x290/frame 0xfffffe00c498be70
pf_purge_expired_states() at pf_purge_expired_states+0x188/frame 0xfffffe00c498bec0
pf_purge_thread() at pf_purge_thread+0x13b/frame 0xfffffe00c498bef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00c498bf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c498bf30
--- trap 0xc8e6f912, rip = 0x233a94f6b6accd0a, rsp = 0x373380ffa2a5d903, rbp = 0x72eb97b310c02134 ---


FreeBSD 14.1-RELEASE-p6 stable/24.7-n267979-0d692990122 SMP


Just learned how to view the crash report. I guess crashes don't happen often enough for me to have known how :)

But based on that log, it definitely looks like the same issue.

December 04, 2024, 04:55:50 PM #13 Last Edit: December 04, 2024, 04:58:04 PM by FullyBorked
I'm on the default mirror but running "opnsense-update -fk" produces the same kernel I pasted above instead of the desired kernel Franco posted.

I'm dumb, I need to reboot before checking the kernel version...  ::)