[SOLVED] Update to 24.7.2 results in kernel panic

Started by mroess, August 22, 2024, 08:42:18 AM

August 22, 2024, 08:42:18 AM Last Edit: August 29, 2024, 05:21:35 PM by mroess
Hello community,

I have a problem with my OPNsense box after updating from 24.7.1 to 24.7.2. After the reboot I get a kernel panic:

Mounting filesystems...


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff804d7de7
stack pointer           = 0x28:0xfffffe00715ddb20
frame pointer           = 0x28:0xfffffe00715ddb40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 31 (zpool)
rdi: fffff8000378c000 rsi: 0000000000020005 rdx: 000000000000000b
rcx: fffff80003768900  r8: 0000000000000001  r9: 0000000000000000
rax: 0000000000000000 rbx: fffff8000378c000 rbp: fffffe00715ddb40
r10: 0000000000000016 r11: fffff8004ff73520 r12: 0000000000002000
r13: 0000000000020005 r14: fffff8000378b700 r15: fffff8000378b600
trap number             = 12
panic: page fault
cpuid = 0
time = 1009843225
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00715dd810
vpanic() at vpanic+0x131/frame 0xfffffe00715dd940
panic() at panic+0x43/frame 0xfffffe00715dd9a0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00715dda00
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00715dda50
calltrap() at calltrap+0x8/frame 0xfffffe00715dda50
--- trap 0xc, rip = 0xffffffff804d7de7, rsp = 0xfffffe00715ddb20, rbp = 0xfffffe00715ddb40 ---
agp_close() at agp_close+0x57/frame 0xfffffe00715ddb40
giant_close() at giant_close+0x68/frame 0xfffffe00715ddb90
devfs_close() at devfs_close+0x4b3/frame 0xfffffe00715ddc00
VOP_CLOSE_APV() at VOP_CLOSE_APV+0x1d/frame 0xfffffe00715ddc20
vn_close1() at vn_close1+0x14c/frame 0xfffffe00715ddc90
vn_closefile() at vn_closefile+0x3d/frame 0xfffffe00715ddce0
devfs_close_f() at devfs_close_f+0x2a/frame 0xfffffe00715ddd10
_fdrop() at _fdrop+0x11/frame 0xfffffe00715ddd30
closef() at closef+0x24a/frame 0xfffffe00715dddc0
closefp_impl() at closefp_impl+0x58/frame 0xfffffe00715dde00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe00715ddf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00715ddf30
--- syscall (6, FreeBSD ELF64, close), rip = 0x18d8eaaf52ba, rsp = 0x18d8f2980d88, rbp = 0x18d8f2980da0 ---
KDB: enter: panic
[ thread pid 31 tid 100264 ]
Stopped at      kdb_enter+0x33: movq    $0,0xfd9962(%rip)

A clean reinstall of 24.7 with a config backup works, but after updating to 24.7.2 again, the kernel panic shows up again.

I have tried a different memory module without any success.

Any ideas what I can do?

Kind Regards

Marian


No Suricata, no Zenarmor, and it runs on physical hardware: an old Gateprotect GPO 150.

Seeing that the crashing process is "zpool", here is an educated guess:

https://github.com/opnsense/core/commit/37003d1d5793b03

That's going to be a fun one... I suspect if you install UFS it's fine.


Cheers,
Franco
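For readers triaging a similar trace, the reasoning above (the crashing process is "zpool") can be read straight off the panic output. Below is a minimal sketch, assuming the KDB output format shown in the first post; the helper name `summarize_panic` is made up for illustration, and the embedded text is an abbreviated excerpt of that trace.

```python
import re

# Abbreviated excerpt of the panic output posted at the top of this thread.
PANIC_TEXT = """\
current process         = 31 (zpool)
trap number             = 12
panic: page fault
KDB: stack backtrace:
vpanic() at vpanic+0x131/frame 0xfffffe00715dd940
calltrap() at calltrap+0x8/frame 0xfffffe00715dda50
--- trap 0xc, rip = 0xffffffff804d7de7, rsp = 0xfffffe00715ddb20, rbp = 0xfffffe00715ddb40 ---
agp_close() at agp_close+0x57/frame 0xfffffe00715ddb40
giant_close() at giant_close+0x68/frame 0xfffffe00715ddb90
devfs_close() at devfs_close+0x4b3/frame 0xfffffe00715ddc00
--- syscall (6, FreeBSD ELF64, close), rip = 0x18d8eaaf52ba, rsp = 0x18d8f2980d88, rbp = 0x18d8f2980da0 ---
"""


def summarize_panic(text):
    """Return (process name, faulting function, syscall name) from a KDB trace.

    Hypothetical helper; any field that cannot be found is returned as None.
    """
    proc = re.search(r"current process\s*=\s*\d+\s*\((.+?)\)", text)
    # The frame printed right after the "--- trap ..." marker is where the fault hit.
    fault = re.search(r"^--- trap .*---\n(\w+)\(\) at", text, re.MULTILINE)
    # The "--- syscall (...)" marker names the system call that led into the kernel.
    sysc = re.search(r"^--- syscall \(\d+, [^,]+, (\w+)\)", text, re.MULTILINE)
    return (
        proc.group(1) if proc else None,
        fault.group(1) if fault else None,
        sysc.group(1) if sysc else None,
    )


print(summarize_panic(PANIC_TEXT))
```

For this trace the three fields are zpool, agp_close and close: the zpool process was closing a file descriptor when the fault occurred inside the AGP driver's close routine.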

Quote
Seeing the crashing process is "zpool" here is an educated guess:

https://github.com/opnsense/core/commit/37003d1d5793b03

That's going to be a fun one... I suspect if you install UFS it's fine.

Hi Franco.

From your perspective, is this a generic problem that hits all ZFS-based installations, and would you recommend skipping 24.7.2 in that case?

Best regards
Robert

Hi Robert,

I hope not. It looks like a fringe kernel issue with the OP's hardware (the AGP slot in particular). It doesn't surface on FreeBSD because ZPOOL_IMPORT_PATH hasn't been bootstrapped there ever since FreeBSD changed ZFS implementations in version 13, so it would likely continue to go unnoticed.

We do have a debug kernel, but it requires the system to boot up first. If we can manage to get a core dump we can probably apply a bandaid and report to FreeBSD.

That being said, I see no reason to revoke the ZPOOL_IMPORT_PATH change. All hell would have broken loose already if it were a major problem. But even then, I still don't think an environment variable should ever crash a user's system.


Cheers,
Franco
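Some background on the variable discussed above: per zpool(8), ZPOOL_IMPORT_PATH is a colon-separated list of directories in which zpool looks for device nodes and files. The sketch below only illustrates how such a search path fans out into candidate entries. It is not how libzfs is implemented, and the helper name `candidate_nodes` and the directory contents are invented for the example. It shows why a broad search directory can contain nodes that are not disks at all, which would be consistent with the close() call ending in agp_close() in the trace.

```python
import os
import tempfile


def candidate_nodes(import_path):
    """Expand a colon-separated search path into the entries it would expose.

    Hypothetical helper for illustration only; the real import logic
    (filtering, label probing, ordering) is more involved.
    """
    nodes = []
    for directory in import_path.split(":"):
        if not directory or not os.path.isdir(directory):
            continue  # skip empty components and missing directories
        for name in sorted(os.listdir(directory)):
            nodes.append(os.path.join(directory, name))
    return nodes


# Demo with a throwaway directory standing in for a device directory.
with tempfile.TemporaryDirectory() as fake_dev:
    for name in ("ada0", "ada0p1", "agpgart"):  # made-up contents
        open(os.path.join(fake_dev, name), "w").close()
    for node in candidate_nodes(fake_dev + ":/nonexistent-dir"):
        print(os.path.basename(node))
```

The demo lists the AGP node alongside the disk entries, simply because it lives in the same directory.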

Ok, thanks. Then I will gather some more courage over the next few days and do the upgrade afterwards. :-)


August 22, 2024, 04:30:23 PM #8 Last Edit: August 22, 2024, 05:17:23 PM by TestUserPleaseIgnore
I am having the same error, what information do you need?

My box is/was a BARRACUDA BMF220A.

I don't think this device has an AGP port?

Quote from: franco on August 22, 2024, 08:55:44 AM
I suspect if you install UFS it's fine.
Cheers,
Franco

So reinstalling OPNsense with the UFS filesystem is the fix?

August 22, 2024, 05:05:57 PM #10 Last Edit: August 22, 2024, 05:07:53 PM by doktornotor
Quote from: TestUserPleaseIgnore on August 22, 2024, 04:30:23 PM
My box is/was a BARRACUDA BBS190A.

I dont think this device has an AGP port?

Well, it probably has some on-board graphics which presents as AGP to the system.

https://github.com/freebsd/freebsd-src/blob/main/sys/dev/agp/agp.c#L829
https://man.freebsd.org/cgi/man.cgi?query=agp&sektion=4&format=html
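A quick way to check the point above on a system that still boots (for example before the update): the agp(4) man page linked above lists /dev/agpgart as the AGP device node. The following is a minimal sketch that only tests whether that node exists; the function name `agp_node_present` and its parameters are invented for the example.

```python
import os


def agp_node_present(dev_dir="/dev", node_name="agpgart"):
    """Return True if the AGP device node exists under dev_dir.

    Hypothetical helper; the default node name follows the FILES section of agp(4).
    """
    return os.path.exists(os.path.join(dev_dir, node_name))


if __name__ == "__main__":
    print("AGP device node present:", agp_node_present())
```

If the node is present, the agp driver attached to something on the board, even when the box has no physical AGP slot.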

I think even the $14 on Ebay is too much for this kind of HW.
https://www.msi.com/Motherboard/N3150I-ECO/

For a good LMAO, see this video: https://www.youtube.com/watch?v=BKDRnu7KAKw - this thing looks like a serious fire hazard to me and another WTF from Barracuda.


Quote from: rackenthogg on August 22, 2024, 05:05:13 PM
So reinstalling OPNSense with UFS filesystem is the fix?

If it's the above hardware, I'd reinstall it into a shredder.

I've got the hardware that I've got. If you want to donate a new box to me, I'd gladly take it.

August 22, 2024, 05:10:14 PM #12 Last Edit: August 22, 2024, 05:14:24 PM by rackenthogg
Quote from: doktornotor on August 22, 2024, 05:05:57 PM
Quote from: rackenthogg on August 22, 2024, 05:05:13 PM
So reinstalling OPNSense with UFS filesystem is the fix?

If it's the above hardware, I'd reinstall it into a shredder.

Well, it is definitely not.


Quote from: doktornotor on August 22, 2024, 05:05:57 PM
Quote from: TestUserPleaseIgnore on August 22, 2024, 04:30:23 PM
My box is/was a BARRACUDA BBS190A.


Sorry, I got the SKU wrong. It's a BMF220A.

https://servers4less.com/bmf220a-barracuda-im-firewall-220-1-x-vga-1-x-keyboard-1-x-10-100base-tx/?srsltid=AfmBOooz7IMXsHzVaYZvXeeg0x6_VUo4LlyN17G3oL46h4fffM9uepXL

August 22, 2024, 05:24:18 PM #14 Last Edit: August 22, 2024, 06:09:41 PM by rackenthogg
Update: I wiped the disk and booted OPNsense 27.2 from USB. Then I logged in as installer and selected the UFS option. After that, the screen was bombarded with fast-scrolling messages (too fast to read anything) and the box rebooted.

I repeated the same steps. After selecting the UFS install, I paused the screen messages using the "Pause" key, but before I could focus my camera on the display, the box rebooted anyway.

Edit 2: ZFS install mode with config restore proceeds without problems, but after the 24.7.2 update the same kernel crash happens again. Selecting UFS install mode results in the stream of error messages shown below.
Edit: I managed to take a quick paparazzo-style photo, so here is part of the fast-scrolling stream of messages: