[SOLVED] Update to 24.7.2 results in kernel panic

Started by mroess, August 22, 2024, 08:42:18 AM

Hi,

I have a WatchGuard XTM505 with 3GB DDR2 and a single SATA Samsung 870 EVO 256GB SSD.

This is happening on my hardware as well. I had no previous issues with any other OPNsense version.

Nothing additional added or configured in OPNsense after installation.

24.7 clean install works

24.7.1 worked

24.7.2 upgrade from 24 then same issue as OP.

Just wanted to add additional information. Screenshot attached as well

August 22, 2024, 05:26:47 PM #16 Last Edit: August 22, 2024, 05:30:29 PM by doktornotor
Quote from: TestUserPleaseIgnore on August 22, 2024, 05:18:23 PM
Sorry, I got the SKU wrong - it's a BMF220a

https://servers4less.com/bmf220a-barracuda-im-firewall-220-1-x-vga-1-x-keyboard-1-x-10-100base-tx/?srsltid=AfmBOooz7IMXsHzVaYZvXeeg0x6_VUo4LlyN17G3oL46h4fffM9uepXL

I don't dare to Google it.  ;D

Quote from: benkill15 on August 22, 2024, 05:25:36 PM
Just wanted to add additional information. Screenshot attached as well

Pasting the serial console output would be a whole lot better than the screenshot.

Not sure what to make of this. The defect we talk about with the OP is not in any image we offer.

The screenshot is out of context since it scrolls forever with irrelevant stack traces.


Cheers,
Franco

Mine was identical to the OP's. Maybe one of us could upgrade again to 24.7.2 and try to get you logs?

Quote from: doktornotor on August 22, 2024, 05:26:47 PM

Quote from: benkill15 on August 22, 2024, 05:25:36 PM
Just wanted to add additional information. Screenshot attached as well

Pasting the serial console output would be a whole lot better than the screenshot.

Hi, understood and agreed. It was what I had at the time, so I will post the full output when I get home later tonight.

Same here on an HP/Compaq with a Toshiba disk.
It panics at the exact same instruction (the movq) as the OP's. I have no additional info to offer.
At another installation, 24.7.2 is running fine after an earlier update, I was told.


Quote from: benkill15 on August 22, 2024, 05:25:36 PM

24.7 clean install works

24.7.1 worked

24.7.2 upgrade from 24 then same issue as OP.


I've tested the whole thing on another hardware box. The same kernel panic happens after updating to 24.7.2.

Let's bring a bit of structure into these unclear +1 posts.

Are you using ZFS? How old is the hardware you are using or is it a VM? Does this panic occur due to the 24.7.2 kernel or 24.7.2 core package? I know it's difficult with the panic but we need more data points than "24.7.2 is not working" now.


Thanks,
Franco

Fix that worked for me:

1. Wipe the disk (without this, the installer later complained about some UUIDs and other disk-related stuff).
2. Start the installer; during install, select the "Other Modes" menu option and manually create a UFS filesystem.
3. After the install, restore the config backup.
4. Update to 24.7.2 (I did it from the shell, and missing plugins were installed, too).

What was weird is that I was not asked to reboot after updating to 24.7.2 and adding plugins.




bump: I had the same issue. took photos of the logs (spoiler alert: they look about the same as everyone else's) but I'm not gonna include them unless asked because I don't think they'll be of much use.

tried reboots, legacy kernel, safe mode, etc. no dice. after a reinstall to 24.7 it all worked, but updating to 24.7.2 brought about the same issue. sticking on 24.7 for now. if there's any logs or sysinfo I can offer to help with this issue I'm happy to, but I don't have the time to try updating and tinkering again to help with bugfixing


Quote from: benkill15 on August 22, 2024, 05:39:14 PM
Quote from: doktornotor on August 22, 2024, 05:26:47 PM

Quote from: benkill15 on August 22, 2024, 05:25:36 PM
Just wanted to add additional information. Screenshot attached as well

Pasting the serial console output would be a whole lot better than the screenshot.

Hi, understood and agreed. It was what I had at the time, so I will post the full output when I get home later tonight.

Hardware: WatchGuard XTM505. Xeon L5420. 3GB DDR2. Samsung 870 EVO 256GB.

Posting the serial console output. This was a clean install of 24.7 using ZFS. Nothing was configured beyond basic connectivity before I tried to update to 24.7.2. I'll be happy to help further if I can.

Mounting filesystems...


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff804d7de7
stack pointer           = 0x28:0xfffffe00594aeb20
frame pointer           = 0x28:0xfffffe00594aeb40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 29 (zpool)
rdi: fffff8000383b500 rsi: 0000000000020005 rdx: 000000000000000b
rcx: fffff800037ff780  r8: 0000000000000001  r9: 0000000000000000
rax: 0000000000000000 rbx: fffff8000383b500 rbp: fffffe00594aeb40
r10: 0000000000000016 r11: fffff80003817c60 r12: 0000000000002000
r13: 0000000000020005 r14: fffff8000383a700 r15: fffff8000383a600
trap number             = 12
panic: page fault
cpuid = 3
time = 1724369812
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00594ae810
vpanic() at vpanic+0x131/frame 0xfffffe00594ae940
panic() at panic+0x43/frame 0xfffffe00594ae9a0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00594aea00
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00594aea50
calltrap() at calltrap+0x8/frame 0xfffffe00594aea50
--- trap 0xc, rip = 0xffffffff804d7de7, rsp = 0xfffffe00594aeb20, rbp = 0xfffffe00594aeb40 ---
agp_close() at agp_close+0x57/frame 0xfffffe00594aeb40
giant_close() at giant_close+0x68/frame 0xfffffe00594aeb90
devfs_close() at devfs_close+0x4b3/frame 0xfffffe00594aec00
VOP_CLOSE_APV() at VOP_CLOSE_APV+0x1d/frame 0xfffffe00594aec20
vn_close1() at vn_close1+0x14c/frame 0xfffffe00594aec90
vn_closefile() at vn_closefile+0x3d/frame 0xfffffe00594aece0
devfs_close_f() at devfs_close_f+0x2a/frame 0xfffffe00594aed10
_fdrop() at _fdrop+0x11/frame 0xfffffe00594aed30
closef() at closef+0x24a/frame 0xfffffe00594aedc0
closefp_impl() at closefp_impl+0x58/frame 0xfffffe00594aee00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe00594aef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00594aef30
--- syscall (6, FreeBSD ELF64, close), rip = 0x1805a05842ba, rsp = 0x1805a7b73d88, rbp = 0x1805a7b73da0 ---
KDB: enter: panic
[ thread pid 29 tid 100230 ]
Stopped at      kdb_enter+0x33: movq    $0,0xfd9962(%rip)
db>
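
For anyone comparing dumps like the one above, here is a minimal Python sketch that pulls out the fields that matter when deciding whether two reports are the same panic: the current process, the trap's instruction pointer, and the first frame listed after the "--- trap" marker. The function name summarize_panic and the trimmed SAMPLE excerpt are only illustrative (they are not part of any OPNsense or FreeBSD tooling), and the regular expressions assume the console format shown in this post.

```python
import re

# Trimmed excerpt of the console output posted above (illustrative sample).
SAMPLE = """\
current process         = 29 (zpool)
trap number             = 12
panic: page fault
calltrap() at calltrap+0x8/frame 0xfffffe00594aea50
--- trap 0xc, rip = 0xffffffff804d7de7, rsp = 0xfffffe00594aeb20, rbp = 0xfffffe00594aeb40 ---
agp_close() at agp_close+0x57/frame 0xfffffe00594aeb40
giant_close() at giant_close+0x68/frame 0xfffffe00594aeb90
"""


def summarize_panic(console_text):
    """Extract process, trap rip, and faulting frame from a panic dump.

    Hypothetical helper: it only understands the line formats seen in
    the dump above and returns whatever subset of fields it can find.
    """
    summary = {}

    # "current process = 29 (zpool)" -> pid and process name
    proc = re.search(r"current process\s*=\s*(\d+) \(([^)]+)\)", console_text)
    if proc:
        summary["pid"] = int(proc.group(1))
        summary["process"] = proc.group(2)

    lines = console_text.splitlines()
    for index, line in enumerate(lines):
        # "--- trap 0xc, rip = 0x..." marks where the fault was taken.
        trap = re.match(r"\s*--- trap (0x[0-9a-f]+), rip = (0x[0-9a-f]+)", line)
        if not trap:
            continue
        summary["trap"] = int(trap.group(1), 16)
        summary["rip"] = trap.group(2)
        # The first frame after the marker is the function that faulted.
        for following in lines[index + 1:]:
            frame = re.match(r"\s*(\w+)\(\) at \w+\+(0x[0-9a-f]+)", following)
            if frame:
                summary["faulting_function"] = frame.group(1)
                summary["offset"] = frame.group(2)
                break
        break

    return summary


print(summarize_panic(SAMPLE))
```

Run against the dump above, this reports the process as zpool and the faulting frame as agp_close+0x57 at rip 0xffffffff804d7de7. Two reports with the same rip and faulting frame are very likely the same panic, which is easier to confirm than comparing screenshots.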


August 23, 2024, 09:22:37 AM #27 Last Edit: August 23, 2024, 02:13:38 PM by mifi42
Quote from: franco on August 22, 2024, 08:00:18 PM
Let's bring a bit of structure into these unclear +1 posts.

Are you using ZFS? How old is the hardware you are using or is it a VM? Does this panic occur due to the 24.7.2 kernel or 24.7.2 core package? I know it's difficult with the panic but we need more data points than "24.7.2 is not working" now.

ZFS: yes
HW: HP/Compaq dc7800
Age: unknown, but at least a couple of years, probably five or so.
VM: No.
Kernel or core package: I have no clue. The kernel panic stack is in zpool, so I would say there is certainly a problem in the kernel, probably in ZFS. See the dumps others have provided; mine is similar as far as I can tell.

The problem was reproducible and consistently failed with the same output.
Glad to help, if I can.
Please let me know what I can do, but I am currently reinstalling, so I can no longer reproduce the error in the same config.

Edit 1:
Additional information: During the reboot from live and reïnstall, the config importer failed. When it started to read the previous ZFS on the hard disk, it showed a great many kernel error messages that scrolled past too fast to read, and then it rebooted. I.e. even the former 24.7 kernel is no longer capable of reading the ZFS on my disk.
I would say the ZFS on disk got inconsistent enough to be a total loss.

Edit 2:
Fresh install from live 24.7 with ZFS. Installing from scratch as the HD was wiped.
After the proper default, and running the wizard from the GUI for the initial config, I tried upgrading from the root menu at the console.
The result is exactly the same. After the reboot the system crashes:
"KDB: enter: panic
[thread pid 31 tid 100212]
Stopped at kdb_enter+0x33: movq $0,0xfd996..."

Tracing pid 31
panic() at panic+0x43/frame 0xfffffe007a3169a0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe007a326a00
trap_pfault() at trap_pfault+0x46/frame 0xfffffe007a316a50
calltrap() at calltrap+0x8/frame 0xfffffe007a316a50
--- trap 0xc, rip = 0xffffffff804d7de7, rsp = 0xfffffe007a316b20, rbp = 0xfffff007a316b40 ---
agp_close() at agp_close+0x57/frame 0xfffffe007a316b40
giant_close() ...
devfs_close() ...
VOP_CLOSE_APV() ...
vn_close1() ...
vn_closefile() ...
etc.

I've been thinking how to approach this. Would someone care to test two images of 24.7.2 -- one with the actual 24.7.2 state and one with the environment var commit reverted?

I think we should do 24.7.3 next week so we need to move this along. We need a way to confirm this precisely and I guess that is the safest way.


Cheers,
Franco

I can do that if you want. My system is out of order anyway...
Do you have two installation images prepared that I can install? I need .iso images because all I have to install from is a good ol' DVD player. My darn HP refuses to boot from USB sticks.
Links via PM, perhaps?

Quote from: franco on August 23, 2024, 01:20:33 PM
I've been thinking how to approach this. Would someone care to test two images of 24.7.2 -- one with the actual 24.7.2 state and one with the environment var commit reverted?

I think we should do 24.7.3 next week so we need to move this along. We need a way to confirm this precisely and I guess that is the safest way.


Cheers,
Franco