[SOLVED] Update to 24.7.2 results in kernel panic

Started by mroess, August 22, 2024, 08:42:18 AM


OK, it seems disabling agp(4) is viable, although some doubts about losing VGA capability remain, at least for me. What's strange is that a cheap hardware vendor buys chipsets with AGP support to build serial appliances, but it was probably the cheapest option. ;)

According to Wikipedia, "As of 2013, PCI Express has replaced AGP as the default interface for graphics cards on new systems." I think that's the benchmark we have to apply for this wager. I'll put this on the agenda for today's developer meeting.

@mroess I don't have an immediate way for you to upload that file. Do you have Dropbox, an online drive, or something similar where you could put it?


Cheers,
Franco

August 26, 2024, 10:08:41 PM #107 Last Edit: August 27, 2024, 07:07:10 PM by pscrev20
Hello all, sorry I'm a bit late to the party. I had the same error as the OP and read the entire thread to help troubleshoot. I tried "set hint.agp.0.disabled=1" and got an "Unknown variable" error. I'm running OPNsense on Check Point 2200 appliances (more than one, to be exact), and all of them have had this issue. None of them have VGA capability. I also have one running as a vSphere VM on a Dell R330 that has the same issue. Attached is the info I could pull when booting into single-user mode from the console and from troubleshooting on the Check Point.

Ultimate fix for me:


  • Download the new serial image from the website and install it using ZFS (same as before)
  • Restore the configuration and access the router via SSH and the web GUI
  • Run opnsense-update -zkr 24.7.2 from the shell
  • Reboot and run the update from the web GUI

The system is back up and stable so far.

*UPDATE*
So, my VM did not go down after all, only the two Check Point serial devices, so this only affects the serial-only devices. The issue with my VM is a remote administration problem, something else for me to fix... :-[

Kernel panic and attempt to disable agp at the db> prompt (I tried many variations of this, not just the one below):

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff804d7de7
stack pointer           = 0x28:0xfffffe007b1e3b20
frame pointer           = 0x28:0xfffffe007b1e3b40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 30 (zpool)
rdi: fffff800035d9400 rsi: 0000000000020005 rdx: 000000000000000b
rcx: fffff80003797900  r8: 0000000000000001  r9: 0000000000000000
rax: 0000000000000000 rbx: fffff800035d9400 rbp: fffffe007b1e3b40
r10: 0000000000000016 r11: fffff80005d7ec60 r12: 0000000000002000
r13: 0000000000020005 r14: fffff80003761900 r15: fffff80003761a00
trap number             = 12
panic: page fault
cpuid = 0
time = 1724695959
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe007b1e3810
vpanic() at vpanic+0x131/frame 0xfffffe007b1e3940
panic() at panic+0x43/frame 0xfffffe007b1e39a0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe007b1e3a00
trap_pfault() at trap_pfault+0x46/frame 0xfffffe007b1e3a50
calltrap() at calltrap+0x8/frame 0xfffffe007b1e3a50
--- trap 0xc, rip = 0xffffffff804d7de7, rsp = 0xfffffe007b1e3b20, rbp = 0xfffffe007b1e3b40 ---
agp_close() at agp_close+0x57/frame 0xfffffe007b1e3b40
giant_close() at giant_close+0x68/frame 0xfffffe007b1e3b90
devfs_close() at devfs_close+0x4b3/frame 0xfffffe007b1e3c00
VOP_CLOSE_APV() at VOP_CLOSE_APV+0x1d/frame 0xfffffe007b1e3c20
vn_close1() at vn_close1+0x14c/frame 0xfffffe007b1e3c90
vn_closefile() at vn_closefile+0x3d/frame 0xfffffe007b1e3ce0
devfs_close_f() at devfs_close_f+0x2a/frame 0xfffffe007b1e3d10
_fdrop() at _fdrop+0x11/frame 0xfffffe007b1e3d30
closef() at closef+0x24a/frame 0xfffffe007b1e3dc0
closefp_impl() at closefp_impl+0x58/frame 0xfffffe007b1e3e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe007b1e3f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe007b1e3f30
--- syscall (6, FreeBSD ELF64, close), rip = 0x2123944fb2ba, rsp = 0x2123994a4d88, rbp = 0x2123994a4da0 ---
KDB: enter: panic
[ thread pid 30 tid 100220 ]
Stopped at      kdb_enter+0x33: movq    $0,0xfd9962(%rip)
db> set hint.agp.0.disabled=1
Unknown variable
db>



Single-user boot, trying to discover what was using agp:
root@:/ # dmesg | grep agp
agp0: <Intel Pineview SVGA controller> on vgapci0
WARNING: Device "agp" is Giant locked and may be deleted before FreeBSD 15.0.
agp0: aperture size is 256M, detected 8188k stolen memory
agp0: <Intel Pineview SVGA controller> on vgapci0
WARNING: Device "agp" is Giant locked and may be deleted before FreeBSD 15.0.
agp0: aperture size is 256M, detected 8188k stolen memory
agp_close() at agp_close+0x57/frame 0xfffffe007b1efb40
agp0: <Intel Pineview SVGA controller> on vgapci0
WARNING: Device "agp" is Giant locked and may be deleted before FreeBSD 15.0.
agp0: aperture size is 256M, detected 8188k stolen memory

In today's meeting we agreed to disable the agp device in the 24.7.3 kernel.

Thanks for another data point showing that these devices appear to be serial-only units without relevant VGA capabilities.


Cheers,
Franco


My console is working fine on the alternative PCI device without the AGP driver.

Quote from: franco on August 26, 2024, 10:51:45 PM
In today's meeting we agreed to disable the agp device in the 24.7.3 kernel.

Thanks for another data point showing that these devices appear to be serial-only units without relevant VGA capabilities.


Cheers,
Franco

Quote from: doktornotor on August 24, 2024, 11:28:01 AM

#metoo


set hint.agp.0.disabled=1
set hint.agp.1.disabled=1
set hint.agp.2.disabled=1
set hint.agp.3.disabled=1
boot


can probably get you running from the loader prompt (on most setups, agp.0 should be enough).
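As a sketch, the same hint can also be made persistent so it applies on every boot without typing it at the loader prompt. This assumes the stock FreeBSD loader behaviour of reading /boot/loader.conf.local (OPNsense regenerates /boot/loader.conf, so local additions usually go into the .local file); please verify that for your version. The helper name persist_hint is hypothetical, not an OPNsense tool.

```shell
#!/bin/sh
# Sketch: persist a loader hint so it applies on every boot.
# Assumption: the loader reads /boot/loader.conf.local (FreeBSD default).
# persist_hint is a hypothetical helper, not part of OPNsense.

persist_hint() {
    conf="$1"   # target file, e.g. /boot/loader.conf.local
    line="$2"   # full loader.conf line, e.g. hint.agp.0.disabled="1"
    # Append only if the exact line is not already present, so running
    # this twice does not duplicate the setting.
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Usage (as root):
# persist_hint /boot/loader.conf.local 'hint.agp.0.disabled="1"'
```

Once a kernel with agp disabled by default is installed, the line can simply be removed again.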

This worked for me on my test (home) setup, with the added gotcha of having to type blind: for some reason the (very small, cheap) monitor is blank during the boot menu stage (it works before and after). Very careful one-finger typing got me there eventually ;D

In case of a power outage when I'm not available to recover it, is there a way to make these settings apply on every boot?

Mark from FreeBSD provided a patch, so I built a test kernel with agp re-enabled and the fix in place:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281035#c8

# opnsense-update -zkr 24.7.3-agp

Just in case, to recover, reboot into kernel.old from the boot menu (if possible).

That said, it looks like the most likely fix at the moment. To double-confirm, check whether agp is already loaded when the test kernel boots fine where the initial 24.7.2 kernel would not:

# kldload agp

It should complain that it is already loaded. :)
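A non-interactive variant of the dmesg | grep agp check used earlier in this thread, as a rough sketch: it reads dmesg text on stdin and looks for an attach line like "agp0: <Intel Pineview SVGA controller> on vgapci0". The helper name agp_attached is hypothetical, and it only sees what is still in the message buffer.

```shell
#!/bin/sh
# Sketch: report whether the agp(4) driver attached during this boot.
# Reads dmesg text on stdin; agp_attached is a hypothetical helper name.

agp_attached() {
    # Match attach lines such as
    # "agp0: <Intel Pineview SVGA controller> on vgapci0",
    # but not backtrace frames like "agp_close() at ...".
    grep -Eq '^agp[0-9]+: <'
}

# Usage:
# if dmesg | agp_attached; then echo "agp attached"; else echo "no agp"; fi
```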

Though we will probably keep the driver disabled by default unless it creates other problems with 23.7.3. Only one way to find out either way.



Cheers,
Franco

...unless it creates other problems with 24.7.3 (not 23.7.3).

You lost one year while writing your post. Please don't take us back to the horrible 2023 with your time travel capabilities...
kind regards
chemlud

The secret has been revealed.  ;)

Of course I mean 24.7.3.


Cheers,
Franco

Sorry guys, I'm getting a lot of reboots because of kernel panics. How can I tell whether they're related to this AGP issue?

Fatal trap 9: general protection fault while in kernel mode
cpuid = 15; apic id = 2e
instruction pointer = 0x20:0xffffffff810924ee
stack pointer         = 0x28:0xfffffe014b0a4bd0
frame pointer         = 0x28:0xfffffe014b0a4c00
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 36814 (python3.11)
rdi: fffffe001ea22b00 rsi: 000000000000000f rdx: 00000000000000ed
rcx: 2d8be74f1d661a99  r8: 000007fffffff000  r9: fffff800019c6868
rax: fffff801a04247b0 rbx: fffffe00070c2a08 rbp: fffffe014b0a4c00
r10: 0000000115915425 r11: fffff80000000000 r12: fffffe001ea22b00
r13: 0000000000000000 r14: fffff801a04247a8 r15: fffffe014b0a4c60
trap number = 9
panic: general protection fault
cpuid = 15
time = 1724877840
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014b0a4910
vpanic() at vpanic+0x131/frame 0xfffffe014b0a4a40
panic() at panic+0x43/frame 0xfffffe014b0a4aa0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe014b0a4b00
calltrap() at calltrap+0x8/frame 0xfffffe014b0a4b00
--- trap 0x9, rip = 0xffffffff810924ee, rsp = 0xfffffe014b0a4bd0, rbp = 0xfffffe014b0a4c00 ---
pmap_try_insert_pv_entry() at pmap_try_insert_pv_entry+0xbe/frame 0xfffffe014b0a4c00
pmap_copy() at pmap_copy+0x549/frame 0xfffffe014b0a4cb0
vmspace_fork() at vmspace_fork+0xc90/frame 0xfffffe014b0a4d30
fork1() at fork1+0x52e/frame 0xfffffe014b0a4da0
sys_fork() at sys_fork+0x54/frame 0xfffffe014b0a4e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe014b0a4f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe014b0a4f30
--- syscall (0, FreeBSD ELF64, syscall), rip = 0x826ce51fa, rsp = 0x8204b87d8, rbp = 0x8204b8830 ---
KDB: enter: panic
panic.txt: general protection fault
version.txt: FreeBSD 14.1-RELEASE-p3 ixl_revert-n267779-6ca05616b9e9 SMP


Trying to mount root from zfs:zroot/ROOT/default []...
uhub0: 4 ports with 4 removable, self powered
uhub1: 16 ports with 16 removable, self powered
Root mount waiting for: usbus1
ugen1.2: <MediaTek Inc. WirelessDevice> at usbus1
Dual Console: Serial Primary, Video Secondary
pid 31 (zpool) is attempting to use unsafe AIO requests - not logging anymore


root@opnsense:~ # dmesg | grep agp
root@opnsense:~ #


It's not my intention to mix topics, so if this has nothing to do with the AGP issue, I will open a new topic.

That one is completely unrelated to agp.
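A rough way to answer the "is it related?" question, sketched as a heuristic rather than an official diagnostic: the AGP panic earlier in the thread has agp_close() in its KDB backtrace (with zpool as the current process), while this one goes through pmap_copy() and vmspace_fork() and never touches agp. The helper name panic_involves_agp is hypothetical, and treating any agp_* frame as a match is an assumption.

```shell
#!/bin/sh
# Sketch: does a saved panic backtrace contain an agp_* kernel frame?
# Reads the backtrace text on stdin; panic_involves_agp is hypothetical.

panic_involves_agp() {
    # Backtrace frames look like
    # "agp_close() at agp_close+0x57/frame 0xfffffe007b1e3b40".
    grep -Eq '^agp_[a-z_]+\(\) at '
}

# Usage, with the console output copied into a file:
# panic_involves_agp < backtrace.txt && echo "agp in backtrace"
```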

Quote from: doktornotor on August 29, 2024, 12:08:19 AM
That one is completely unrelated to agp.

I'll open a new topic then, so as not to mix things here. Thanks, buddy!

I tried to follow this whole thread but found it a bit confusing.


For anyone running PC Engines APU2-series boards: all three of my APU2-based OPNsense boxes, running ZFS, applied the 24.7.2 update without any problems. No kernel panics or other weirdness.




@furfix

I suspect a bad zpool in your install that is now being found because zpool import -Na does what it should.
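One generic way to look for a problem pool before upgrading, sketched with standard zpool(8) subcommands (not necessarily what franco has in mind): zpool status -x prints "all pools are healthy" when every imported pool is fine, and zpool import with no arguments lists pools that are visible on attached disks but not imported, which is where a stray or damaged pool would show up. The wrapper name check_pools is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize pool health before an upgrade.
# check_pools is a hypothetical wrapper around real zpool(8) subcommands.

check_pools() {
    # Health of currently imported pools.
    health=$(zpool status -x)
    if [ "$health" = "all pools are healthy" ]; then
        echo "imported pools: healthy"
    else
        echo "imported pools: problems found"
        printf '%s\n' "$health"
    fi
    # Pools visible on attached disks but not imported; a leftover or
    # damaged pool would be listed here.
    echo "pools available for import:"
    zpool import 2>&1 || true
}

# Usage (as root):
# check_pools
```

If something looks off, a scrub of the root pool (zpool scrub zroot, using the pool name from the boot log above) followed by zpool status zroot is a reasonable next step.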


Cheers,
Franco

Quote from: franco on August 29, 2024, 07:34:10 AM
I suspect a bad zpool in your install that is now being found because zpool import -Na does what it should.


What's the best way to check this, and hopefully fix it, before attempting an upgrade?