Fatal Trap 12: page fault while in kernel mode (Kernel Panic)

Started by DocHodges, December 17, 2024, 03:55:16 PM

All,
I have been running OPNsense in a VM for around 4 years. The machine is a Lenovo M80S with an Intel X550-T2 NIC. On the same machine I also run Home Assistant and UniFi Network. I've tested all the hardware and it all seems fine. No other VM has issues or shows any sign of problems.

All that said, the internet recently went out across my entire neighborhood. When it came back, my OPNsense VM began crashing every 10 minutes or so. Nothing in the VM settings has changed. I spun up a fresh install using default ZFS settings and the same issue persists. I even went as far as loading a fresh install of pfSense, and the same thing happens there. I am not sure what to check next. Has anyone run into an issue like this? I've searched the forum and see others with somewhat similar issues, but theirs seem to crash about once a week, not every 10 minutes, which effectively takes my entire network down. Below are the crash logs I get from OPNsense. Maybe someone can use them to guide me to the answer. That would be greatly appreciated, as I now have a very angry wife who can only watch her shows on her phone or hotspot!


 

---<<BOOT>>---
Copyright (c) 1992-2023 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
   The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.1-RELEASE-p6 stable/24.7-n267981-8375762712f SMP amd64
FreeBSD clang version 18.1.5 (https://github.com/llvm/llvm-project.git llvmorg-18.1.5-0-g617a15a9eac9)
VT(vga): text 80x25
CPU: Intel(R) Core(TM) i5-10500 CPU @ 3.10GHz (3096.10-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0xa0653  Family=0x6  Model=0xa5  Stepping=3
  Features=0x1f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT>
  Features2=0xfffab223<SSE3,PCLMULQDQ,VMX,SSSE3,FMA,CX16,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0x9c47ab<FSGSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT>
  Structured Extended Features2=0x4<UMIP>
  Structured Extended Features3=0xbc000400<MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0x2006b<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO>
  AMD Extended Feature Extensions ID EBX=0x100d000<IBPB,IBRS,STIBP,SSBD>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 8589934592 (8192 MB)
avail memory = 8242401280 (7860 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS  BXPC    >
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
FreeBSD/SMP: 1 package(s) x 3 core(s)
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-23
Launching APs: 2 1
random: entropy device external interface
wlan: mac acl policy registered
kbd1 at kbdmux0
WARNING: Device "spkr" is Giant locked and may be deleted before FreeBSD 15.0.
vtvga0: <VT VGA driver>
kvmclock0: <KVM paravirtual clock>
Timecounter "kvmclock" frequency 1000000000 Hz quality 975
kvmclock0: registered as a time-of-day clock, resolution 0.000001s
smbios0: <System Management BIOS> at iomem 0xf5260-0xf527e
smbios0: Version: 2.8, BCD Revision: 2.8
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS>
acpi0: <BOCHS BXPC>
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 100000000 Hz quality 950
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x5040-0x505f mem 0xf4000000-0xf7ffffff,0xf8000000-0xfbffffff,0xfd214000-0xfd215fff irq 21 at device 1.0 on pci0
vgapci0: Boot video device
uhci0: <Intel 82801I (ICH9) USB controller> port 0x5060-0x507f irq 16 at device 26.0 on pci0
usbus0 on uhci0
usbus0: 12Mbps Full Speed USB v1.0
uhci1: <Intel 82801I (ICH9) USB controller> port 0x5080-0x509f irq 17 at device 26.1 on pci0
usbus1 on uhci1
usbus1: 12Mbps Full Speed USB v1.0
uhci2: <Intel 82801I (ICH9) USB controller> port 0x50a0-0x50bf irq 18 at device 26.2 on pci0
usbus2 on uhci2
usbus2: 12Mbps Full Speed USB v1.0
ehci0: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xfd216000-0xfd216fff irq 19 at device 26.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
usbus3: 480Mbps High Speed USB v2.0
hdac0: <Intel 82801I HDA Controller> mem 0xfd210000-0xfd213fff irq 16 at device 27.0 on pci0
pcib1: <ACPI PCI-PCI bridge> mem 0xfd217000-0xfd217fff irq 16 at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> mem 0xfd218000-0xfd218fff irq 16 at device 28.1 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> mem 0xfd219000-0xfd219fff irq 16 at device 28.2 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> mem 0xfd21a000-0xfd21afff irq 16 at device 28.3 on pci0
pci4: <ACPI PCI bus> on pcib4
uhci3: <Intel 82801I (ICH9) USB controller> port 0x50c0-0x50df irq 16 at device 29.0 on pci0
usbus4 on uhci3
usbus4: 12Mbps Full Speed USB v1.0
uhci4: <Intel 82801I (ICH9) USB controller> port 0x50e0-0x50ff irq 17 at device 29.1 on pci0
usbus5 on uhci4
usbus5: 12Mbps Full Speed USB v1.0
uhci5: <Intel 82801I (ICH9) USB controller> port 0x5100-0x511f irq 18 at device 29.2 on pci0
usbus6 on uhci5
usbus6: 12Mbps Full Speed USB v1.0
ehci1: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xfd21b000-0xfd21bfff irq 19 at device 29.7 on pci0
usbus7: EHCI version 1.0
usbus7 on ehci1
usbus7: 480Mbps High Speed USB v2.0
pcib5: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> mem 0xfc800000-0xfc8000ff irq 21 at device 1.0 on pci5
pci6: <ACPI PCI bus> on pcib6
virtio_pci0: <VirtIO PCI (legacy) Balloon adapter> port 0x4000-0x403f mem 0x7030800000-0x7030803fff irq 20 at device 3.0 on pci6
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci0
ahci0: <Intel ICH9 AHCI SATA controller> port 0x40c0-0x40df mem 0xfc700000-0xfc700fff irq 20 at device 7.0 on pci6
ahci0: AHCI v1.00 with 6 1.5Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
virtio_pci1: <VirtIO PCI (legacy) Console adapter> port 0x4040-0x407f mem 0xfc701000-0xfc701fff,0x7030804000-0x7030807fff irq 21 at device 8.0 on pci6
virtio_pci2: <VirtIO PCI (legacy) Console adapter> port 0x4080-0x40bf mem 0xfc702000-0xfc702fff,0x7030808000-0x703080bfff irq 22 at device 9.0 on pci6
ix0: <Intel(R) X550-T2> mem 0x7030000000-0x70303fffff,0x703080c000-0x703080ffff at device 16.0 on pci6
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 3 RX queues 3 TX queues
ix0: Using MSI-X interrupts with 4 vectors
ix0: allocated for 3 queues
ix0: allocated for 3 rx queues
ix0: Ethernet address: b4:96:91:ab:e2:dc
ix0: PCI Express Bus: Speed 8.0GT/s Width x4
ix0: fw 2.11.11 nvm 1.93.0 Option ROM V1-b1276-p0 eTrack 0x80000dd2
ix0: Error 6 setting up SR-IOV
ix0: netmap queues/slots: TX 3/2048, RX 3/2048
ix1: <Intel(R) X550-T2> mem 0x7030400000-0x70307fffff,0x7030810000-0x7030813fff at device 16.1 on pci6
ix1: Using 2048 TX descriptors and 2048 RX descriptors
ix1: Using 3 RX queues 3 TX queues
ix1: Using MSI-X interrupts with 4 vectors
ix1: allocated for 3 queues
ix1: allocated for 3 rx queues
ix1: Ethernet address: b4:96:91:ab:e2:dd
ix1: PCI Express Bus: Speed 8.0GT/s Width x4
ix1: fw 2.11.11 nvm 1.93.0 Option ROM V1-b1276-p0 eTrack 0x80000dd2
ix1: Error 6 setting up SR-IOV
ix1: netmap queues/slots: TX 3/2048, RX 3/2048
pcib7: <ACPI PCI-PCI bridge> mem 0xfc801000-0xfc8010ff irq 22 at device 2.0 on pci5
pci7: <ACPI PCI bus> on pcib7
pcib8: <ACPI PCI-PCI bridge> mem 0xfc802000-0xfc8020ff irq 23 at device 3.0 on pci5
pci8: <ACPI PCI bus> on pcib8
pcib9: <ACPI PCI-PCI bridge> mem 0xfc803000-0xfc8030ff irq 20 at device 4.0 on pci5
pci9: <ACPI PCI bus> on pcib9
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
ahci1: <Intel ICH9 AHCI SATA controller> port 0x5120-0x513f mem 0xfd21c000-0xfd21cfff irq 16 at device 31.2 on pci0
ahci1: AHCI v1.00 with 6 1.5Gbps ports, Port Multiplier not supported
ahcich6: <AHCI channel> at channel 0 on ahci1
ahcich7: <AHCI channel> at channel 1 on ahci1
ahcich8: <AHCI channel> at channel 2 on ahci1
ahcich9: <AHCI channel> at channel 3 on ahci1
ahcich10: <AHCI channel> at channel 4 on ahci1
ahcich11: <AHCI channel> at channel 5 on ahci1
acpi_syscontainer0: <System Container> on acpi0
vmgenc0: <VM Generation Counter> on acpi0
acpi_syscontainer1: <System Container> port 0xcd8-0xce3 on acpi0
acpi_syscontainer2: <System Container> port 0x620-0x62f on acpi0
acpi_syscontainer3: <System Container> port 0xcc0-0xcd7 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 15.0.
psm0: model IntelliMouse Explorer, device ID 4
orm0: <ISA Option ROMs> at iomem 0xca000-0xcafff,0xcb000-0xcbfff,0xe7800-0xeffff pnpid ORM0000 on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff pnpid PNP0900 on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounters tick every 10.000 msec
ugen5.1: <Intel UHCI root HUB> at usbus5
ugen3.1: <Intel EHCI root HUB> at usbus3
ugen1.1: <Intel UHCI root HUB> at usbus1
uhub0 on usbus5
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
ugen6.1: <Intel UHCI root HUB> at usbus6
uhub1 on usbus3
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
uhub2 on usbus1
uhub3 on usbus6
uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
ugen0.1: <Intel UHCI root HUB> at usbus0
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen4.1: <Intel UHCI root HUB> at usbus4
uhub4 on usbus0
uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen7.1: <Intel EHCI root HUB> at usbus7
uhub5 on usbus7
uhub5: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus7
uhub6 on usbus4
uhub6: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen2.1: <Intel UHCI root HUB> at usbus2
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
uhub7 on usbus2
uhub7: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
Trying to mount root from zfs:zroot/ROOT/default []...
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <QEMU HARDDISK 2.5+> ATA-7 SATA device
ada0: Serial Number QM00013
ada0: 150.000MB/s transfers (SATA 1.x, UDMA5, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 32768MB (67108864 512 byte sectors)
cd0 at ahcich7 bus 0 scbus7 target 0 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00003
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: 2081MB (1065774 2048 byte sectors)
uhub0: 2 ports with 2 removable, self powered
uhub3: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub6: 2 ports with 2 removable, self powered
uhub7: 2 ports with 2 removable, self powered
Root mount waiting for: usbus3 usbus7
Root mount waiting for: usbus3 usbus7
uhub1: 6 ports with 6 removable, self powered
uhub5: 6 ports with 6 removable, self powered
vtcon0: <VirtIO Console Adapter> on virtio_pci1
vtcon1: <VirtIO Console Adapter> on virtio_pci2
ichsmb0: <Intel 82801I (ICH9) SMBus controller> port 0x700-0x73f irq 16 at device 31.3 on pci0
smbus0: <System Management Bus> on ichsmb0
lo0: link state changed to UP
pflog0: permanently promiscuous mode enabled
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix0: link state changed to UP
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x18
fault code      = supervisor read data, page not present
instruction pointer   = 0x20:0xffffffff80c52ac2
stack pointer           = 0x28:0xfffffe00107e4950
frame pointer           = 0x28:0xfffffe00107e4990
code segment      = base 0x0, limit 0xfffff, type 0x1b
         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags   = interrupt enabled, resume, IOPL = 0
current process      = 0 (if_io_tqg_2)
rdi: fffff801456ed034 rsi: 0000000000000000 rdx: 0000000000000003
rcx: 000000002b768000  r8: 0000000000000004  r9: 00000000ab04b3f4
rax: 0000000000000000 rbx: fffff80149b70000 rbp: fffffe00107e4990
r10: 0000000000000023 r11: fffff80149b70000 r12: 0000000000000000
r13: 0000000000000028 r14: 0000000000000028 r15: 0000000000000014
trap number      = 12
panic: page fault
cpuid = 2
time = 1734402265
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00107e4640
vpanic() at vpanic+0x131/frame 0xfffffe00107e4770
panic() at panic+0x43/frame 0xfffffe00107e47d0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00107e4830
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00107e4880
calltrap() at calltrap+0x8/frame 0xfffffe00107e4880
--- trap 0xc, rip = 0xffffffff80c52ac2, rsp = 0xfffffe00107e4950, rbp = 0xfffffe00107e4990 ---
m_apply() at m_apply+0x92/frame 0xfffffe00107e4990
in_cksum_skip() at in_cksum_skip+0x2f/frame 0xfffffe00107e49b0
tcp_input_with_port() at tcp_input_with_port+0x27a/frame 0xfffffe00107e4b10
tcp_input() at tcp_input+0xb/frame 0xfffffe00107e4b20
ip_input() at ip_input+0x268/frame 0xfffffe00107e4b80
netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xfffffe00107e4bd0
ether_demux() at ether_demux+0x149/frame 0xfffffe00107e4c00
ether_nh_input() at ether_nh_input+0x36a/frame 0xfffffe00107e4c60
netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xfffffe00107e4cb0
ether_input() at ether_input+0x56/frame 0xfffffe00107e4d00
iflib_rxeof() at iflib_rxeof+0xc0e/frame 0xfffffe00107e4e00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00107e4e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x14e/frame 0xfffffe00107e4ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe00107e4ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00107e4f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00107e4f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Update: I have completed several memtest runs and the RAM seems fine. I also see there is talk about changing settings on the NIC, as some cards default to a 0 or a 1. Is this the kind of problem that requires that change? Is the X550 one of the affected cards?

Update 2: I am seeing more and more threads about kernel crashes. I've been reading them, but it seems there are issues with the debug kernel as well. Does anyone know the correct fix so I can try it and see whether that's indeed what my issue is related to?

I haven't seen this panic before but it's pretty nasty judging by the stack trace.  Would you be willing to run a debug kernel to collect a core dump to share with us to inspect this?


Thanks,
Franco

Quote from: franco on December 17, 2024, 07:16:56 PM
I haven't seen this panic before but it's pretty nasty judging by the stack trace.  Would you be willing to run a debug kernel to collect a core dump to share with us to inspect this?


Thanks,
Franco

Yeah, absolutely. Whatever I can do to help understand/support this, I am happy to do. What's odd is the timing associated with it.

It could be a device doing this, some specific TCP access. Stranger things have happened.

You can install the debug kernel with this command:

# opnsense-update -zkr dbg-24.7.10_2

Then just reboot and it will auto-configure to drop "vmcore.X" files into /var/crash

I need one of those (they can be a few hundred MB) to look at what's going on.

To go back to the stock kernel use:

# opnsense-update -k

(and reboot to activate)


Thanks,
Franco
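
Once the debug kernel has panicked at least once, it can be handy to confirm that a dump actually landed before trying to share it. Below is a minimal sketch in Python, assuming Python 3 is available on the box. The helper name list_vmcores is made up for illustration; only the /var/crash path and the "vmcore.X" naming come from the post above.

```python
from pathlib import Path


def list_vmcores(crash_dir="/var/crash"):
    """Return (file name, size in MiB) for each vmcore.* file, newest first."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        # No crash directory yet means no dumps to report.
        return []
    files = [p for p in directory.glob("vmcore.*") if p.is_file()]
    # Sort by modification time so the most recent dump comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, round(p.stat().st_size / 2**20, 1)) for p in files]


for name, size_mib in list_vmcores():
    print(f"{name}: {size_mib} MiB")
```

An empty result after a crash would suggest the dump was not written, which is worth mentioning when reporting back.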

Quote from: franco on December 17, 2024, 07:52:53 PM
It could be a device doing this, some specific TCP access. Stranger things have happened.

You can install the debug kernel with this command:

# opnsense-update -zkr dbg-24.7.10_2

Then just reboot and it will auto-configure to drop "vmcore.X" files into /var/crash

I need one of those (they can be a few hundred MB) to look at what's going on.

To go back to the stock kernel use:

# opnsense-update -k

(and reboot to activate)


Thanks,
Franco

I will load this when I get home and upload the results

Quote from: franco on December 17, 2024, 07:52:53 PM
It could be a device doing this, some specific TCP access. Stranger things have happened.

You can install the debug kernel with this command:

# opnsense-update -zkr dbg-24.7.10_2

Then just reboot and it will auto-configure to drop "vmcore.X" files into /var/crash

I need one of those (they can be a few hundred MB) to look at what's going on.

To go back to the stock kernel use:

# opnsense-update -k

(and reboot to activate)


Thanks,
Franco

I just sent you a PM with the link to the files. Please let me know if that works.

What's up with that back and forth right before the fault?
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
...
IP conflict?
No clue if this is relevant...

Quote from: EricPerl on December 18, 2024, 01:42:00 AM
What's up with that back and forth right before the fault?
arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1
...
IP conflict?
No clue if this is relevant...


Thanks for pointing that out. I was able to identify both devices that were trying to grab the same IP. Both are dynamic, so I'm not sure why they insisted on the same address, but I forced one to static, which should at least solve that particular issue. Unfortunately, after fixing that, the main issue remains.
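
For anyone who wants to spot this kind of address conflict in their own logs, here is a rough Python sketch that counts "moved from ... to ..." ARP messages per IP and interface. The function name find_arp_flaps and the min_moves threshold are illustrative choices, not anything provided by OPNsense; the message format is taken from the log lines posted above.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1
ARP_MOVED = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<ifname>\S+)"
)


def find_arp_flaps(lines, min_moves=2):
    """Map (ip, interface) to (move count, sorted MACs) for addresses that moved repeatedly."""
    moves = Counter()
    macs = {}
    for line in lines:
        match = ARP_MOVED.search(line)
        if not match:
            continue  # Not an ARP "moved" message.
        key = (match["ip"], match["ifname"])
        moves[key] += 1
        macs.setdefault(key, set()).update((match["old"], match["new"]))
    return {key: (count, sorted(macs[key]))
            for key, count in moves.items() if count >= min_moves}


sample = [
    "arp: 192.168.1.120 moved from 4c:17:44:72:e9:96 to 08:3a:88:4f:38:92 on ix1",
    "arp: 192.168.1.120 moved from 08:3a:88:4f:38:92 to 4c:17:44:72:e9:96 on ix1",
]
for (ip, ifname), (count, seen) in find_arp_flaps(sample).items():
    print(f"{ip} on {ifname}: {count} moves between {', '.join(seen)}")
```

Passing the lines of a saved console log instead of the two sample lines would list every address that bounced between MACs, along with the MACs involved.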

Yeah the arp issue is just noise, good to have fixed but not relevant here.

I continued to have issues no matter what I did, so I decided to take drastic measures. I completely nuked the entire setup. I replaced the M.2 drive and put Proxmox on its own SSD. I reinstalled all VMs, repasted the X550-T2, and put a fan on it. While I was inside the PC I went ahead and repasted the CPU as well for good measure. After getting everything back up and running, it ran through the night without any crashes. This doesn't help pinpoint what happened, but it has eliminated the issue so far. Just wanted to provide a quick update. Thanks to those who reached out to help.

I still suspect this is related to ixgbe(4) driver changes, but there's not much to go on if the situation has resolved itself now.


Cheers,
Franco

January 09, 2025, 09:21:19 PM #13 Last Edit: January 09, 2025, 09:23:22 PM by borys.ohnsorge
Hi,

I am also struggling with kernel panics on a backup machine in a cluster running as virtual machines on OpenStack. In my case, the problems started after updating to 24.7.10, as far as I remember.

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80f3c00f
stack pointer           = 0x28:0xfffffe000edf1d10
frame pointer           = 0x28:0xfffffe000edf1d50
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (thread taskq)
rdi: fffffe008e859400 rsi: 0000000000000000 rdx: 000000000000002e
rcx: 0000000000000000  r8: 0000000000000000  r9: fffff80005bbe480
rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe000edf1d50
r10: fffff80005bbe480 r11: 00000000800a7d8e r12: fffff80156d6cfe0
r13: fffffe008e859400 r14: fffff80156d6ccb8 r15: fffff80005bbe540
trap number             = 12
panic: page fault
cpuid = 2
time = 1736260351
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000edf1a00
vpanic() at vpanic+0x131/frame 0xfffffe000edf1b30
panic() at panic+0x43/frame 0xfffffe000edf1b90
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe000edf1bf0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe000edf1c40
calltrap() at calltrap+0x8/frame 0xfffffe000edf1c40
--- trap 0xc, rip = 0xffffffff80f3c00f, rsp = 0xfffffe000edf1d10, rbp = 0xfffffe000edf1d50 ---
zone_release() at zone_release+0x1df/frame 0xfffffe000edf1d50
bucket_drain() at bucket_drain+0xb9/frame 0xfffffe000edf1d80
bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x2ff/frame 0xfffffe000edf1de0
zone_timeout() at zone_timeout+0x2eb/frame 0xfffffe000edf1e20
uma_timeout() at uma_timeout+0x58/frame 0xfffffe000edf1e40
taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe000edf1ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe000edf1ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe000edf1f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000edf1f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x0
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff82785e61
stack pointer           = 0x28:0xfffffe0084263a40
frame pointer           = 0x28:0xfffffe0084263a70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq28: virtio_pci2)
rdi: fffff801245bc210 rsi: fffff801245bc210 rdx: 000000002ce7b27e
rcx: 0000000000000000  r8: 000000004150a7d2  r9: 0000000020510000
rax: 0000000000000000 rbx: fffff80018710b00 rbp: fffffe0084263a70
r10: 000000002c28d619 r11: 0000000000000301 r12: fffffe008ea5c000
r13: 000000000005625c r14: fffff801245bc210 r15: fffff80003aea000
trap number             = 12
panic: page fault
cpuid = 1
time = 1736353882
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0084263730
vpanic() at vpanic+0x131/frame 0xfffffe0084263860
panic() at panic+0x43/frame 0xfffffe00842638c0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0084263920
trap_pfault() at trap_pfault+0x46/frame 0xfffffe0084263970
calltrap() at calltrap+0x8/frame 0xfffffe0084263970
--- trap 0xc, rip = 0xffffffff82785e61, rsp = 0xfffffe0084263a40, rbp = 0xfffffe0084263a70 ---
pf_detach_state() at pf_detach_state+0x6c1/frame 0xfffffe0084263a70
pf_unlink_state() at pf_unlink_state+0x290/frame 0xfffffe0084263ab0
pfsync_in_del_c() at pfsync_in_del_c+0x6c/frame 0xfffffe0084263af0
pfsync_input() at pfsync_input+0x23a/frame 0xfffffe0084263b70
ip_input() at ip_input+0x268/frame 0xfffffe0084263bd0
netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xfffffe0084263c20
ether_demux() at ether_demux+0x149/frame 0xfffffe0084263c50
ether_nh_input() at ether_nh_input+0x36a/frame 0xfffffe0084263cb0
netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xfffffe0084263d00
ether_input() at ether_input+0x56/frame 0xfffffe0084263d50
vtnet_rxq_eof() at vtnet_rxq_eof+0x6e9/frame 0xfffffe0084263e20
vtnet_rx_vq_process() at vtnet_rx_vq_process+0xbc/frame 0xfffffe0084263e60
ithread_loop() at ithread_loop+0x257/frame 0xfffffe0084263ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe0084263f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0084263f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

The strange thing is that the master is on exactly the same version and has no problems. Of course, it is running in a different location on a different compute node (but with the same parameters for both virtual machines and compute nodes).

Regards,
Borys

@DocHodges, can you show the output of:
uname -a
Also have a look at this thread: [SOLVED] Kernel Panic - box restarts every few hours

and at @dedi's post #4.

Regards
Borys