Fatal trap 12: page fault while in kernel mode - Please help?

Started by magnust, May 10, 2022, 02:04:08 PM

Every five days or so visitors can't reach my site. Their browser shows a message that there are too many redirects. My very uneducated guess is that this has to do with some problem with HAProxy running on my OPNsense as an HTTPS-to-HTTP proxy for two sites.

After these events the OPNsense dashboard always shows a message that a problem occurred, and I send it in as a report. These reports always end with the output below, although some numbers differ slightly; for example, the cpuid differs between reports.

Due to my lack of knowledge I have no clue where to begin. Does it have to do with the network card drivers? Is it a Hyper-V incompatibility issue? Is there anything I can try turning off or on to see if it makes a difference? It's getting quite problematic, since I need to be on standby 24/7 to restart OPNsense when this happens.




Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x18
fault code      = supervisor read data, page not present
instruction pointer   = 0x20:0xffffffff80d37b72
stack pointer           = 0x28:0xfffffe0061f8b650
frame pointer           = 0x28:0xfffffe0061f8b6c0
code segment      = base 0x0, limit 0xfffff, type 0x1b
         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags   = interrupt enabled, resume, IOPL = 0
current process      = 0 (hvevent3)
trap number      = 12
panic: page fault
cpuid = 3
time = 1652011369
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0061f8b410
vpanic() at vpanic+0x17f/frame 0xfffffe0061f8b460
panic() at panic+0x43/frame 0xfffffe0061f8b4c0
trap_fatal() at trap_fatal+0x385/frame 0xfffffe0061f8b520
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0061f8b580
calltrap() at calltrap+0x8/frame 0xfffffe0061f8b580
--- trap 0xc, rip = 0xffffffff80d37b72, rsp = 0xfffffe0061f8b650, rbp = 0xfffffe0061f8b6c0 ---
m_copydata() at m_copydata+0xf2/frame 0xfffffe0061f8b6c0
tcp_output() at tcp_output+0x1339/frame 0xfffffe0061f8b8a0
tcp_do_segment() at tcp_do_segment+0x2b54/frame 0xfffffe0061f8b980
tcp_input_with_port() at tcp_input_with_port+0xafb/frame 0xfffffe0061f8bae0
tcp_input() at tcp_input+0xb/frame 0xfffffe0061f8baf0
ip_input() at ip_input+0x15f/frame 0xfffffe0061f8bb80
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe0061f8bbd0
ether_demux() at ether_demux+0x138/frame 0xfffffe0061f8bc00
ether_nh_input() at ether_nh_input+0x355/frame 0xfffffe0061f8bc60
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe0061f8bcb0
ether_input() at ether_input+0x69/frame 0xfffffe0061f8bd10
hn_chan_callback() at hn_chan_callback+0xa8e/frame 0xfffffe0061f8be10
vmbus_chan_task() at vmbus_chan_task+0x26/frame 0xfffffe0061f8be40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe0061f8bec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe0061f8bef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0061f8bf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0061f8bf30
--- trap 0, rip = 0xffffffff80c2b91f, rsp = 0, rbp = 0x3000000020 ---
mi_startup() at mi_startup+0xdf/frame 0x3000000020
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 13.0-STABLE stable/22.1-n248071-cafeb6ce414 SMP
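
For anyone who wants to read a report like the one above more quickly, here is a minimal sketch that pulls the function names out of a KDB backtrace and isolates the frames between the `--- trap 0xc` marker and the next trap marker, i.e. the call path that was active when the page fault hit. The function names `bt_funcs` and `bt_fault_path` are made up for this illustration and are not part of OPNsense or FreeBSD; the script only assumes the line layout seen in the trace above.

```shell
# Print the function name of each frame in a KDB backtrace read from stdin.
# Lines that are not frames (trap markers, register dumps) are skipped.
bt_funcs() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)() at .*$/\1/p'
}

# Print only the frames listed after the "--- trap 0xc" marker and
# before the next "--- trap" marker, innermost (faulting) frame first.
bt_fault_path() {
    awk '/^--- trap 0xc/ {on = 1; next} /^--- trap/ {on = 0} on' | bt_funcs
}
```

Saving the trace to a file and running `bt_fault_path < trace.txt` (the file name is just an example) lists the faulting function first, followed by its callers down to the thread entry point.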




Some more stuff from the report


Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-STABLE stable/22.1-n248071-cafeb6ce414 SMP amd64
FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
SRAT: Ignoring memory at addr 0x108200000
SRAT: Ignoring memory at addr 0x1000000000
SRAT: Ignoring memory at addr 0x10000200000
SRAT: Ignoring memory at addr 0x20000200000
SRAT: Ignoring memory at addr 0x40000200000
SRAT: Ignoring memory at addr 0x80000200000
VT(efifb): resolution 1024x768
Hyper-V Version: 10.0.14393 [SP5]
  Features=0x2e7f
  PM Features=0x0 [C2]
  Features3=0xed7b2
Timecounter "Hyper-V" frequency 10000000 Hz quality 2000
CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (3192.00-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x906ea  Family=0x6  Model=0x9e  Stepping=10
  Features=0x1f83fbff
  Features2=0xfeda3203
  AMD Features=0x2c100800
  AMD Features2=0x121
  Structured Extended Features=0x9c6fb9
  Structured Extended Features3=0x9c000400
  XSAVE Features=0xb
Hypervisor: Origin = "Microsoft Hv"
real memory  = 4294967296 (4096 MB)
avail memory = 4124368896 (3933 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 cache groups x 1 core(s)
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0  irqs 0-23
Launching APs: 1 2 3
wlan: mac acl policy registered
Timecounter "Hyper-V-TSC" frequency 10000000 Hz quality 3000
random: entropy device external interface
kbd0 at kbdmux0
WARNING: Device "spkr" is Giant locked and may be deleted before FreeBSD 14.0.
efirtc0:
efirtc0: registered as a time-of-day clock, resolution 1.000000s
aesni0:
acpi0:
cpu0:  on acpi0
atrtc0:  port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
acpi_syscontainer0:  on acpi0
vmbus0:  on acpi_syscontainer0
vmgenc0:  on acpi0
vmbus_res0:  irq 5 on acpi0
Timecounters tick every 10.000 msec
usb_needs_explore_all: no devclass
vmbus0: version 4.0
hvet0:  on vmbus0
Event timer "Hyper-V" frequency 10000000 Hz quality 1000
hvkbd0:  on vmbus0
hvheartbeat0:  on vmbus0
hvkvp0:  on vmbus0
hvshutdown0:  on vmbus0
hvtimesync0:  on vmbus0
hvtimesync0: RTT
hvvss0:  on vmbus0
storvsc0:  on vmbus0
hn0:  on vmbus0
<6>hn0: Ethernet address: 00:15:5d:0c:87:5b
hn1:  on vmbus0
<6>hn0: link state changed to UP
<6>hn1: Ethernet address: 00:15:5d:0c:87:5d
hn2:  on vmbus0
<6>hn1: link state changed to UP
<6>hn2: Ethernet address: 00:15:5d:0c:87:60
hn3:  on vmbus0
<6>hn2: link state changed to UP
<6>hn3: Ethernet address: 00:15:5d:0c:87:65
hn4:
<6>hn3: link state changed to UP
on vmbus0
<6>hn4: Ethernet address: 00:15:5d:0c:87:66
<6>hn4: link state changed to UP
hn5:  on vmbus0
<6>hn5: Ethernet address: 00:15:5d:0c:87:67
hn6:  on vmbus0
<6>hn5: link state changed to UP
<6>hn6: Ethernet address: 00:15:5d:0c:87:68
hn7:  on vmbus0
<6>hn6: link state changed to UP
<6>hn7: Ethernet address: 00:15:5d:0c:87:69
<6>hn7: link state changed to UP
cd0 at storvsc0 bus 0 scbus0 target 0 lun 1
cd0:  Removable CD-ROM SPC-3 SCSI device
cd0: 300.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
da0 at storvsc0 bus 0 scbus0 target 0 lun 0
da0:  Fixed Direct Access SPC-3 SCSI device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 130048MB (266338304 512 byte sectors)
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...
<118>Mounting filesystems...
<118>tunefs: soft updates remains unchanged as enabled
<118>tunefs: file system reloaded
<118>camcontrol: ATA ATA_IDENTIFY via pass_16 failed
<118>camcontrol: ATA ATAPI_IDENTIFY via pass_16 failed
<118>** /dev/gpt/rootfs
<118>FILE SYSTEM CLEAN; SKIPPING CHECKS
<118>clean, 27525317 free (6021 frags, 3439912 blocks, 0.0% fragmentation)
<118>Setting hostuuid: b5410ab1-97b9-b14f-8750-c28bb0967f51.
<118>Setting hostid: 0x69cca029.
<118>Configuring vt: keymap blanktime.
<118>Configuring crash dump device: /dev/gpt/swapfs
<118>swapon: adding /dev/gpt/swapfs as swap device
<118>.ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg /usr/local/lib/ipsec /usr/local/lib/perl5/5.32/mach/CORE
<118>32-bit compatibility ldconfig path:
<118>done.
<118>>>> Invoking early script 'upgrade'
<118>>>> Invoking early script 'configd'
<118>Starting configd.
<118>>>> Invoking early script 'templates'
<118>Generating configuration: OK
<118>>>> Invoking early script 'backup'
<118>>>> Invoking backup script 'captiveportal'
<118>>>> Invoking backup script 'dhcpleases'
<118>>>> Invoking backup script 'duid'
<118>>>> Invoking backup script 'netflow'
<118>>>> Invoking backup script 'rrd'
<118>>>> Invoking early script 'carp'
<118>CARP event system: OK
<118>Launching the init system...done.
<118>Initializing...........done.
<118>Starting device manager...done.
<118>Configuring login behaviour...done.
<118>Configuring loopback interface...
<6>lo0: link state changed to UP
<118>done.
<118>Configuring kernel modules...done.
<118>Setting up extended sysctls...done.
<118>Setting timezone...done.
<118>Writing firmware setting...done.
<118>Writing trust files...done.

I remember this from here, same poster apparently. ;)

https://forum.opnsense.org/index.php?topic=27211.0

13.1 for 22.7 is almost ready. Maybe you can give it a try if you can snapshot:

# opnsense-update -bkzr 22.7.pre3
# yes | opnsense-shell reboot


Cheers,
Franco

Yeah, it's me again  ;D

The thing is that turning off IPS, as discussed there, did help: the many reboots stopped.

In a way this is worse, since I now need to reboot manually to get it working again.


Will definitely give this a try! Snapshots are a nice thing  :)

Some are concluding it might also be related to Intel drivers... although I still doubt this since this looks like host-side territory.

In any case, I would be happy to hear how the 22.7 snapshot turns out. :)


Cheers,
Franco


I'm also experiencing crashes and my trace looks pretty similar (but it's not the same):


KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01334652c0
vpanic() at vpanic+0x17f/frame 0xfffffe0133465310
panic() at panic+0x43/frame 0xfffffe0133465370
trap_fatal() at trap_fatal+0x385/frame 0xfffffe01334653d0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0133465430
calltrap() at calltrap+0x8/frame 0xfffffe0133465430
--- trap 0xc, rip = 0xffffffff80d37acd, rsp = 0xfffffe0133465500, rbp = 0xfffffe0133465570 ---
m_copydata() at m_copydata+0x4d/frame 0xfffffe0133465570
tcp_output() at tcp_output+0x1339/frame 0xfffffe0133465750
tcp_do_segment() at tcp_do_segment+0x2cd5/frame 0xfffffe0133465830
tcp_input_with_port() at tcp_input_with_port+0xafb/frame 0xfffffe0133465990
tcp_input() at tcp_input+0xb/frame 0xfffffe01334659a0
ip_input() at ip_input+0x15f/frame 0xfffffe0133465a30
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe0133465a80
ether_demux() at ether_demux+0x138/frame 0xfffffe0133465ab0
ether_nh_input() at ether_nh_input+0x355/frame 0xfffffe0133465b10
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe0133465b60
ether_input() at ether_input+0x69/frame 0xfffffe0133465bc0
ether_demux() at ether_demux+0x121/frame 0xfffffe0133465bf0
ether_nh_input() at ether_nh_input+0x355/frame 0xfffffe0133465c50
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe0133465ca0
ether_input() at ether_input+0x69/frame 0xfffffe0133465d00
iflib_rxeof() at iflib_rxeof+0xc27/frame 0xfffffe0133465e00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe0133465e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe0133465ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe0133465ef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0133465f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0133465f30
--- trap 0, rip = 0xffffffff80c2b91f, rsp = 0, rbp = 0x6 ---
mi_startup() at mi_startup+0xdf/frame 0x6
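
To check how similar two traces really are, a rough sketch like the following can list the functions that appear in both and those that appear only in the first. It assumes each trace has been saved to its own file; `bt_names`, `bt_common` and `bt_only_first` are hypothetical helper names invented for this example, and the output is sorted alphabetically, not in call order.

```shell
# Print the sorted, de-duplicated function names from a backtrace file.
bt_names() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)() at .*$/\1/p' "$1" | sort -u
}

# Run comm(1) with the given column flag on the names from two files.
# Temporary files keep this portable to plain /bin/sh.
bt_compare() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    bt_names "$2" > "$a"
    bt_names "$3" > "$b"
    comm "$1" "$a" "$b"
    rm -f "$a" "$b"
}

# Functions present in both traces.
bt_common() { bt_compare -12 "$1" "$2"; }

# Functions present only in the first trace.
bt_only_first() { bt_compare -23 "$1" "$2"; }
```

For the two traces posted in this thread, the frames from `m_copydata` through `ether_input` are shared, while the receive path underneath differs: `hn_chan_callback` and `vmbus_chan_task` in the first, `iflib_rxeof` and `_task_fn_rx` in the second.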


@magnust, is your issue fixed in the 22.7 snapshot?

frankie, it looks similar indeed. However, I do think it's a new issue on FreeBSD 13.


Cheers,
Franco

So far:

(a bit confusing)


- So far no "a problem has occurred" message has popped up in the dashboard asking me to send in a report.

- I got the "too many redirects" error once. It was fixed by restarting HAProxy.


I seem to have the same issue: I'm getting "Fatal trap 12: page fault while in kernel mode" as well.

No IDS, no heavy services, just OpenVPN. I'd be willing to try the 22.7 snapshot too if it seems to make it go away.

https://forum.opnsense.org/index.php?topic=28422.0


IIRC there was a way to only install the kernel of the 22.7 pre-release, not the whole thing... but can't remember the details.
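
A possible answer, offered as an unverified sketch: the command posted earlier in this thread passes `-b` and `-k` together. Assuming, as I understand the `opnsense-update` manual page, that `-b` selects the base set and `-k` the kernel set, dropping `-b` should limit the update to the kernel. The helper below (`kernel_only_cmd` is a made-up name) only prints the command so it can be reviewed first; it does not run anything. Please check `man opnsense-update` on your own system and take a snapshot before trying it.

```shell
# Print (do not execute) a kernel-only variant of the command quoted
# earlier in the thread. Assumption: -k = kernel set, -b = base set,
# so omitting -b leaves the base system untouched.
kernel_only_cmd() {
    if [ -z "$1" ]; then
        echo "usage: kernel_only_cmd RELEASE" >&2
        return 1
    fi
    printf 'opnsense-update -kzr %s\n' "$1"
}
```

Calling `kernel_only_cmd 22.7.pre3` prints `opnsense-update -kzr 22.7.pre3`, which would be run as root and followed by a reboot, as in the original instructions.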



That's nice to hear. Though it means there is a patch we could incorporate into 22.1. I'll try to take a look.


Cheers,
Franco


Quote from: franco on May 23, 2022, 09:23:23 AM
Could be https://github.com/opnsense/src/commit/15d6a1f03ba79 -- looks sane enough to include in 22.1.x anyway.

I see this patch was recently added to the git branch sandbox/22.1 and to the 22.7.b tag.

https://github.com/opnsense/src/commit/469123a60d1a743c7bf48d91191ac493e3af1cd5

Does that mean it will make its way into the normal 22.1.7_ updates (or 22.1.x) in the near future?

Since the move to FreeBSD in 22.1, is there any preference for basing OPNsense releases on -RELEASE over -STABLE, or does it not matter?

Sorry if the question is elementary; I'm new to OPNsense.