OPNsense Forum

Archive => 22.1 Legacy Series => Topic started by: dpeter on May 19, 2022, 12:44:37 pm

Title: Crash since 22.1.7_1 Need some Help Identifying Cause
Post by: dpeter on May 19, 2022, 12:44:37 pm
Hello, long-time lurker.

I am running OPNsense 22.1.7_1 on a brand-new Protectli FW2B and I'm getting a crash roughly every 12-24 hours (panic string: page fault).

The only service I'm running is an OpenVPN client, which I've associated with my LAN network; it works great.

System Info
Code: [Select]
FreeBSD 13.0-STABLE stable/22.1-n248071-cafeb6ce414 SMP amd64
OPNsense 22.1.7_1 3cc3877c1
Plugins os-ddclient-1.5 os-dmidecode-1.1_1 os-nut-1.8.1 os-theme-cicada-1.29 os-theme-rebellion-1.8.8
Time Thu, 19 May 2022 10:23:02 +0000
OpenSSL 1.1.1o  3 May 2022
PHP 7.4.29

I get the following crash dumps:

Code: [Select]
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010c11e40
vpanic() at vpanic+0x17f/frame 0xfffffe0010c11e90
panic() at panic+0x43/frame 0xfffffe0010c11ef0
mca_intr() at mca_intr+0xbb/frame 0xfffffe0010c11f20
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe0010c11f20
--- trap 0x1c, rip = 0xffffffff8120c430, rsp = 0xfffffe000f5abd48, rbp = 0xfffffe000f5abd70 ---
native_lapic_eoi() at native_lapic_eoi/frame 0xfffffe000f5abd70
Xtimerint() at Xtimerint+0xb1/frame 0xfffffe000f5abd70
--- interrupt, rip = 0xffffffff8110d8e5, rsp = 0xfffffe000f5abe40, rbp = 0xfffffe000f5abe50 ---
cpu_idle() at cpu_idle+0xe5/frame 0xfffffe000f5abe50
sched_idletd() at sched_idletd+0x4e1/frame 0xfffffe000f5abef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000f5abf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000f5abf30
--- trap 0xf618070, rip = 0xffffffff80c2b91f, rsp = 0, rbp = 0xffffffff8131d1ea ---
mi_startup() at mi_startup+0xdf/frame 0xffffffff8131d1ea
KDB: enter: panic
---<>---

Code: [Select]
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 04
fault virtual address = 0xffffffff80cf6250
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0xffffffff80cf6250
stack pointer         = 0x28:0xfffffe0010c11df8
frame pointer         = 0x28:0xfffffe0010c11e30
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu1)
trap number = 12
panic: page fault
cpuid = 1
time = 1652910869
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010c11bb0
vpanic() at vpanic+0x17f/frame 0xfffffe0010c11c00
panic() at panic+0x43/frame 0xfffffe0010c11c60
trap_fatal() at trap_fatal+0x385/frame 0xfffffe0010c11cc0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0010c11d20
calltrap() at calltrap+0x8/frame 0xfffffe0010c11d20
--- trap 0xc, rip = 0xffffffff80cf6250, rsp = 0xfffffe0010c11df8, rbp = 0xfffffe0010c11e30 ---
printf() at printf/frame 0xfffffe0010c11e30
mca_scan() at mca_scan+0x4f6/frame 0xfffffe0010c11ef0
mca_intr() at mca_intr+0x39/frame 0xfffffe0010c11f20
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe0010c11f20
--- trap 0x1c, rip = 0xffffffff80c2eae9, rsp = 0xfffffe000f5abda0, rbp = 0xfffffe000f5abdc0 ---
statclock() at statclock+0x159/frame 0xfffffe000f5abdc0
handleevents() at handleevents+0xf3/frame 0xfffffe000f5abe00
cpu_activeclock() at cpu_activeclock+0x70/frame 0xfffffe000f5abe30
cpu_idle() at cpu_idle+0xa8/frame 0xfffffe000f5abe50
sched_idletd() at sched_idletd+0x4e1/frame 0xfffffe000f5abef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000f5abf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000f5abf30
--- trap 0xf618070, rip = 0xffffffff80c2b91f, rsp = 0, rbp = 0xffffffff8131d1ea ---
mi_startup() at mi_startup+0xdf/frame 0xffffffff8131d1ea
KDB: enter: panic
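
A note for reading the two backtraces above: the "--- trap 0x.. ---" separator lines carry the trap number, and in both dumps the innermost hardware trap is 0x1c (machine check), with the page fault (0xc) in the second dump raised inside printf() while mca_scan() was already handling that machine check. Below is a minimal Python sketch that pulls those numbers out of a pasted backtrace. The function name and the two-entry lookup table are illustrative assumptions, not part of any FreeBSD or OPNsense tool; the two trap numbers are the T_PAGEFLT and T_MCHK values from FreeBSD's x86 trap.h.

```python
import re

# Only the two trap numbers that appear in this thread (FreeBSD x86 trap.h).
TRAP_NAMES = {
    12: "T_PAGEFLT (page fault)",
    28: "T_MCHK (machine check)",
}

# Matches separator lines such as "--- trap 0x1c, rip = 0x..., ..."
TRAP_LINE = re.compile(r"^--- trap (0x[0-9a-fA-F]+), rip = (0x[0-9a-fA-F]+)")


def traps_in_backtrace(text):
    """Return (trap_number, name) for each trap separator, innermost first."""
    found = []
    for line in text.splitlines():
        match = TRAP_LINE.match(line.strip())
        if match:
            number = int(match.group(1), 16)
            found.append((number, TRAP_NAMES.get(number, "unknown")))
    return found


# Separator lines copied from the second dump above.
sample = """\
--- trap 0xc, rip = 0xffffffff80cf6250, rsp = 0xfffffe0010c11df8, rbp = 0xfffffe0010c11e30 ---
printf() at printf/frame 0xfffffe0010c11e30
--- trap 0x1c, rip = 0xffffffff80c2eae9, rsp = 0xfffffe000f5abda0, rbp = 0xfffffe000f5abdc0 ---
statclock() at statclock+0x159/frame 0xfffffe000f5abdc0
--- trap 0xf618070, rip = 0xffffffff80c2b91f, rsp = 0, rbp = 0xffffffff8131d1ea ---
"""

for number, name in traps_in_backtrace(sample):
    print(f"trap {number:#x}: {name}")
```

The last separator (trap 0xf618070) is not a real trap number; it shows up at the bottom of the stack in both dumps and is most likely an unwinder artifact, so the sketch simply reports it as unknown.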

Attached are the full crash files.

Does this look like a hardware issue?  This unit is a few weeks old.
Title: Re: Crash since 22.1.7_1 Need some Help Identifying Cause
Post by: dpeter on May 19, 2022, 01:28:35 pm
Here's just the crashlog (ddb) and msgbuf.

The only hardware modification I made to the Protectli FW2B was adding a mini-PCI wifi card (I keep it disabled and only use it on demand).  I had it in another Haswell-based mini-PC running OPNsense 21.7 without any issues before moving to this new FW2B.

Code: [Select]
ath0@pci0:3:0:0: class=0x028000 rev=0x01 hdr=0x00 vendor=0x168c device=0x002a subvendor=0x1a32 subdevice=0x0306
vendor = 'Qualcomm Atheros'
device = 'AR928X Wireless Network Adapter (PCI-Express)'
class = network

Code: [Select]
ath0_wlan1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 00:17:c4:a3:c6:81
groups: wlan
ssid "" channel 1 (2412 MHz 11b)
regdomain 101 indoor ecm authmode OPEN privacy OFF txpower 20
scanvalid 60 wme burst dtimperiod 1 -dfs bintval 0
parent interface: ath0
media: IEEE 802.11 Wireless Ethernet autoselect <hostap> (autoselect <hostap>)

When the card is enabled I do see interface errors in the counters, and the logs contain the error below. From what I've read on the FreeBSD forums and elsewhere, it doesn't seem to be fatal, and the card does work.  I rarely turn it on, and the crashes have occurred while the interface was disabled in the UI.

(lots of these in crash logs)
Code: [Select]
ath0: stuck beacon; resetting (bmiss count 4)
ath0: stuck beacon; resetting (bmiss count 4)
Title: Re: Crash since 22.1.7_1 Need some Help Identifying Cause
Post by: dpeter on May 20, 2022, 09:42:31 pm
It looks like I am hitting the same issue as: https://forum.opnsense.org/index.php?topic=28302.0

Folks in that thread are trying the 22.7pre3 kernel, which is based on FreeBSD 13.1, via:

Code: [Select]
# opnsense-update -bkzr 22.7.pre3
# yes | opnsense-shell reboot

Before I do this: last night, as advised by Protectli support, I removed the mini-PCI Qualcomm Atheros wifi card that I had added after receiving my new Protectli FW2B, to simplify my hardware setup. I'll see how that goes.

Trawling through the FreeBSD Bugzilla, this error could be a lot of things, but there are at least a few recent, still-unsolved bug reports for similar crashes, like this one:  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256816

However, I don't know how to analyze the dumps to understand and debug what the cause might be.

I'll report back here if I have any more crashes, if so I'll proceed with trying the 22.7pre3 / FBSD 13.1 kernel as suggested in the other thread.
Title: Re: [FIXED] Crash since 22.1.7_1 Need some Help Identifying Cause
Post by: dpeter on May 28, 2022, 06:14:04 pm
Closing this out with a note: I've been stable for two weeks running since I removed the Qualcomm Atheros wifi card.

It may or may not be related (and perhaps my issue was intermittent enough that I simply haven't hit it again), but I also upgraded to 22.1.8_1 + 22.7b kernel+base (FreeBSD 13.1-RELEASE).

Marking this thread as fixed now.


Still getting these kinds of errors. I'm now running 13.1-RELEASE via the 22.7.b beta, and I've downgraded OPNsense to 22.1.6.

Code: [Select]
MCA: Bank 2, Status 0xb6000000000a010a
MCA: Global Cap 0x0000000000000806, Status 0x0000000000000004
MCA: Vendor "GenuineIntel", ID 0x406c4, APIC ID 4
MCA: CPU 1 UNCOR EN PCC GCACHE L2 ERR error
MCA: Address 0xe50e020
panic: Unrecoverable machine check exception
cpuid = 1
time = 1653803162
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010c11e40
vpanic() at vpanic+0x17f/frame 0xfffffe0010c11e90
panic() at panic+0x43/frame 0xfffffe0010c11ef0
mca_intr() at mca_intr+0xbb/frame 0xfffffe0010c11f20
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe0010c11f20
--- trap 0x1c, rip = 0x8003eb189, rsp = 0x7fffffffe560, rbp = 0x7fffffffe570 ---
KDB: enter: panic
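
The "MCA: Bank 2, Status 0xb6000000000a010a" line above can be decoded by hand, which helps when comparing dumps between crashes. Below is a minimal Python sketch, assuming the IA32_MCi_STATUS bit layout and the cache-hierarchy compound error code format (000F 0001 RRRR TTLL) described in the Machine-Check Architecture chapter of the Intel SDM. The function names are hypothetical and the tables cover only the fields needed for this dump; this is not the kernel's own decoder.

```python
# Flag bits in the upper part of IA32_MCi_STATUS (per the Intel SDM).
STATUS_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MCi_MISC holds valid data
    (58, "ADDRV"),  # MCi_ADDR holds a valid address
    (57, "PCC"),    # processor context corrupt
]

REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
CACHE_TYPE = {0: "I", 1: "D", 2: "G"}           # instruction, data, generic
CACHE_LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_mca_status(status):
    """Split a 64-bit MCi_STATUS value into its flags and error codes."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


def decode_cache_error(code):
    """Decode a cache-hierarchy compound code, or return None if it is not one."""
    if (code & 0xEF00) != 0x0100:   # bit 12 is the filter bit and is ignored
        return None
    return {
        "request": REQUEST.get((code >> 4) & 0xF, "unknown"),
        "type": CACHE_TYPE.get((code >> 2) & 0x3, "unknown"),
        "level": CACHE_LEVEL[code & 0x3],
    }


status = decode_mca_status(0xB6000000000A010A)
print(status["flags"])
print(decode_cache_error(status["mca_error_code"]))
```

For the value in this dump the flags come out as VAL, UC, EN, ADDRV and PCC, and the low 16 bits (0x010a) decode to a generic-cache, L2, ERR request, which lines up with the kernel's own summary line "UNCOR EN PCC GCACHE L2 ERR". That combination is an uncorrected L2 cache error reported by the CPU itself, which would be consistent with a hardware problem, although the decode alone can't prove it.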
Title: Re: Crash since 22.1.7_1 Need some Help Identifying Cause
Post by: RNHurt on February 28, 2023, 09:43:23 am
@dpeter, were you able to get this problem resolved?  I'm experiencing similar kernel panics and am looking for some help.

https://forum.opnsense.org/index.php?topic=32728.0