Messages - dpeter

#1
I'm tracking down some machine check exception (MCE) issues and want to rule out FreeBSD 13.0-STABLE or 13.1-RELEASE as the culprit (the problem occurs on both).

What is the best way to downgrade from 22.1.x to 21.7.x, i.e. to 21.7.8_1 or whichever is the latest release in the 21.7 series that still has the older FreeBSD 12.x kernel?

I mainly want to go back to the FreeBSD 12.x / HardenedBSD kernel that shipped with 21.7.x, but reverting the packages and userland is fine too; my usage is very minimal, just OpenVPN and basic NAT.

Can I use the following?

opnsense-update -kr 21.7

I have read the opnsense-tools man pages and docs, but it's still not entirely clear to me.

For reference, this is the error I'm trying to work around. It may turn out to be a hardware issue, but I want to try the FreeBSD 12.x kernels before replacing the hardware.

MCA: Bank 2, Status 0xb6000000000a010a
MCA: Global Cap 0x0000000000000806, Status 0x0000000000000004
MCA: Vendor "GenuineIntel", ID 0x406c4, APIC ID 4
MCA: CPU 1 UNCOR EN PCC GCACHE L2 ERR error
MCA: Address 0xe50e020
panic: Unrecoverable machine check exception
cpuid = 1
time = 1653803162
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010c11e40
vpanic() at vpanic+0x17f/frame 0xfffffe0010c11e90
panic() at panic+0x43/frame 0xfffffe0010c11ef0
mca_intr() at mca_intr+0xbb/frame 0xfffffe0010c11f20
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe0010c11f20
--- trap 0x1c, rip = 0x8003eb189, rsp = 0x7fffffffe560, rbp = 0x7fffffffe570 ---
KDB: enter: panic
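The "Status" value of the bank in the dump above can be decoded by hand. Below is a minimal Python sketch, assuming the IA32_MCi_STATUS bit layout and the cache-hierarchy compound error code format (0000 0001 RRRR TTLL) described in the machine-check chapter of the Intel SDM; other error-code formats are deliberately not decoded. The function name `decode_mci_status` is hypothetical and not part of FreeBSD or OPNsense.

```python
# Decode an IA32_MCi_STATUS value as printed in FreeBSD's "MCA: Bank N, Status ..." line.
# Bit positions follow the Intel SDM; only cache-hierarchy compound error codes
# (0000 0001 RRRR TTLL, bit 12 being the optional filter bit) are decoded.

STATUS_FLAGS = {
    63: "VAL",    # register contents valid
    62: "OVER",   # error overflow
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

TT = {0b00: "I", 0b01: "D", 0b10: "G"}                 # transaction type
LL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}  # cache level
RRRR = {                                               # request type
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
    0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH",
    0b0111: "EVICT", 0b1000: "SNOOP",
}


def decode_mci_status(status: int) -> dict:
    """Split an MCi_STATUS value into flag names and error-code fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF             # bits 15:0, architectural error code
    model_code = (status >> 16) & 0xFFFF   # bits 31:16, model-specific error code
    result = {"flags": flags, "mca_code": mca_code, "model_code": model_code}
    # Cache hierarchy error: bits 15:8 are 0000 0001, ignoring the filter bit 12.
    if (mca_code & 0xEF00) == 0x0100:
        ll = mca_code & 0b11
        tt = (mca_code >> 2) & 0b11
        rrrr = (mca_code >> 4) & 0b1111
        result["compound"] = f"{TT.get(tt, '?')}CACHE {LL[ll]} {RRRR.get(rrrr, '?')}"
    return result


if __name__ == "__main__":
    info = decode_mci_status(0xB6000000000A010A)
    print(" ".join(info["flags"]), info.get("compound"))
```

For the value in this dump the flags come out as VAL, UC, EN, ADDRV and PCC, and the error code decodes to a generic L2 cache error, which lines up with FreeBSD's own "UNCOR EN PCC GCACHE L2 ERR" summary: an uncorrected L2 error with processor context corrupt, which is why the kernel treats it as unrecoverable.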


(Older related thread where I tried to figure out what happened: https://forum.opnsense.org/index.php?topic=28422.msg137981#msg137981)
#2
Closing this out with a note: I've been stable for two weeks straight since I removed the Qualcomm Atheros Wi-Fi card.

It may or may not be related (perhaps my issue was intermittent enough that I simply haven't seen it recur), but I also upgraded to 22.1.8_1 with the 22.7.b kernel and base (FreeBSD 13.1-RELEASE).

Marking this thread as fixed now.


Still getting these kinds of errors, now running FreeBSD 13.1-RELEASE via the 22.7.b beta, and I've downgraded OPNsense to 22.1.6.

MCA: Bank 2, Status 0xb6000000000a010a
MCA: Global Cap 0x0000000000000806, Status 0x0000000000000004
MCA: Vendor "GenuineIntel", ID 0x406c4, APIC ID 4
MCA: CPU 1 UNCOR EN PCC GCACHE L2 ERR error
MCA: Address 0xe50e020
panic: Unrecoverable machine check exception
cpuid = 1
time = 1653803162
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010c11e40
vpanic() at vpanic+0x17f/frame 0xfffffe0010c11e90
panic() at panic+0x43/frame 0xfffffe0010c11ef0
mca_intr() at mca_intr+0xbb/frame 0xfffffe0010c11f20
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe0010c11f20
--- trap 0x1c, rip = 0x8003eb189, rsp = 0x7fffffffe560, rbp = 0x7fffffffe570 ---
KDB: enter: panic
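The header lines of this MCA report (Global Cap, Global Status and the CPU ID) can be decoded by hand as well. A minimal Python sketch, assuming the IA32_MCG_CAP, IA32_MCG_STATUS and CPUID leaf 1 EAX layouts described in the Intel SDM; all function names are hypothetical, and only a subset of the architectural bits is decoded.

```python
# Decode the "Global Cap", "Global Status" and "ID" fields of a FreeBSD MCA report.
# Bit layouts assumed from the Intel SDM (IA32_MCG_CAP, IA32_MCG_STATUS, CPUID.1:EAX).


def decode_mcg_cap(cap: int) -> dict:
    """Subset of IA32_MCG_CAP fields."""
    return {
        "banks": cap & 0xFF,                   # bits 7:0, number of reporting banks
        "MCG_CTL_P": bool((cap >> 8) & 1),     # bit 8, MCG_CTL register present
        "MCG_EXT_P": bool((cap >> 9) & 1),     # bit 9, extended state registers present
        "MCG_CMCI_P": bool((cap >> 10) & 1),   # bit 10, corrected-error interrupt supported
        "MCG_TES_P": bool((cap >> 11) & 1),    # bit 11, threshold-based error status present
        "MCG_SER_P": bool((cap >> 24) & 1),    # bit 24, software error recovery supported
    }


def decode_mcg_status(status: int) -> dict:
    """Low bits of IA32_MCG_STATUS."""
    return {
        "RIPV": bool(status & 0b001),  # bit 0, restart IP valid
        "EIPV": bool(status & 0b010),  # bit 1, error IP valid
        "MCIP": bool(status & 0b100),  # bit 2, machine check in progress
    }


def decode_cpuid_signature(eax: int) -> dict:
    """Family/model/stepping from a CPUID leaf 1 EAX signature."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # Extended fields only apply to families 6 and 15, per Intel's rules.
    display_family = family + ext_family if family == 0xF else family
    display_model = (ext_model << 4) + model if family in (0x6, 0xF) else model
    return {"family": display_family, "model": display_model, "stepping": stepping}


if __name__ == "__main__":
    print(decode_mcg_cap(0x806))
    print(decode_mcg_status(0x4))
    print({k: hex(v) for k, v in decode_cpuid_signature(0x406C4).items()})
```

For the values in this report this gives 6 reporting banks with MCG_TES_P set, and a global status with only MCIP set: neither RIPV nor EIPV is valid, so the interrupted code cannot simply be resumed. The CPU signature decodes to family 6, model 0x4C, stepping 4; if I read Intel's model tables correctly, model 0x4C is the Airmont core of the Braswell / Cherry Trail generation.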

#3
Same here: 22.1.8_1 with the 22.7.b kernel and base, and all is OK.

I have also been stable since the panics I had earlier, though removing the Atheros Wi-Fi card seemed to help with that.

https://forum.opnsense.org/index.php?topic=28422
#4
Thank you for the fixes in 22.1.8, as well as for providing an on-ramp to beta-test 13.1-RELEASE.

If we move to 22.7.b now, what does the upgrade path look like once 22.7 proper is released? Is it just an opnsense-update away to get onto the 22.7 series proper?
#5
Quote from: franco on May 23, 2022, 09:23:23 AM
Could be https://github.com/opnsense/src/commit/15d6a1f03ba79 -- looks sane enough to include in 22.1.x anyway.

I see this patch was recently added to the sandbox/22.1 git branch and to the 22.7.b tag.

https://github.com/opnsense/src/commit/469123a60d1a743c7bf48d91191ac493e3af1cd5

Does that mean it will make its way into the normal 22.1.7_ updates (or 22.1.x) in the near future?

Since the move to FreeBSD in 22.1, is there any preference for basing OPNsense releases on -RELEASE rather than -STABLE, or does it not matter?

Sorry if the question is elementary; I'm new to OPNsense.
#6
It looks like I am hitting the same issue as: https://forum.opnsense.org/index.php?topic=28302.0

Folks in that thread are trying the 22.7.pre3 kernel, which is based on FreeBSD 13.1, via:

# opnsense-update -bkzr 22.7.pre3
# yes | opnsense-shell reboot


Before I do this: last night, as advised by Protectli support, I removed the mini-PCIe Qualcomm Atheros Wi-Fi card that I had added after receiving my new Protectli FW2B, to simplify my hardware setup. I'll see how that goes.

From trawling through the FreeBSD Bugzilla, this error could be caused by many things, but there are at least a few recent, still-unsolved bug reports for similar crashes, such as this one: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256816

However, I don't know how to analyze the dumps to understand and debug what might be causing this.

I'll report back here if I have any more crashes; if so, I'll proceed with trying the 22.7.pre3 / FreeBSD 13.1 kernel as suggested in the other thread.
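One low-effort way to start comparing the dumps without a debugger is to reduce each KDB backtrace to its chain of functions and trap markers. A minimal Python sketch, assuming the plain-text `func() at func+0xOFF/frame 0xADDR` layout seen in the msgbuf excerpts in this thread; the helper name `call_chain` is hypothetical. A proper analysis would typically need a full kernel dump and kgdb with matching debug symbols.

```python
from __future__ import annotations

import re

# Frames look like "vpanic() at vpanic+0x17f/frame 0xfffffe0010c11e90";
# the "+0xOFF" part is optional ("native_lapic_eoi() at native_lapic_eoi/frame ...").
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+(?:\+(0x[0-9a-f]+))?/frame 0x[0-9a-f]+")
# Separators look like "--- trap 0x1c, rip = ..." or "--- interrupt, rip = ...".
TRAP_RE = re.compile(r"^--- (trap 0x[0-9a-f]+|interrupt)")


def call_chain(backtrace: str) -> list[str]:
    """Return frames (innermost first) as 'func+0xOFF', with trap markers in brackets."""
    chain = []
    for raw in backtrace.splitlines():
        line = raw.strip()
        frame = FRAME_RE.match(line)
        if frame:
            func, offset = frame.groups()
            chain.append(f"{func}+{offset}" if offset else func)
            continue
        trap = TRAP_RE.match(line)
        if trap:
            chain.append(f"[{trap.group(1)}]")
    return chain


SAMPLE = """\
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010c11e40
vpanic() at vpanic+0x17f/frame 0xfffffe0010c11e90
panic() at panic+0x43/frame 0xfffffe0010c11ef0
mca_intr() at mca_intr+0xbb/frame 0xfffffe0010c11f20
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe0010c11f20
--- trap 0x1c, rip = 0x8003eb189, rsp = 0x7fffffffe560, rbp = 0x7fffffffe570 ---
"""

if __name__ == "__main__":
    print(" <- ".join(call_chain(SAMPLE)))
```

Applied to the dumps in this thread, every chain passes through `mca_intr()` and a `trap 0x1c` marker, which appears to be FreeBSD's machine-check trap (T_MCHK, 28). Even the "page fault" panic (trap 0xc, i.e. 12) happens inside `printf()` called from `mca_scan()`, so it seems to be a side effect of handling a machine check rather than a separate bug.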
#7
I seem to have the same issue, and I'm also getting "Fatal trap 12: page fault while in kernel mode".

No IDS, no heavy services, just OpenVPN. I'd be willing to try the 22.7 snapshot too if it seems to make the problem go away.

https://forum.opnsense.org/index.php?topic=28422.0

#8
Here's just the crash log (ddb) and msgbuf.

The only hardware modification I made to the Protectli FW2B is adding a mini-PCIe Wi-Fi card (I keep it disabled and only use it on demand). Before upgrading to this new FW2B, I had the same card in another Haswell-based mini PC running OPNsense 21.7 without any issues.

ath0@pci0:3:0:0: class=0x028000 rev=0x01 hdr=0x00 vendor=0x168c device=0x002a subvendor=0x1a32 subdevice=0x0306
vendor = 'Qualcomm Atheros'
device = 'AR928X Wireless Network Adapter (PCI-Express)'
class = network


ath0_wlan1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 00:17:c4:a3:c6:81
groups: wlan
ssid "" channel 1 (2412 MHz 11b)
regdomain 101 indoor ecm authmode OPEN privacy OFF txpower 20
scanvalid 60 wme burst dtimperiod 1 -dfs bintval 0
parent interface: ath0
media: IEEE 802.11 Wireless Ethernet autoselect <hostap> (autoselect <hostap>)


When the card is enabled I do see interface errors in the counters, and the logs contain the following error. From researching it on the FreeBSD forums and elsewhere, though, it doesn't seem to be fatal, and the card does work. I turn it on and use it rarely, and the crashes have occurred while the interface is disabled in the UI.

(lots of these in crash logs)
ath0: stuck beacon; resetting (bmiss count 4)
ath0: stuck beacon; resetting (bmiss count 4)
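To get a rough idea of how often the card resets relative to the panics, a minimal Python sketch that tallies `stuck beacon` messages per interface in saved msgbuf text. The regex assumes exactly the message format quoted above, and the function name `count_stuck_beacons` is hypothetical.

```python
import re
from collections import Counter

# Assumes lines like "ath0: stuck beacon; resetting (bmiss count 4)".
STUCK_RE = re.compile(r"^(\w+): stuck beacon; resetting")


def count_stuck_beacons(msgbuf: str) -> Counter:
    """Count stuck-beacon resets per interface in a msgbuf dump."""
    counts = Counter()
    for line in msgbuf.splitlines():
        match = STUCK_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "ath0: stuck beacon; resetting (bmiss count 4)\n"
        "ath0: stuck beacon; resetting (bmiss count 4)\n"
        "KDB: enter: panic\n"
    )
    print(count_stuck_beacons(sample))
```

In practice you would pass in the contents of the msgbuf file from the crash report instead of the inline sample.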
#9
Hello, long-time lurker.

I am running OPNsense 22.1.7_1 on a brand-new Protectli FW2B and I'm getting a crash every 12-24 hours or so (panic string: page fault).

I'm only running an OpenVPN client which I've associated with my LAN network and it works great.

System Info
FreeBSD 13.0-STABLE stable/22.1-n248071-cafeb6ce414 SMP amd64
OPNsense 22.1.7_1 3cc3877c1
Plugins os-ddclient-1.5 os-dmidecode-1.1_1 os-nut-1.8.1 os-theme-cicada-1.29 os-theme-rebellion-1.8.8
Time Thu, 19 May 2022 10:23:02 +0000
OpenSSL 1.1.1o  3 May 2022
PHP 7.4.29


I get the following crash dumps:

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010c11e40
vpanic() at vpanic+0x17f/frame 0xfffffe0010c11e90
panic() at panic+0x43/frame 0xfffffe0010c11ef0
mca_intr() at mca_intr+0xbb/frame 0xfffffe0010c11f20
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe0010c11f20
--- trap 0x1c, rip = 0xffffffff8120c430, rsp = 0xfffffe000f5abd48, rbp = 0xfffffe000f5abd70 ---
native_lapic_eoi() at native_lapic_eoi/frame 0xfffffe000f5abd70
Xtimerint() at Xtimerint+0xb1/frame 0xfffffe000f5abd70
--- interrupt, rip = 0xffffffff8110d8e5, rsp = 0xfffffe000f5abe40, rbp = 0xfffffe000f5abe50 ---
cpu_idle() at cpu_idle+0xe5/frame 0xfffffe000f5abe50
sched_idletd() at sched_idletd+0x4e1/frame 0xfffffe000f5abef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000f5abf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000f5abf30
--- trap 0xf618070, rip = 0xffffffff80c2b91f, rsp = 0, rbp = 0xffffffff8131d1ea ---
mi_startup() at mi_startup+0xdf/frame 0xffffffff8131d1ea
KDB: enter: panic
---<>---


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 04
fault virtual address = 0xffffffff80cf6250
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0xffffffff80cf6250
stack pointer         = 0x28:0xfffffe0010c11df8
frame pointer         = 0x28:0xfffffe0010c11e30
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu1)
trap number = 12
panic: page fault
cpuid = 1
time = 1652910869
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010c11bb0
vpanic() at vpanic+0x17f/frame 0xfffffe0010c11c00
panic() at panic+0x43/frame 0xfffffe0010c11c60
trap_fatal() at trap_fatal+0x385/frame 0xfffffe0010c11cc0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0010c11d20
calltrap() at calltrap+0x8/frame 0xfffffe0010c11d20
--- trap 0xc, rip = 0xffffffff80cf6250, rsp = 0xfffffe0010c11df8, rbp = 0xfffffe0010c11e30 ---
printf() at printf/frame 0xfffffe0010c11e30
mca_scan() at mca_scan+0x4f6/frame 0xfffffe0010c11ef0
mca_intr() at mca_intr+0x39/frame 0xfffffe0010c11f20
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe0010c11f20
--- trap 0x1c, rip = 0xffffffff80c2eae9, rsp = 0xfffffe000f5abda0, rbp = 0xfffffe000f5abdc0 ---
statclock() at statclock+0x159/frame 0xfffffe000f5abdc0
handleevents() at handleevents+0xf3/frame 0xfffffe000f5abe00
cpu_activeclock() at cpu_activeclock+0x70/frame 0xfffffe000f5abe30
cpu_idle() at cpu_idle+0xa8/frame 0xfffffe000f5abe50
sched_idletd() at sched_idletd+0x4e1/frame 0xfffffe000f5abef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000f5abf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000f5abf30
--- trap 0xf618070, rip = 0xffffffff80c2b91f, rsp = 0, rbp = 0xffffffff8131d1ea ---
mi_startup() at mi_startup+0xdf/frame 0xffffffff8131d1ea
KDB: enter: panic
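The `time =` line in a FreeBSD panic message is the kernel's wall-clock time in seconds since the Unix epoch, so it can be converted to a date to match each panic against the system log. A minimal Python sketch using only the standard library; the function name `panic_time` is hypothetical.

```python
from datetime import datetime, timezone


def panic_time(epoch_seconds: int) -> str:
    """Convert the 'time = N' value of a panic message to an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    # Values taken from the panic messages quoted in this thread.
    for value in (1652910869, 1653803162):
        print(value, panic_time(value))
```

The page-fault panic above (time = 1652910869) works out to 2022-05-18 21:54:29 UTC, and the later MCA panic (time = 1653803162) to 2022-05-29 05:46:02 UTC.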


Attached are the full crash files.

Does this look like a hardware issue?  This unit is a few weeks old.