New install giving MCA kernel panic and other issues

Started by Dweeb9248, June 29, 2023, 05:59:26 PM

Hello all. I am new to OPNsense (although I did use pfSense in the past). I am trying to get a machine working as a firewall. Sadly, this is also my home "production" firewall, because my old pfSense machine died after a power outage. By "production" I do not mean it is hyper critical or anything (we do work from home, but we have a backup plan... so unless you count kids watching YouTube as hyper critical? 😅), so I am not that concerned about having fully qualified, enterprise-grade hardware.

So I've got one main issue and two side issues that I can't figure out (let's call them pet peeves 😅!).

Main issue - MCA Kernel Panic

For my main issue, the kernel logs some MCA messages at "Notice" severity, ending with KDB: enter: panic. This is followed by a normal reboot in which I cannot see any problems. From my searching (most results came from other FreeBSD-based software such as FreeNAS/TrueNAS/pfSense), it is a hardware issue. The rare threads that reached a conclusion or definitive solution traced it to either a USB keyboard or power supply issues (loose cables, powerd, etc.). Since I have no USB keyboard (or any other USB hardware) plugged in, and I did check the power supply cord, I have no idea where to look next. However, I suspect that in my case it is related to the NIC in some way.

Why do I think it is related to the NIC? Well, on a brand new, default configuration with the bare minimum on this firewall (modem in, one port of the NIC out to a switch with only 3 computers on it), it ran rock solid for a day or so. Since I wanted to bring up the rest of my home lab (a TrueNAS machine and a Proxmox machine), I made some configuration changes: using a different port to a different switch for the homelab network (home on 172.16.1.1/12, homelab on 10.0.0.1/8), installing and configuring DynDNS, installing the ACME and HAProxy plugins (without configuring them at that time), and changing the TrueNAS and Proxmox configuration to work with the new network domain (I took the time to switch my network domain to a something.home.arpa and use a documented naming scheme for my machines). After making all those changes, the MCA kernel panic was triggered every hour (always around the twentieth minute of the hour, ± 2 minutes). When I saw this the next morning (well, after getting thrown out of a meeting due to a lost connection 😅), I deactivated all the config changes and shut down my other machines, and the MCA panic stopped. So this might be a hardware issue, but somewhere there is a configuration that triggers the hardware to panic!

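Since then, I have had only one occurrence. Here are the relevant parts of the system log for that one (or what I think are the relevant parts). Note that the log view lists the newest entries first, so it reads from the bottom up. I have absolutely no idea how to decipher this!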


2023-06-28T18:50:10-04:00   Notice   kernel    ---<<BOOT>>---
2023-06-28T18:50:10-04:00   Notice   kernel    KDB: enter: panic
2023-06-28T18:50:10-04:00   Notice   kernel    mi_startup() at mi_startup+0xdf/frame 0x6
2023-06-28T18:50:10-04:00   Notice   kernel    --- trap 0xc7f3c018, rip = 0xffffffff80c311df, rsp = 0, rbp = 0x6 ---
2023-06-28T18:50:10-04:00   Notice   kernel    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000e7d6f30
2023-06-28T18:50:10-04:00   Notice   kernel    fork_exit() at fork_exit+0x7e/frame 0xfffffe000e7d6f30
2023-06-28T18:50:10-04:00   Notice   kernel    gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe000e7d6ef0
2023-06-28T18:50:10-04:00   Notice   kernel    gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe000e7d6ec0
2023-06-28T18:50:10-04:00   Notice   kernel    _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe000e7d6e40
2023-06-28T18:50:10-04:00   Notice   kernel    iflib_rxeof() at iflib_rxeof+0xc27/frame 0xfffffe000e7d6e00
2023-06-28T18:50:10-04:00   Notice   kernel    ether_input() at ether_input+0x69/frame 0xfffffe000e7d6d00
2023-06-28T18:50:10-04:00   Notice   kernel    netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe000e7d6ca0
2023-06-28T18:50:10-04:00   Notice   kernel    ether_nh_input() at ether_nh_input+0x1f1/frame 0xfffffe000e7d6c50
2023-06-28T18:50:10-04:00   Notice   kernel    ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe000e7d6bf0
2023-06-28T18:50:10-04:00   Notice   kernel    ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe000e7d6bc0
2023-06-28T18:50:10-04:00   Notice   kernel    ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe000e7d6b80
2023-06-28T18:50:10-04:00   Notice   kernel    ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe000e7d6ae0
2023-06-28T18:50:10-04:00   Notice   kernel    ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe000e7d6aa0
2023-06-28T18:50:10-04:00   Notice   kernel    ng_ether_rcv_upper() at ng_ether_rcv_upper+0x8c/frame 0xfffffe000e7d6a00
2023-06-28T18:50:10-04:00   Notice   kernel    ether_demux() at ether_demux+0x138/frame 0xfffffe000e7d69e0
2023-06-28T18:50:10-04:00   Notice   kernel    netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe000e7d69b0
2023-06-28T18:50:10-04:00   Notice   kernel    ip_input() at ip_input+0x6e4/frame 0xfffffe000e7d6960
2023-06-28T18:50:10-04:00   Notice   kernel    ip_tryforward() at ip_tryforward+0x4f7/frame 0xfffffe000e7d68d0
2023-06-28T18:50:10-04:00   Notice   kernel    ether_output() at ether_output+0x65b/frame 0xfffffe000e7d6810
2023-06-28T18:50:10-04:00   Notice   kernel    ng_ether_output() at ng_ether_output+0x5e/frame 0xfffffe000e7d6780
2023-06-28T18:50:10-04:00   Notice   kernel    ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe000e7d6750
2023-06-28T18:50:10-04:00   Notice   kernel    ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe000e7d6710
2023-06-28T18:50:10-04:00   Notice   kernel    ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe000e7d6670
2023-06-28T18:50:10-04:00   Notice   kernel    ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe000e7d6630
2023-06-28T18:50:10-04:00   Notice   kernel    ether_output_frame() at ether_output_frame+0xab/frame 0xfffffe000e7d6590
2023-06-28T18:50:10-04:00   Notice   kernel    iflib_if_transmit() at iflib_if_transmit+0x227/frame 0xfffffe000e7d6560
2023-06-28T18:50:10-04:00   Notice   kernel    ifmp_ring_enqueue() at ifmp_ring_enqueue+0x299/frame 0xfffffe000e7d64f0
2023-06-28T18:50:10-04:00   Notice   kernel    drain_ring_lockless() at drain_ring_lockless+0x5d/frame 0xfffffe000e7d64b0
2023-06-28T18:50:10-04:00   Notice   kernel    iflib_txq_drain() at iflib_txq_drain+0x38e/frame 0xfffffe000e7d6460
2023-06-28T18:50:10-04:00   Notice   kernel    iflib_encap() at iflib_encap+0x182/frame 0xfffffe000e7d63e0
2023-06-28T18:50:10-04:00   Notice   kernel    --- trap 0x1c, rip = 0xffffffff80dde6d2, rsp = 0xfffffe000e7d6340, rbp = 0xfffffe000e7d63e0 ---
2023-06-28T18:50:10-04:00   Notice   kernel    mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe000fe17f20
2023-06-28T18:50:10-04:00   Notice   kernel    mca_intr() at mca_intr+0xbb/frame 0xfffffe000fe17f20
2023-06-28T18:50:10-04:00   Notice   kernel    panic() at panic+0x43/frame 0xfffffe000fe17ef0
2023-06-28T18:50:10-04:00   Notice   kernel    vpanic() at vpanic+0x17f/frame 0xfffffe000fe17e90
2023-06-28T18:50:10-04:00   Notice   kernel    db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000fe17e40
2023-06-28T18:50:10-04:00   Notice   kernel    KDB: stack backtrace:
2023-06-28T18:50:10-04:00   Notice   kernel    time = 1687992465
2023-06-28T18:50:10-04:00   Notice   kernel    cpuid = 2
2023-06-28T18:50:10-04:00   Notice   kernel    EN panic: Unrecoverable machine check exception
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Misc 0x3ffff
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Address 0x3fff80dde6d2MCA: Global Cap 0x0000000000000c09, Status 0x0000000000000004
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: CPU 2 UNCOR EN PCC OVER internal timer error
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: CPU 1 UNCOR MCA: Vendor "GenuineIntel", ID 0x206a7, APIC ID 4
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Global Cap 0x0000000000000c09, Status 0x0000000000000004
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Vendor "GenuineIntel", ID 0x206a7, APIC ID 2
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Global Cap 0x0000000000000c09, Status 0x0000000000000004
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Global Cap 0x0000000000000c09, Status 0x0000000000000004
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Bank 3, Status 0xfe00000000800400
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Bank 3, Status 0xfe00000000800400
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Bank 3, Status 0xfe00000000800400
2023-06-28T18:50:10-04:00   Notice   kernel    MCA: Bank 3, Status 0xfe00000000800400
2023-06-28T18:50:10-04:00   Notice   syslog-ng    syslog-ng starting up; version='4.2.0'


This is then followed by the typical boot sequence, starting with this.


2023-06-28T18:50:10-04:00   Notice   kernel    avail memory = 8150736896 (7773 MB)
2023-06-28T18:50:10-04:00   Notice   kernel    real memory  = 8589934592 (8192 MB)
2023-06-28T18:50:10-04:00   Notice   kernel      TSC: P-state invariant, performance statistics
2023-06-28T18:50:10-04:00   Notice   kernel      VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
2023-06-28T18:50:10-04:00   Notice   kernel      XSAVE Features=0x1<XSAVEOPT>
2023-06-28T18:50:10-04:00   Notice   kernel      AMD Features2=0x1<LAHF>
2023-06-28T18:50:10-04:00   Notice   kernel      AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
2023-06-28T18:50:10-04:00   Notice   kernel      Features2=0x1fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
2023-06-28T18:50:10-04:00   Notice   kernel      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
2023-06-28T18:50:10-04:00   Notice   kernel      Origin="GenuineIntel"  Id=0x206a7  Family=0x6  Model=0x2a  Stepping=7
2023-06-28T18:50:10-04:00   Notice   kernel    CPU: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (3093.09-MHz K8-class CPU)
2023-06-28T18:50:10-04:00   Notice   kernel    VT(vga): resolution 640x480
2023-06-28T18:50:10-04:00   Notice   kernel    FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
2023-06-28T18:50:10-04:00   Notice   kernel    FreeBSD 13.1-RELEASE-p7 stable/23.1-n250445-fb81510bd0e SMP amd64
2023-06-28T18:50:10-04:00   Notice   kernel    FreeBSD is a registered trademark of The FreeBSD Foundation.
2023-06-28T18:50:10-04:00   Notice   kernel       The Regents of the University of California. All rights reserved.
2023-06-28T18:50:10-04:00   Notice   kernel    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
2023-06-28T18:50:10-04:00   Notice   kernel    Copyright (c) 1992-2021 The FreeBSD Project.


A bit further, I've got the NIC boot sequence.


2023-06-28T18:50:10-04:00   Notice   kernel    <6>em0: netmap queues/slots: TX 1/1024, RX 1/1024
2023-06-28T18:50:10-04:00   Notice   kernel    <6>em0: Ethernet address: 18:03:73:50:d1:f6
2023-06-28T18:50:10-04:00   Notice   kernel    em0: Using an MSI interrupt
2023-06-28T18:50:10-04:00   Notice   kernel    em0: Using 1024 TX descriptors and 1024 RX descriptors
2023-06-28T18:50:10-04:00   Notice   kernel    em0: EEPROM V0.13-4
2023-06-28T18:50:10-04:00   Notice   kernel    em0: <Intel(R) 82579LM> port 0x5080-0x509f mem 0xe2500000-0xe251ffff,0xe2580000-0xe2580fff irq 20 at device 25.0 on pci0
2023-06-28T18:50:10-04:00   Notice   kernel    uart2: Using 1 MSI message
2023-06-28T18:50:10-04:00   Notice   kernel    uart2: <Intel AMT - KT Controller> port 0x50e0-0x50e7 mem 0xe2590000-0xe2590fff irq 17 at device 22.3 on pci0
2023-06-28T18:50:10-04:00   Notice   kernel    pci0: <simple comms> at device 22.0 (no driver attached)
2023-06-28T18:50:10-04:00   Notice   kernel    vgapci0: Boot video device
2023-06-28T18:50:10-04:00   Notice   kernel    vgapci0: <VGA-compatible display> port 0x5000-0x503f mem 0xe0c00000-0xe0ffffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0
2023-06-28T18:50:10-04:00   Notice   kernel    <6>igb3: netmap queues/slots: TX 4/1024, RX 4/1024
2023-06-28T18:50:10-04:00   Notice   kernel    <6>igb3: Ethernet address: 00:1b:21:41:dc:f5
2023-06-28T18:50:10-04:00   Notice   kernel    igb3: Using MSI-X interrupts with 5 vectors
2023-06-28T18:50:10-04:00   Notice   kernel    igb3: Using 4 RX queues 4 TX queues
2023-06-28T18:50:10-04:00   Notice   kernel    igb3: Using 1024 TX descriptors and 1024 RX descriptors
2023-06-28T18:50:10-04:00   Notice   kernel    igb3: EEPROM V1.77-0 eTrack 0x880e0000
2023-06-28T18:50:10-04:00   Notice   kernel    igb3: <Intel(R) PRO/1000 VT 82575GB (Quad Copper)> port 0x2000-0x201f mem 0xe1400000-0xe141ffff,0xe1000000-0xe11fffff,0xe1440000-0xe1443fff irq 17 at device 0.1 on pci4
2023-06-28T18:50:10-04:00   Notice   kernel    <6>igb2: netmap queues/slots: TX 4/1024, RX 4/1024
2023-06-28T18:50:10-04:00   Notice   kernel    <6>igb2: Ethernet address: 00:1b:21:41:dc:f4
2023-06-28T18:50:10-04:00   Notice   kernel    igb2: Using MSI-X interrupts with 5 vectors
2023-06-28T18:50:10-04:00   Notice   kernel    igb2: Using 4 RX queues 4 TX queues
2023-06-28T18:50:10-04:00   Notice   kernel    igb2: Using 1024 TX descriptors and 1024 RX descriptors
2023-06-28T18:50:10-04:00   Notice   kernel    igb2: EEPROM V1.77-0 eTrack 0x880e0000
2023-06-28T18:50:10-04:00   Notice   kernel    igb2: <Intel(R) PRO/1000 VT 82575GB (Quad Copper)> port 0x2020-0x203f mem 0xe1420000-0xe143ffff,0xe1200000-0xe13fffff,0xe1450000-0xe1453fff irq 16 at device 0.0 on pci4
2023-06-28T18:50:10-04:00   Notice   kernel    pci4: <PCI bus> on pcib4
2023-06-28T18:50:10-04:00   Notice   kernel    pcib4: <PCI-PCI bridge> at device 4.0 on pci2
2023-06-28T18:50:10-04:00   Notice   kernel    <6>igb1: netmap queues/slots: TX 4/1024, RX 4/1024
2023-06-28T18:50:10-04:00   Notice   kernel    <6>igb1: Ethernet address: 00:1b:21:41:dc:f1
2023-06-28T18:50:10-04:00   Notice   kernel    igb1: Using MSI-X interrupts with 5 vectors
2023-06-28T18:50:10-04:00   Notice   kernel    igb1: Using 4 RX queues 4 TX queues
2023-06-28T18:50:10-04:00   Notice   kernel    igb1: Using 1024 TX descriptors and 1024 RX descriptors
2023-06-28T18:50:10-04:00   Notice   kernel    igb1: EEPROM V1.77-0 eTrack 0x880c0000
2023-06-28T18:50:10-04:00   Notice   kernel    igb1: <Intel(R) PRO/1000 VT 82575GB (Quad Copper)> port 0x3000-0x301f mem 0xe1a00000-0xe1a1ffff,0xe1600000-0xe17fffff,0xe1a40000-0xe1a43fff irq 19 at device 0.1 on pci3
2023-06-28T18:50:10-04:00   Notice   kernel    <6>igb0: netmap queues/slots: TX 4/1024, RX 4/1024
2023-06-28T18:50:10-04:00   Notice   kernel    <6>igb0: Ethernet address: 00:1b:21:41:dc:f0
2023-06-28T18:50:10-04:00   Notice   kernel    igb0: Using MSI-X interrupts with 5 vectors
2023-06-28T18:50:10-04:00   Notice   kernel    igb0: Using 4 RX queues 4 TX queues
2023-06-28T18:50:10-04:00   Notice   kernel    igb0: Using 1024 TX descriptors and 1024 RX descriptors
2023-06-28T18:50:10-04:00   Notice   kernel    igb0: EEPROM V1.77-0 eTrack 0x880c0000
2023-06-28T18:50:10-04:00   Notice   kernel    igb0: <Intel(R) PRO/1000 VT 82575GB (Quad Copper)> port 0x3020-0x303f mem 0xe1a20000-0xe1a3ffff,0xe1800000-0xe19fffff,0xe1a50000-0xe1a53fff irq 18 at device 0.0 on pci3
2023-06-28T18:50:10-04:00   Notice   kernel    pci3: <PCI bus> on pcib3
2023-06-28T18:50:10-04:00   Notice   kernel    pcib3: <PCI-PCI bridge> at device 2.0 on pci2
2023-06-28T18:50:10-04:00   Notice   kernel    pci2: <PCI bus> on pcib2
2023-06-28T18:50:10-04:00   Notice   kernel    pcib2: <PCI-PCI bridge> at device 0.0 on pci1
2023-06-28T18:50:10-04:00   Notice   kernel    pci1: <ACPI PCI bus> on pcib1
2023-06-28T18:50:10-04:00   Notice   kernel    pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
2023-06-28T18:50:10-04:00   Notice   kernel    pci0: <ACPI PCI bus> on pcib0
2023-06-28T18:50:10-04:00   Notice   kernel    pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
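
For anyone else trying to read the "MCA: Bank 3, Status 0xfe00000000800400" lines in the panic log, here is a minimal Python sketch that splits such a status word into its flag bits and error-code fields. It assumes the architectural IA32_MCi_STATUS layout from the Intel Software Developer's Manual (flags in bits 63 to 57, MCA error code in bits 15:0, model-specific code in bits 31:16). The function name decode_mca_status is made up for illustration, and the sketch does not interpret the model-specific code.

```python
# Sketch only: decode a machine-check status word as printed in the
# "MCA: Bank N, Status 0x..." kernel lines. Bit positions are the
# architectural ones (assumption: Intel SDM IA32_MCi_STATUS layout).

FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # the matching MISC register holds valid data
    58: "ADDRV",  # the matching ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mca_status(status: int) -> dict:
    """Return the set flags and the two 16-bit error-code fields."""
    flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    decoded = decode_mca_status(0xFE00000000800400)  # value from the log above
    print(decoded["flags"])
    print(hex(decoded["mca_error_code"]))
    print(hex(decoded["model_specific_code"]))
```

For the value in the log, this reports the flags VAL, OVER, UC, EN, MISCV, ADDRV and PCC, an MCA error code of 0x400 and a model-specific code of 0x80. That is consistent with the kernel's own summary line, "UNCOR EN PCC OVER internal timer error". It does not say which component is at fault, only that the CPU flagged an uncorrected error.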


Minor Issue #1 - Log Severity

When searching the logs, I found that most entries (like 99.9% of them) are shown as Notice, even the panic ones. It puzzled me to see that the error above had "Notice" severity and not "Error" or above (for a panic, I would expect at least "Critical" or "Emergency", no?). I suspect that the way the kernel emits those logs is not compatible with how OPNsense classifies them, or something like that 🤷‍♂️.
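
As a reference point for the severity names, here is a small Python sketch of how a standard syslog priority number splits into facility and severity (priority = facility * 8 + severity, as in RFC 5424). The "<6>" prefix on some of the em0/igb lines above looks like such a priority marker. That reading, and the helper name split_pri, are my assumptions. The sketch says nothing about how OPNsense or syslog-ng actually assign the level shown in the GUI.

```python
# Sketch only: split a syslog priority value into (facility, severity name).
# Severity names and numbering follow the standard syslog scheme (RFC 5424).

SEVERITIES = [
    "Emergency",      # 0
    "Alert",          # 1
    "Critical",       # 2
    "Error",          # 3
    "Warning",        # 4
    "Notice",         # 5
    "Informational",  # 6
    "Debug",          # 7
]


def split_pri(pri: int):
    """Return (facility number, severity name) for a priority value."""
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


if __name__ == "__main__":
    # "<6>" as seen in front of some NIC messages: facility 0 (kernel).
    print(split_pri(6))
    # What one might expect for a panic: facility 0, severity 2.
    print(split_pri(2))
```

Under this scheme, a priority of 6 is a kernel message at Informational severity, and a panic logged at Critical would carry priority 2.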

Minor Issue #2 - Gateway status is offline

I don't know whether this is related to the MCA or not, but my gateway (System -> Gateways -> Single) is shown as offline, even though I am clearly online (I am writing this post online, no 🤔?). One thing I did notice: my own IP and the gateway IP seem pretty close to each other (from what I remember prior to the pfSense crash). In the log below, XXX.XXX.XXX.161 is my gateway IP and XXX.XXX.XXX.172 is my own IP.


2023-06-28T18:50:22-04:00   Warning   dpinger    send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr XXX.XXX.XXX.161 bind_addr XXX.XXX.XXX.172 identifier "WAN_DHCP "
2023-06-28T18:50:22-04:00   Warning   dpinger    exiting on signal 15
2023-06-28T18:50:18-04:00   Warning   dpinger    send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr XXX.XXX.XXX.161 bind_addr XXX.XXX.XXX.172 identifier "WAN_DHCP "
2023-06-28T03:01:01-04:00   Warning   dpinger    send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr XXX.XXX.XXX.161 bind_addr XXX.XXX.XXX.172 identifier "WAN_DHCP "
2023-06-28T03:01:01-04:00   Warning   dpinger    exiting on signal 15
2023-06-28T03:01:01-04:00   Warning   dpinger    WAN_DHCP 192.168.100.1: sendto error: 65
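
On the "close to each other" observation: whether the gateway (.161) and my own address (.172) sit in the same subnet depends on the prefix length, which I have not checked. The Python sketch below shows that check using the standard ipaddress module. The 203.0.113.x addresses are placeholders from the documentation range (my real octets are masked above), and the prefix lengths are guesses, not values read from my WAN lease. As a side note, the "sendto error: 65" in the oldest log line is an errno value. On FreeBSD, 65 is EHOSTUNREACH (no route to host).

```python
# Sketch only: test whether a gateway address falls inside the subnet
# implied by an interface address and a prefix length.
import ipaddress


def same_subnet(own_ip: str, gateway_ip: str, prefix_len: int) -> bool:
    """True if gateway_ip lies in the network that own_ip/prefix_len belongs to."""
    network = ipaddress.ip_interface(f"{own_ip}/{prefix_len}").network
    return ipaddress.ip_address(gateway_ip) in network


if __name__ == "__main__":
    own = "203.0.113.172"      # placeholder for XXX.XXX.XXX.172
    gateway = "203.0.113.161"  # placeholder for XXX.XXX.XXX.161
    for prefix_len in (24, 28, 29):  # hypothetical prefix lengths
        print(prefix_len, same_subnet(own, gateway, prefix_len))
```

With these placeholder values, the two addresses share a subnet at /24 and /28 but not at /29, so being numerically close does not settle the question without knowing the real mask.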


Anyhow, thanks for reading. 😀