OPNsense Forum

Archive => 21.7 Legacy Series => Topic started by: wbk on February 27, 2022, 09:44:43 pm

Title: panic on boot, where to look for causes?
Post by: wbk on February 27, 2022, 09:44:43 pm
Hi all,

My OPNsense has been running faultlessly for months on an overpowered platform. As an upgrade, I have now replaced my desktop computer with the old router hardware and repurposed my server as the router.

The server ran even longer than the router without problems, but in its new role I can't get a stable internet connection, if I get one at all.

A kernel panic may be to blame, but I am not quite sure how to read the error log. I suspect it is hardware related. Can I tell from this whether the RAM, the SSD, or maybe the network hardware itself is at fault?

Below is the tail of the log, cut from the startup 'beep' to the end of the log. The error logs give quite detailed information; which parts should I post? Thanks in advance!

Code:
<118>>>> Invoking start script 'beep'
<118>Root file system: zroot/ROOT/default
<118>Sun Feb 27 20:32:50 CET 2022
<118>
<118>*** poort.osba.nl: OPNsense 21.7.8 (amd64/OpenSSL) ***
<118>
<118> LANpoort (em0)  -> v4: 192.168.1.1/24
<118> WANpoort (pppoe0) ->
<118>
<118> HTTPS: SHA256 F8 4F 4B 4B C1 55 38 CD A3 63 23 B4 1B B5 0A 4C
<118>               9B E5 EA FF 17 53 72 DA 86 E2 41 1C 3B 36 7E C8
<118> SSH:   SHA256 pVImfc1BUmRFkgMUk2ckqucwijfBqwq89ccwWKU405g (ECDSA)
<118> SSH:   SHA256 11lYai/e0awhzusFcvJGA+8G3/RjqK03OC/BAm8UtCo (ED25519)
<118> SSH:   SHA256 ZN+FvJYDAIovuPb5PAbImONW8/SwXGU5pisTpPXRXc4 (RSA)
<6>em1: link state changed to DOWN
<6>em1_vlan6: link state changed to DOWN
<6>em1: link state changed to UP
<6>em1_vlan6: link state changed to UP
<6>ng0: changing name to 'pppoe0'
<6>ng0: changing name to 'pppoe0'
574.211831 [ 295] generic_netmap_unregister Emulated adapter for pppoe0 deactivated
574.213389 [1035] generic_netmap_dtor       Emulated netmap adapter for pppoe0 destroyed
<3>nd6_dad_timer: called with non-tentative address fe80:8::225:90ff:fe33:1188(pppoe0)


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x54
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80fa18e6
stack pointer         = 0x28:0xfffffe0025b12970
frame pointer         = 0x28:0xfffffe0025b129c0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_0)
trap number = 12
panic: page fault
cpuid = 0
time = 1645990654
__HardenedBSD_version = 1200059 __FreeBSD_version = 1201000
version = FreeBSD 12.1-RELEASE-p22-HBSD #0  6fd65fcb739(stable/21.7)-dirty: Wed Jan 26 20:48:21 CET 2022
    root@sensey:/usr/obj/usr/src/amd64.amd64/sys/SMP
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0025b12620
vpanic() at vpanic+0x1a2/frame 0xfffffe0025b12670
panic() at panic+0x43/frame 0xfffffe0025b126d0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe0025b12730
trap_pfault() at trap_pfault+0x49/frame 0xfffffe0025b12790
trap() at trap+0x29f/frame 0xfffffe0025b128a0
calltrap() at calltrap+0x8/frame 0xfffffe0025b128a0
--- trap 0xc, rip = 0xffffffff80fa18e6, rsp = 0xfffffe0025b12970, rbp = 0xfffffe0025b129c0 ---
in6_setscope() at in6_setscope+0xa6/frame 0xfffffe0025b129c0
ip6_forward() at ip6_forward+0x359/frame 0xfffffe0025b12b10
pf_test6() at pf_test6+0x1cb5/frame 0xfffffe0025b12ca0
pf_check6_out() at pf_check6_out+0x3f/frame 0xfffffe0025b12cd0
pfil_run_hooks() at pfil_run_hooks+0x87/frame 0xfffffe0025b12d60
ip6_output() at ip6_output+0x1a06/frame 0xfffffe0025b12ff0
icmp6_reflect() at icmp6_reflect+0x2f0/frame 0xfffffe0025b130a0
icmp6_error() at icmp6_error+0x4aa/frame 0xfffffe0025b130f0
ip6_forward() at ip6_forward+0xc58/frame 0xfffffe0025b13240
ip6_input() at ip6_input+0xdf6/frame 0xfffffe0025b13330
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe0025b13380
ng_iface_rcvdata() at ng_iface_rcvdata+0x14d/frame 0xfffffe0025b133c0
ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe0025b13450
ng_snd_item() at ng_snd_item+0x186/frame 0xfffffe0025b13490
ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe0025b13520
ng_snd_item() at ng_snd_item+0x186/frame 0xfffffe0025b13560
ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe0025b135f0
ng_snd_item() at ng_snd_item+0x186/frame 0xfffffe0025b13630
ng_pppoe_rcvdata_ether() at ng_pppoe_rcvdata_ether+0x195/frame 0xfffffe0025b136c0
ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe0025b13750
ng_snd_item() at ng_snd_item+0x186/frame 0xfffffe0025b13790
ether_demux() at ether_demux+0x207/frame 0xfffffe0025b137c0
ether_nh_input() at ether_nh_input+0x346/frame 0xfffffe0025b13820
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe0025b13870
ether_input() at ether_input+0x4b/frame 0xfffffe0025b138a0
vlan_input() at vlan_input+0x1f8/frame 0xfffffe0025b138f0
ether_demux() at ether_demux+0x122/frame 0xfffffe0025b13920
ether_nh_input() at ether_nh_input+0x346/frame 0xfffffe0025b13980
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe0025b139d0
ether_input() at ether_input+0x4b/frame 0xfffffe0025b13a00
iflib_rxeof() at iflib_rxeof+0xacb/frame 0xfffffe0025b13ae0
_task_fn_rx() at _task_fn_rx+0xc0/frame 0xfffffe0025b13b20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x144/frame 0xfffffe0025b13b80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x98/frame 0xfffffe0025b13bb0
fork_exit() at fork_exit+0x83/frame 0xfffffe0025b13bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0025b13bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic


The box locks up frequently, apparently after about an hour. When that happens, the activity LED on the WAN port stays active and the activity LED on the LAN port keeps flashing, but no traffic gets through (neither the web interface nor SSH is reachable).

There is not always a fault on reboot. This morning there was one:

Code:

em0: link state changed to UP
em1: link state changed to UP
lo0: link state changed to UP
aesni0: No AES or SHA support.
em1: link state changed to DOWN
vlan0: changing name to 'em1_vlan6'
em0: link state changed to DOWN
WARNING: attempt to domain_add(netgraph) after domainfinalize()
ng0: changing name to 'pppoe0'
em1: link state changed to UP
em1_vlan6: link state changed to UP
em0: link state changed to UP
pflog0: permanently promiscuous mode enabled
em1: link state changed to DOWN
em1_vlan6: link state changed to DOWN
em1: link state changed to UP
em1_vlan6: link state changed to UP
ng0: changing name to 'pppoe0'
ng0: changing name to 'pppoe0'
nd6_dad_timer: called with non-tentative address fe80:8::225:90ff:fe33:1188(pppoe0)


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x54
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80fa18e6
stack pointer         = 0x0:0xfffffe0025b12970
frame pointer         = 0x0:0xfffffe0025b129c0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_0)
trap number = 12
panic: page fault
cpuid = 0
time = 1646029975
__HardenedBSD_version = 1200059 __FreeBSD_version = 1201000
version = FreeBSD 12.1-RELEASE-p22-HBSD #0  6fd65fcb739(stable/21.7)-dirty: Wed Jan 26 20:48:21 CET 2022
    root@sensey:/usr/obj/usr/src/amd64.amd64/sys/SMP
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0025b12620
vpanic() at vpanic+0x1a2/frame 0xfffffe0025b12670
panic() at panic+0x43/frame 0xfffffe0025b126d0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe0025b12730
trap_pfault() at trap_pfault+0x49/frame 0xfffffe0025b12790
trap() at trap+0x29f/frame 0xfffffe0025b128a0
calltrap() at calltrap+0x8/frame 0xfffffe0025b128a0
--- trap 0xc, rip = 0xffffffff80fa18e6, rsp = 0xfffffe0025b12970, rbp = 0xfffffe0025b129c0 ---
in6_setscope() at in6_setscope+0xa6/frame 0xfffffe0025b129c0
ip6_forward() at ip6_forward+0x359/frame 0xfffffe0025b12b10
pf_test6() at pf_test6+0x1cb5/frame 0xfffffe0025b12ca0
pf_check6_out() at pf_check6_out+0x3f/frame 0xfffffe0025b12cd0
pfil_run_hooks() at pfil_run_hooks+0x87/frame 0xfffffe0025b12d60
ip6_output() at ip6_output+0x1a06/frame 0xfffffe0025b12ff0
icmp6_reflect() at icmp6_reflect+0x2f0/frame 0xfffffe0025b130a0
icmp6_error() at icmp6_error+0x4aa/frame 0xfffffe0025b130f0
ip6_forward() at ip6_forward+0xc58/frame 0xfffffe0025b13240
ip6_input() at ip6_input+0xdf6/frame 0xfffffe0025b13330
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe0025b13380
ng_iface_rcvdata() at ng_iface_rcvdata+0x14d/frame 0xfffffe0025b133c0
ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe0025b13450
ng_snd_item() at ng_snd_item+0x186/frame 0xfffffe0025b13490
ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe0025b13520
ng_snd_item() at ng_snd_item+0x186/frame 0xfffffe0025b13560
ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe0025b135f0
ng_snd_item() at ng_snd_item+0x186/frame 0xfffffe0025b13630
ng_pppoe_rcvdata_ether() at ng_pppoe_rcvdata_ether+0x195/frame 0xfffffe0025b136c0
ng_apply_item() at ng_apply_item+0x2bd/frame 0xfffffe0025b13750
ng_snd_item() at ng_snd_item+0x186/frame 0xfffffe0025b13790
ether_demux() at ether_demux+0x207/frame 0xfffffe0025b137c0
ether_nh_input() at ether_nh_input+0x346/frame 0xfffffe0025b13820
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe0025b13870
ether_input() at ether_input+0x4b/frame 0xfffffe0025b138a0
vlan_input() at vlan_input+0x1f8/frame 0xfffffe0025b138f0
ether_demux() at ether_demux+0x122/frame 0xfffffe0025b13920
ether_nh_input() at ether_nh_input+0x346/frame 0xfffffe0025b13980
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe0025b139d0
ether_input() at ether_input+0x4b/frame 0xfffffe0025b13a00
iflib_rxeof() at iflib_rxeof+0xacb/frame 0xfffffe0025b13ae0
_task_fn_rx() at _task_fn_rx+0xc0/frame 0xfffffe0025b13b20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x144/frame 0xfffffe0025b13b80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x98/frame 0xfffffe0025b13bb0
fork_exit() at fork_exit+0x83/frame 0xfffffe0025b13bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0025b13bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
---<>---
Copyright (c) 2013-2019 The HardenedBSD Project.

A thread on the FreeBSD forum (https://forums.freebsd.org/threads/fatal-trap-12-page-fault-while-in-kernel-mode-during-network-operations.80474/) suggests looking at the offloading features of the NIC. These are turned off.

The memory might be at fault, but while the box is running I have seen RAM usage reach 80% without a problem (2x 2GB, non-ECC).
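
On the question of whether the RAM is to blame, one rough heuristic (not a substitute for a real memory test) is to compare the trap reports of separate panics. Random memory corruption tends to crash at varying places, while a software bug usually faults at the same instruction every time. The sketch below is a hypothetical helper, not an OPNsense tool; it uses trimmed excerpts of the two panics quoted in this thread and compares their key fields.

```python
import re

# Trimmed excerpts of the two trap reports quoted in this thread.
PANIC_FIRST = """\
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x54
instruction pointer = 0x20:0xffffffff80fa18e6
current process = 0 (if_io_tqg_0)
time = 1645990654
"""

PANIC_SECOND = """\
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x54
instruction pointer = 0x20:0xffffffff80fa18e6
current process = 0 (if_io_tqg_0)
time = 1646029975
"""

# Matches the "key = value" lines of a trap report.
FIELD_RE = re.compile(r"^[ \t]*([a-z ]+?)[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)

# Fields that describe where the fault happened (names as printed in the log).
SITE_KEYS = ("fault virtual address", "instruction pointer", "current process")


def panic_fields(text):
    """Return the 'key = value' lines of a trap report as a dict of strings."""
    return {key: value for key, value in FIELD_RE.findall(text)}


def same_fault_site(report_a, report_b):
    """True if both reports fault at the same address, instruction and thread."""
    a, b = panic_fields(report_a), panic_fields(report_b)
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in SITE_KEYS)


if __name__ == "__main__":
    print("same fault site:", same_fault_site(PANIC_FIRST, PANIC_SECOND))
```

Both reports here agree on all three fields, which points more towards a reproducible kernel bug than towards flaky hardware, although it does not rule hardware out.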
Title: Re: panic on boot, where to look for causes?
Post by: franco on February 28, 2022, 10:08:11 am
Hi wbk,

Initially I thought I had messed this up, but it looks like a problem with this patch...

https://github.com/opnsense/src/commit/000c42faf375

When in6_setscope() is called, ifp is very likely NULL, which causes the panic (rightly so). If that assumption is correct, this should fix it in theory:

https://github.com/opnsense/src/commit/6d44059d9
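
The NULL reading is consistent with what the log shows: the first frame after the trap marker is in6_setscope(), and the fault virtual address (0x54) lies within the first page of memory, which is typical of reading a struct member through a NULL pointer. A minimal sketch of that check follows; the helper names are hypothetical and the 4096-byte page size is an assumption for amd64.

```python
import re

# Excerpt of the backtrace quoted in this thread, around the trap marker.
BACKTRACE = """\
trap() at trap+0x29f/frame 0xfffffe0025b128a0
calltrap() at calltrap+0x8/frame 0xfffffe0025b128a0
--- trap 0xc, rip = 0xffffffff80fa18e6, rsp = 0xfffffe0025b12970, rbp = 0xfffffe0025b129c0 ---
in6_setscope() at in6_setscope+0xa6/frame 0xfffffe0025b129c0
ip6_forward() at ip6_forward+0x359/frame 0xfffffe0025b12b10
pf_test6() at pf_test6+0x1cb5/frame 0xfffffe0025b12ca0
"""

PAGE_SIZE = 4096  # assumed base page size on amd64

# Matches a frame line such as "in6_setscope() at in6_setscope+0xa6/frame ...".
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame")


def faulting_function(backtrace):
    """Return the function in the first frame after the page-fault trap marker."""
    lines = backtrace.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("--- trap 0xc"):
            for candidate in lines[index + 1:]:
                match = FRAME_RE.match(candidate)
                if match:
                    return match.group(1)
    return None


def looks_like_null_member_read(fault_address):
    """True if the address is a small offset from NULL (inside the first page)."""
    return 0 <= fault_address < PAGE_SIZE


if __name__ == "__main__":
    print("faulting function:", faulting_function(BACKTRACE))
    print("NULL + offset:", looks_like_null_member_read(0x54))
```

The frames above the trap marker (trap(), vpanic() and so on) are only the panic handling itself; the interesting part of the trace starts below the marker.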

This likely happens while the PPPoE connection is being acquired, when IPv6 is already activated but not everything is properly connected in the kernel yet.

I think this only happens since 22.1? I can provide a test kernel... Are you able to turn off IPv6 on WAN somehow, to at least make the box stable enough to install the test kernel?


Cheers,
Franco
Title: Re: panic on boot, where to look for causes?
Post by: franco on May 23, 2022, 09:29:00 am
Looks like this was never confirmed, so it did not make the cut for 22.1. We're including it in 22.7 testing...


Cheers,
Franco