Fatal trap 12: page fault while in kernel mode

Started by ThomasE, March 17, 2022, 01:51:58 PM

Hello everyone,

We are currently migrating an aging, fairly complex Cisco environment (3 routers + ASA) to OPNsense. As a first step we are replacing the access router that provides the connection to our provider, which is why the configuration is still *relatively* manageable. Specifically, we have one external interface to our provider, one internal interface with a few VLANs to the data center, and one interface for administration. About 80 IPsec tunnels are terminated in the data center, and there are various externally reachable services (web, mail, etc.). Typical traffic is currently a few hundred MB/s. The overwhelming majority of the traffic is routed; one small network does NAT directly to the outside. As hardware we use two HP servers with Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz and 4x10GBit plus 4x1GBit network cards, one of them in production and the other in cold standby. The servers are configured identically, i.e., we can switch from one to the other quite easily by simply moving the cables. That is just to give a rough impression. If there are specific questions about the hardware, I will of course be happy to answer them.

We are running OPNsense 21.10.3 (Business Edition).

Now to the problem:

As long as we run our setup with the firewall completely disabled, everything works flawlessly (except NAT, of course). But when we enable the firewall on whichever server is currently in use, or bring a server with the firewall enabled into service by plugging in the relevant cables, we get a kernel panic. So the problem only occurs with the combination of load and an enabled firewall.

Faulty hardware seems unlikely to us, since the problem occurs equally on two different systems. We would be very grateful for any useful pointers, including on what else we could try. I am happy to answer follow-up questions here at any time.

I'll now post some information from the crash reporter. I trimmed it a little and hope I didn't accidentally remove anything relevant in the process...

/var/crash/info.0

Dump header from device: /dev/gpt/swapfs
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 93184
  Blocksize: 512
  Compression: none
  Dumptime: Wed Mar 16 15:13:54 2022
  Hostname: opnsense.localdomain
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 12.1-RELEASE-p22-HBSD #0  6fd65fcb739(stable/21.7)-dirty: Wed Jan 26 20:48:21 CET 2022
    root@sensey:/usr/obj/usr/src/amd64.amd64/sys/SMP
  Panic String: page fault
  Dump Parity: 3398240120
  Bounds: 0
  Dump Status: good


/var/crash/textdump.tar.0


ddb.txt

db:0:kdb.enter.default>  run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo>  show alllocks
No such command; use "help" to list available commands
db:1:lockinfo>  show lockedvnods
Locked vnodes
db:0:kdb.enter.default>  show pcpu
cpuid        = 2
dynamic pcpu = 0xfffffe007e602c80
curthread    = 0xfffff810809dd000: pid 12 tid 100395 "irq268: bxe0:fp02"
curpcb       = 0xfffffe01067ceb80
fpcurthread  = none
idlethread   = 0xfffff8108016e000: tid 100005 "idle: cpu2"
curpmap      = 0xffffffff82204968
tssp         = 0xffffffff8232fdf0
commontssp   = 0xffffffff8232fdf0
rsp0         = 0xfffffe01067ceb80
gs32p        = 0xffffffff82336a28
ldt          = 0xffffffff82336a68
tss          = 0xffffffff82336a58
tlb gen      = 65317
curvnet      = 0xfffff81080026400
db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100395 td 0xfffff810809dd000
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe01067ce060
vpanic() at vpanic+0x1bf/frame 0xfffffe01067ce0b0
panic() at panic+0x43/frame 0xfffffe01067ce110
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe01067ce170
trap_pfault() at trap_pfault+0x49/frame 0xfffffe01067ce1d0
trap() at trap+0x29f/frame 0xfffffe01067ce2e0
calltrap() at calltrap+0x8/frame 0xfffffe01067ce2e0
--- trap 0xc, rip = 0xffffffff82918c31, rsp = 0xfffffe01067ce3b0, rbp = 0xfffffe01067ce410 ---
pfr_update_stats() at pfr_update_stats+0x1a1/frame 0xfffffe01067ce410
pf_test() at pf_test+0xb29/frame 0xfffffe01067ce5c0
pf_check_in() at pf_check_in+0x1d/frame 0xfffffe01067ce5e0
pfil_run_hooks() at pfil_run_hooks+0x87/frame 0xfffffe01067ce670
ip_input() at ip_input+0x819/frame 0xfffffe01067ce720
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe01067ce770
ether_demux() at ether_demux+0x139/frame 0xfffffe01067ce7a0
ether_nh_input() at ether_nh_input+0x346/frame 0xfffffe01067ce800
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe01067ce850
ether_input() at ether_input+0x4b/frame 0xfffffe01067ce880
if_input() at if_input+0xa/frame 0xfffffe01067ce890
bxe_rxeof() at bxe_rxeof+0xa1a/frame 0xfffffe01067ce9a0
bxe_task_fp() at bxe_task_fp+0xd4/frame 0xfffffe01067ce9d0
bxe_intr_fp() at bxe_intr_fp+0xc5/frame 0xfffffe01067cea10
ithread_loop() at ithread_loop+0x1d4/frame 0xfffffe01067cea70
fork_exit() at fork_exit+0x83/frame 0xfffffe01067ceab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01067ceab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Tracing command sleep pid 86803 tid 100319 td 0xfffff8010617c5e0
sched_switch() at sched_switch+0x64a/frame 0xfffffe01064fd780
mi_switch() at mi_switch+0xe2/frame 0xfffffe01064fd7b0
sleepq_catch_signals() at sleepq_catch_signals+0x425/frame 0xfffffe01064fd800
sleepq_timedwait_sig() at sleepq_timedwait_sig+0x14/frame 0xfffffe01064fd840
_sleep() at _sleep+0x215/frame 0xfffffe01064fd8b0
kern_clock_nanosleep() at kern_clock_nanosleep+0x1a6/frame 0xfffffe01064fd930
sys_nanosleep() at sys_nanosleep+0x5f/frame 0xfffffe01064fd970
amd64_syscall() at amd64_syscall+0x364/frame 0xfffffe01064fdab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe01064fdab0
--- syscall (240, FreeBSD ELF64, sys_nanosleep), rip = 0x3d5666f890a, rsp = 0x66f2dd2fe888, rbp = 0x66f2dd2fe8b0 ---

msgbuf.txt

---<<BOOT>>---
Copyright (c) 2013-2019 The HardenedBSD Project.
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.1-RELEASE-p22-HBSD #0  6fd65fcb739(stable/21.7)-dirty: Wed Jan 26 20:48:21 CET 2022
    root@sensey:/usr/obj/usr/src/amd64.amd64/sys/SMP amd64
FreeBSD clang version 8.0.1 (tags/RELEASE_801/final 366581) (based on LLVM 8.0.1)
VT(efifb): resolution 1024x768
HardenedBSD: initialize and check features (__HardenedBSD_version 1200059 __FreeBSD_version 1201000).
CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz (2397.28-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
  Features=0xbfebfbff
  Features2=0x7ffefbff
  AMD Features=0x2c100800
  AMD Features2=0x21
  Structured Extended Features=0x37ab
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
real memory  = 137438953472 (131072 MB)
avail memory = 133717725184 (127523 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs
FreeBSD/SMP: 2 package(s) x 6 core(s) x 2 hardware threads
random: unblocking device.
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard
ioapic2  irqs 48-71 on motherboard
Launching APs: 1 8 12 19 5 4 18 21 9 10 15 3 13 6 2 11 14 7 17 22 16 20 23
Timecounter "TSC-low" frequency 1198637536 Hz quality 1000
wlan: mac acl policy registered
random: entropy device external interface
kbd1 at kbdmux0
module_register_init: MOD_LOAD (vesa, 0xffffffff812947f0, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
000.000074 [4344] netmap_init               netmap: loaded module
[ath_hal] loaded
nexus0
efirtc0:  on motherboard
efirtc0: registered as a time-of-day clock, resolution 1.000000s
cryptosoft0:  on motherboard
acpi0:  on motherboard
acpi0: Power Button (fixed)
attimer0:  port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
hpet0:  iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 340
Event timer "HPET2" frequency 14318180 Hz quality 340
Event timer "HPET3" frequency 14318180 Hz quality 340
Event timer "HPET4" frequency 14318180 Hz quality 340
Event timer "HPET5" frequency 14318180 Hz quality 340
Event timer "HPET6" frequency 14318180 Hz quality 340
Event timer "HPET7" frequency 14318180 Hz quality 340
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0:  on acpi0
pci0:  on pcib0
pci0:  at device 11.1 (no driver attached)
pci0:  at device 11.2 (no driver attached)
pci0:  at device 16.1 (no driver attached)
pci0:  at device 16.6 (no driver attached)
pci0:  at device 18.1 (no driver attached)
pcib1:  on acpi0
pci1:  on pcib1
pci1:  at device 11.1 (no driver attached)
pci1:  at device 11.2 (no driver attached)
pci1:  at device 16.1 (no driver attached)
pci1:  at device 16.6 (no driver attached)
pci1:  at device 18.1 (no driver attached)
acpi_syscontainer0:  on acpi0
acpi_syscontainer1:  on acpi0
pcib2:  port 0xcf8-0xcff numa-domain 0 on acpi0
pci2:  numa-domain 0 on pcib2
pcib3:  at device 1.0 numa-domain 0 on pci2
pci3:  numa-domain 0 on pcib3
ciss0:  port 0x2000-0x20ff mem 0x97a00000-0x97afffff,0x97b00000-0x97b003ff at device 0.0 numa-domain 0 on pci3
ciss0: PERFORMANT Transport
pcib4:  at device 1.1 numa-domain 0 on pci2
pci4:  numa-domain 0 on pcib4
pcib5:  at device 2.0 numa-domain 0 on pci2
pci5:  numa-domain 0 on pcib5
bxe0:  mem 0x94000000-0x947fffff,0x94800000-0x94ffffff,0x95010000-0x9501ffff at device 0.0 numa-domain 0 on pci5
bxe0: PCI BAR0 [10] memory allocated: 0x94000000-0x947fffff (8388608) -> 0xfffff80094000000
bxe0: PCI BAR2 [18] memory allocated: 0x94800000-0x94ffffff (8388608) -> 0xfffff80094800000
bxe0: PCI BAR4 [20] memory allocated: 0x95010000-0x9501ffff (65536) -> 0xfffff80095010000
bxe0: Found 10Gb Fiber media.
bxe0: IFMEDIA flags : 20
<6>bxe0: Using defaults for TSO: 65518/35/2048
<6>bxe0: Ethernet address: a0:d3:c1:fe:a0:a0
bxe0: MSI-X vectors Requested 5 and Allocated 5
bxe1:  mem 0x93000000-0x937fffff,0x93800000-0x93ffffff,0x95000000-0x9500ffff at device 0.1 numa-domain 0 on pci5
bxe1: PCI BAR0 [10] memory allocated: 0x93000000-0x937fffff (8388608) -> 0xfffff80093000000
bxe1: PCI BAR2 [18] memory allocated: 0x93800000-0x93ffffff (8388608) -> 0xfffff80093800000
bxe1: PCI BAR4 [20] memory allocated: 0x95000000-0x9500ffff (65536) -> 0xfffff80095000000
bxe1: Found 10Gb Fiber media.
bxe1: IFMEDIA flags : 20
<6>bxe1: Using defaults for TSO: 65518/35/2048
<6>bxe1: Ethernet address: a0:d3:c1:fe:a0:a4
bxe1: MSI-X vectors Requested 5 and Allocated 5
pcib6:  at device 2.1 numa-domain 0 on pci2
pci6:  numa-domain 0 on pcib6
pcib7:  at device 2.2 numa-domain 0 on pci2
pci7:  numa-domain 0 on pcib7
pcib8:  at device 2.3 numa-domain 0 on pci2
pci8:  numa-domain 0 on pcib8
pcib9:  at device 3.0 numa-domain 0 on pci2
pci9:  numa-domain 0 on pcib9
pcib10:  at device 3.1 numa-domain 0 on pci2
pci10:  numa-domain 0 on pcib10
pcib11:  at device 3.2 numa-domain 0 on pci2
pci11:  numa-domain 0 on pcib11
bxe2:  mem 0x96800000-0x96ffffff,0x97000000-0x977fffff,0x97810000-0x9781ffff at device 0.0 numa-domain 0 on pci11
bxe2: PCI BAR0 [10] memory allocated: 0x96800000-0x96ffffff (8388608) -> 0xfffff80096800000
bxe2: PCI BAR2 [18] memory allocated: 0x97000000-0x977fffff (8388608) -> 0xfffff80097000000
bxe2: PCI BAR4 [20] memory allocated: 0x97810000-0x9781ffff (65536) -> 0xfffff80097810000
bxe2: Found 10Gb Fiber media.
bxe2: IFMEDIA flags : 20
<6>bxe2: Using defaults for TSO: 65518/35/2048
<6>bxe2: Ethernet address: 64:51:06:f0:20:58
bxe2: MSI-X vectors Requested 5 and Allocated 5
bxe3:  mem 0x95800000-0x95ffffff,0x96000000-0x967fffff,0x97800000-0x9780ffff at device 0.1 numa-domain 0 on pci11
bxe3: PCI BAR0 [10] memory allocated: 0x95800000-0x95ffffff (8388608) -> 0xfffff80095800000
bxe3: PCI BAR2 [18] memory allocated: 0x96000000-0x967fffff (8388608) -> 0xfffff80096000000
bxe3: PCI BAR4 [20] memory allocated: 0x97800000-0x9780ffff (65536) -> 0xfffff80097800000
bxe3: Found 10Gb Fiber media.
bxe3: IFMEDIA flags : 20
<6>bxe3: Using defaults for TSO: 65518/35/2048
<6>bxe3: Ethernet address: 64:51:06:f0:20:5c
bxe3: MSI-X vectors Requested 5 and Allocated 5
pcib12:  at device 3.3 numa-domain 0 on pci2
pci12:  numa-domain 0 on pcib12
pci2:  at device 17.0 (no driver attached)
xhci0:  mem 0x3bffff00000-0x3bffff0ffff at device 20.0 numa-domain 0 on pci2
xhci0: 32 bytes context size, 64-bit DMA
usbus0 numa-domain 0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
ehci0:  mem 0x97c01000-0x97c013ff at device 26.0 numa-domain 0 on pci2
usbus1: EHCI version 1.0
usbus1 numa-domain 0 on ehci0
usbus1: 480Mbps High Speed USB v2.0
pcib13:  at device 28.0 numa-domain 0 on pci2
pci13:  numa-domain 0 on pcib13
pcib14:  at device 28.2 numa-domain 0 on pci2
pci14:  numa-domain 0 on pcib14
vgapci0:  mem 0x91000000-0x91ffffff,0x92a88000-0x92a8bfff,0x92000000-0x927fffff at device 0.1 numa-domain 0 on pci14
vgapci0: Boot video device
uhci0:  port 0x1300-0x131f at device 0.4 numa-domain 0 on pci14
usbus2 numa-domain 0 on uhci0
usbus2: 12Mbps Full Speed USB v1.0
pcib15:  at device 28.4 numa-domain 0 on pci2
pci15:  numa-domain 0 on pcib15
bge0:  mem 0x97990000-0x9799ffff,0x979a0000-0x979affff,0x979b0000-0x979bffff at device 0.0 numa-domain 0 on pci15
bge0: APE FW version: NCSI v1.2.46.0
bge0: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus0:  numa-domain 0 on bge0
brgphy0:  PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
<6>bge0: Using defaults for TSO: 65518/35/2048
<6>bge0: Ethernet address: c4:34:6b:ba:d8:10
bge1:  mem 0x97960000-0x9796ffff,0x97970000-0x9797ffff,0x97980000-0x9798ffff at device 0.1 numa-domain 0 on pci15
bge1: APE FW version: NCSI v1.2.46.0
bge1: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus1:  numa-domain 0 on bge1
brgphy1:  PHY 2 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
<6>bge1: Using defaults for TSO: 65518/35/2048
<6>bge1: Ethernet address: c4:34:6b:ba:d8:11
bge2:  mem 0x97930000-0x9793ffff,0x97940000-0x9794ffff,0x97950000-0x9795ffff at device 0.2 numa-domain 0 on pci15
bge2: APE FW version: NCSI v1.2.46.0
bge2: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus2:  numa-domain 0 on bge2
brgphy2:  PHY 3 on miibus2
brgphy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
<6>bge2: Using defaults for TSO: 65518/35/2048
<6>bge2: Ethernet address: c4:34:6b:ba:d8:12
bge3:  mem 0x97900000-0x9790ffff,0x97910000-0x9791ffff,0x97920000-0x9792ffff at device 0.3 numa-domain 0 on pci15
bge3: APE FW version: NCSI v1.2.46.0
bge3: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus3:  numa-domain 0 on bge3
brgphy3:  PHY 4 on miibus3
brgphy3:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
<6>bge3: Using defaults for TSO: 65518/35/2048
<6>bge3: Ethernet address: c4:34:6b:ba:d8:13
ehci1:  mem 0x97c00000-0x97c003ff at device 29.0 numa-domain 0 on pci2
usbus3: EHCI version 1.0
usbus3 numa-domain 0 on ehci1
usbus3: 480Mbps High Speed USB v2.0
isab0:  at device 31.0 numa-domain 0 on pci2
isa0:  numa-domain 0 on isab0
pcib16:  numa-domain 1 on acpi0
pci16:  numa-domain 1 on pcib16
pcib17:  at device 0.0 numa-domain 1 on pci16
pci17:  numa-domain 1 on pcib17
pcib18:  at device 1.0 numa-domain 1 on pci16
pci18:  numa-domain 1 on pcib18
pcib19:  at device 1.1 numa-domain 1 on pci16
pci19:  numa-domain 1 on pcib19
pcib20:  at device 2.0 numa-domain 1 on pci16
pci20:  numa-domain 1 on pcib20
pcib21:  at device 2.1 numa-domain 1 on pci16
pci21:  numa-domain 1 on pcib21
pcib22:  at device 2.2 numa-domain 1 on pci16
pci22:  numa-domain 1 on pcib22
pcib23:  at device 2.3 numa-domain 1 on pci16
pci23:  numa-domain 1 on pcib23
pcib24:  at device 3.0 numa-domain 1 on pci16
pci24:  numa-domain 1 on pcib24
pcib25:  at device 3.1 numa-domain 1 on pci16
pci25:  numa-domain 1 on pcib25
pcib26:  at device 3.2 numa-domain 1 on pci16
pci26:  numa-domain 1 on pcib26
pcib27:  at device 3.3 numa-domain 1 on pci16
pci27:  numa-domain 1 on pcib27
cpu0:  numa-domain 0 on acpi0
uart0:  port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1:  port 0x2f8-0x2ff irq 3 on acpi0
atkbdc0:  at port 0x60,0x64 on isa0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
est0:  numa-domain 0 on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1b9c00001a00
device_attach: est0 attach returned 6
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1b9c00001a00
device_attach: est23 attach returned 6
Timecounters tick every 1.000 msec
ugen3.1:  at usbus3
ugen1.1:  at usbus1
ugen2.1:  at usbus2
ugen0.1: <0x8086 XHCI root HUB> at usbus0
uhub0:  on usbus3
da0 at ciss0 bus 0 scbus0 target 0 lun 0
da0:  Fixed Direct Access SPC-3 SCSI device
da0: Serial Number PDNLH0BRH7X2JF 
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 76284MB (156231360 512 byte sectors)
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...
uhub1:  on usbus1
uhub2:  on usbus2
uhub3: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
<118>Mounting filesystems...
<118>tunefs: soft updates remains unchanged as enabled
<118>tunefs: file system reloaded
uhub2: 2 ports with 2 removable, self powered
<118>camcontrol: ATA ATA_IDENTIFY via pass_16 failed
<118>camcontrol: ATA ATAPI_IDENTIFY via pass_16 failed
<118>** /dev/gpt/rootfs
<118>FILE SYSTEM CLEAN; SKIPPING CHECKS
<118>clean, 16087036 free (5364 frags, 2010209 blocks, 0.0% fragmentation)
<118>Setting hostuuid: 30393137-3136-5a43-4a35-303530343048.
<118>Setting hostid: 0xc7d58f8e.
<118>Configuring vt: blanktime.
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub3: 21 ports with 21 removable, self powered
<118>Setting up memory disks...done.
<118>Configuring crash dump device: /dev/gpt/swapfs
<118>swapon: adding /dev/gpt/swapfs as swap device
<118>.ELF ldconfig path: /lib /usr/lib /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg /usr/local/lib/ipsec /usr/local/lib/perl5/5.32/mach/CORE
<118>32-bit compatibility ldconfig path:
<118>done.
ugen0.2:  at usbus0
uhub4 numa-domain 0 on uhub3
uhub4:  on usbus0
ugen3.2:  at usbus3
uhub5 numa-domain 0 on uhub0
uhub5:  on usbus3
ugen1.2:  at usbus1
uhub6 numa-domain 0 on uhub1
uhub6:  on usbus1
uhub6: 6 ports with 6 removable, self powered
<118>>>> Invoking early script 'upgrade'
uhub4: 2 ports with 1 removable, self powered
<118>>>> Invoking early script 'configd'
uhub5: 8 ports with 8 removable, self powered
<118>Starting configd.
ugen0.3:  at usbus0
ukbd0 numa-domain 0 on uhub3
ukbd0:  on usbus0
kbd2 at ukbd0
<118>>>> Invoking early script 'templates'
<118>Generating configuration:
ugen0.4:  at usbus0
umass0 numa-domain 0 on uhub3
umass0:  on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:2:0: Attached to scbus2
cd0 at umass-sim0 bus 0 scbus2 target 0 lun 0
cd0:  Removable CD-ROM SCSI device
cd0: 40.000MB/s transfers
cd0: 1597MB (817870 2048 byte sectors)
cd0: quirks=0x10<10_BYTE_ONLY>
<118>OK
<118>>>> Invoking early script 'backup'
<118>>>> Invoking backup script 'captiveportal'
<118>>>> Invoking backup script 'dhcpleases'
<118>>>> Invoking backup script 'duid'
<118>>>> Invoking backup script 'netflow'
<118>>>> Invoking backup script 'rrd'
<118>>>> Invoking early script 'carp'
<118>CARP event system: OK
<118>Launching the init system...done.
<118>Initializing...........done.
<118>Starting device manager...
ioat0:  mem 0x3bffff2c000-0x3bffff2ffff at device 4.0 numa-domain 0 on pci2
ioat0: Capabilities: 2f7
ioat1:  mem 0x3bffff28000-0x3bffff2bfff at device 4.1 numa-domain 0 on pci2
ioat1: Capabilities: 2f7
ioat2:  mem 0x3bffff24000-0x3bffff27fff at device 4.2 numa-domain 0 on pci2
ioat2: Capabilities: f7
ioat3:  mem 0x3bffff20000-0x3bffff23fff at device 4.3 numa-domain 0 on pci2
ioat3: Capabilities: f7
ioat4:  mem 0x3bffff1c000-0x3bffff1ffff at device 4.4 numa-domain 0 on pci2
ioat4: Capabilities: f7
ioat5:  mem 0x3bffff18000-0x3bffff1bfff at device 4.5 numa-domain 0 on pci2
ioat5: Capabilities: f7
ioat6:  mem 0x3bffff14000-0x3bffff17fff at device 4.6 numa-domain 0 on pci2
ioat6: Capabilities: f7
ioat7:  mem 0x3bffff10000-0x3bffff13fff at device 4.7 numa-domain 0 on pci2
ioat7: Capabilities: f7
ioat8:  mem 0x3fffff1c000-0x3fffff1ffff at device 4.0 numa-domain 1 on pci16
ioat8: Capabilities: 2f7
ioat9:  mem 0x3fffff18000-0x3fffff1bfff at device 4.1 numa-domain 1 on pci16
ioat9: Capabilities: 2f7
ioat10:  mem 0x3fffff14000-0x3fffff17fff at device 4.2 numa-domain 1 on pci16
ioat10: Capabilities: f7
ioat11:  mem 0x3fffff10000-0x3fffff13fff at device 4.3 numa-domain 1 on pci16
ioat11: Capabilities: f7
ioat12:  mem 0x3fffff0c000-0x3fffff0ffff at device 4.4 numa-domain 1 on pci16
ioat12: Capabilities: f7
ioat13:  mem 0x3fffff08000-0x3fffff0bfff at device 4.5 numa-domain 1 on pci16
ioat13: Capabilities: f7
ioat14:  mem 0x3fffff04000-0x3fffff07fff at device 4.6 numa-domain 1 on pci16
ioat14: Capabilities: f7
ioat15:  mem 0x3fffff00000-0x3fffff03fff at device 4.7 numa-domain 1 on pci16
ioat15: Capabilities: f7
ums0 numa-domain 0 on uhub3
ums0:  on usbus0
ums0: 5 buttons and [XYZ] coordinates ID=1
<118>done.
<118>Configuring login behaviour...done.
<118>Configuring loopback interface...
<6>lo0: link state changed to UP
<118>done.
<118>Configuring kernel modules...
aesni0:  on motherboard
coretemp0:  numa-domain 0 on cpu0
est0:  numa-domain 0 on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 16c100000c00
device_attach: est0 attach returned 6
<118>done.
<118>Setting up extended sysctls...done.
<118>Setting timezone...done.
<118>Writing firmware setting...done.
<118>Writing trust files...done.
<118>Setting hostname: opnsense.localdomain
<118>Generating /etc/hosts...done.
<118>Configuring system logging...done.
<118>Configuring loopback interface...done.
<118>Creating wireless clone interfaces...done.
<118>Configuring VLAN interfaces...
<6>vlan0: changing name to 'bxe1_vlan4017'
<118>done.
<118>Creating OpenVPN instances...done.
<118>Configuring L2 interface...
<6>bxe2: link state changed to DOWN
bxe2: ERROR: Changing VLAN_HWTAGGING is not supported!
bxe2: ERROR: Changing VLAN_HWFILTER is not supported!
bxe2: ERROR: Changing VLAN_HWCSUM is not supported!
<118>done.
<118>Configuring LAN interface...
<6>bge0: link state changed to DOWN
<118>done.
<118>Configuring Management interface...
<6>bge3: link state changed to DOWN
<118>done.
<118>Configuring SRZ interface...
<6>bxe1: link state changed to DOWN
<6>bxe1_vlan4017: link state changed to DOWN
bxe1: ERROR: Changing VLAN_HWTAGGING is not supported!
bxe1: ERROR: Changing VLAN_HWFILTER is not supported!
bxe1: ERROR: Changing VLAN_HWCSUM is not supported!
<118>done.
<118>Configuring Provider interface...
bxe0: ERROR: Changing VLAN_HWTAGGING is not supported!
bxe0: ERROR: Changing VLAN_HWFILTER is not supported!
bxe0: ERROR: Changing VLAN_HWCSUM is not supported!
<118>done.
<118>Creating IPsec VTI instances...done.
<118>Generating /etc/resolv.conf...done.
<118>Configuring firewall....done.
<118>Configuring OpenSSH...done.
<118>Starting web GUI...done.
<118>Configuring CRON...done.
<118>Setting up routes...done.
<118>Generating /etc/hosts...done.
<118>Setting up gateway monitors...done.
<118>Configuring firewall....done.
<6>bge0: link state changed to UP
<6>bge3: link state changed to UP
<118>Syncing OpenVPN settings...done.
<118>Starting NTP service...done.
<118>Generating RRD graphs...done.
<118>Configuring system logging...done.
<118>>>> Invoking start script 'newwanip'
<118>Reconfiguring routes: OK
<118>>>> Invoking start script 'freebsd'
<118>>>> Invoking start script 'syslog-ng'
<118>Stopping syslog_ng.
<118>Waiting for PIDS: 26650.
<118>Starting syslog_ng.
<118>>>> Invoking start script 'carp'
<118>>>> Invoking start script 'cron'
<118>Starting Cron: OK
<118>>>> Invoking start script 'beep'
<118>Root file system: /dev/gpt/rootfs
<118>Wed Mar 16 13:23:45 CET 2022
<118>
<118>*** opnsense.localdomain: OPNsense 21.10.3 (amd64/OpenSSL) ***
<118>
<118> L2 (bxe2)   ->
<118> LAN (bge0)      -> v4: 172.25.0.248/24
<118> Management (bge3) -> v4: 192.168.0.254/24
<118> SRZ (bxe1)      -> v4: 84.19.212.142/28
<118> Provider (bxe0) -> v4: 84.19.212.2/29
<118> WLAN (bxe1_vlan4017) -> v4: 192.168.17.7/24
<118>
<118> HTTPS: SHA256 02 EC 7B 00 E9 35 0A C4 34 04 85 F5 25 E3 B7 EF
<118>               64 FB DB 2F D3 95 26 D7 77 C0 81 16 1E C9 D1 AC
<118> SSH:   SHA256 LjvfNArWl4pCSZyRHfjGVQysR+sdG+xSy1W99opUFAo (ECDSA)
<118> SSH:   SHA256 I1A3q6SK1OuDPuuZ+VamIdFSNHJuaJ75Q5Zjcl7Ndck (ED25519)
<118> SSH:   SHA256 7ZPrqA1ZH7zV+3O3TvIzBaIt5reGcqbYRqaYqLC/Yhw (RSA)
<6>pflog0: permanently promiscuous mode enabled
<6>bge3: link state changed to DOWN
bxe0: ELINK EVENT LOG (4)
bxe0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive & transmit
<6>bxe0: link state changed to UP


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x2000
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff82918c31
stack pointer         = 0x28:0xfffffe01067ce3b0
frame pointer         = 0x28:0xfffffe01067ce410
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq268: bxe0:fp02)
trap number = 12
panic: page fault
cpuid = 2
time = 1647440034
__HardenedBSD_version = 1200059 __FreeBSD_version = 1201000
version = FreeBSD 12.1-RELEASE-p22-HBSD #0  6fd65fcb739(stable/21.7)-dirty: Wed Jan 26 20:48:21 CET 2022
    root@sensey:/usr/obj/usr/src/amd64.amd64/sys/SMP
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01067ce060
vpanic() at vpanic+0x1a2/frame 0xfffffe01067ce0b0
panic() at panic+0x43/frame 0xfffffe01067ce110
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe01067ce170
trap_pfault() at trap_pfault+0x49/frame 0xfffffe01067ce1d0
trap() at trap+0x29f/frame 0xfffffe01067ce2e0
calltrap() at calltrap+0x8/frame 0xfffffe01067ce2e0
--- trap 0xc, rip = 0xffffffff82918c31, rsp = 0xfffffe01067ce3b0, rbp = 0xfffffe01067ce410 ---
pfr_update_stats() at pfr_update_stats+0x1a1/frame 0xfffffe01067ce410
pf_test() at pf_test+0xb29/frame 0xfffffe01067ce5c0
pf_check_in() at pf_check_in+0x1d/frame 0xfffffe01067ce5e0
pfil_run_hooks() at pfil_run_hooks+0x87/frame 0xfffffe01067ce670
ip_input() at ip_input+0x819/frame 0xfffffe01067ce720
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe01067ce770
ether_demux() at ether_demux+0x139/frame 0xfffffe01067ce7a0
ether_nh_input() at ether_nh_input+0x346/frame 0xfffffe01067ce800
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe01067ce850
ether_input() at ether_input+0x4b/frame 0xfffffe01067ce880
if_input() at if_input+0xa/frame 0xfffffe01067ce890
bxe_rxeof() at bxe_rxeof+0xa1a/frame 0xfffffe01067ce9a0
bxe_task_fp() at bxe_task_fp+0xd4/frame 0xfffffe01067ce9d0
bxe_intr_fp() at bxe_intr_fp+0xc5/frame 0xfffffe01067cea10
ithread_loop() at ithread_loop+0x1d4/frame 0xfffffe01067cea70
fork_exit() at fork_exit+0x83/frame 0xfffffe01067ceab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01067ceab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

panic.txt

page fault

version.txt

FreeBSD 12.1-RELEASE-p22-HBSD #0  6fd65fcb739(stable/21.7)-dirty: Wed Jan 26 20:48:21 CET 2022
    root@sensey:/usr/obj/usr/src/amd64.amd64/sys/SMP
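
When reading a backtrace like the one above, the frame of interest is usually the first one listed after the `--- trap 0xc` marker, because that is where the page fault was raised (here `pfr_update_stats`). As a rough sketch only, assuming `ddb.txt` or `msgbuf.txt` has already been extracted from the textdump and is available as plain text, that frame could be picked out like this. The function name `faulting_frame` and the `SAMPLE` constant are made up for illustration; the sample lines are copied from the trace in this post.

```python
import re

# Matches ddb frame lines such as:
#   pfr_update_stats() at pfr_update_stats+0x1a1/frame 0xfffffe01067ce410
FRAME_RE = re.compile(r"^(?P<func>\w+)\(\) at (?P=func)\+(?P<offset>0x[0-9a-f]+)/frame")


def faulting_frame(backtrace):
    """Return (function, offset) of the first frame after a '--- trap 0x..' marker.

    Returns None if the text contains no such marker or no frame follows it.
    """
    seen_trap = False
    for raw in backtrace.splitlines():
        line = raw.strip()
        if line.startswith("--- trap 0x"):
            seen_trap = True
            continue
        if seen_trap:
            match = FRAME_RE.match(line)
            if match:
                return match.group("func"), int(match.group("offset"), 16)
    return None


# A few lines copied from the backtrace in this post.
SAMPLE = """\
trap() at trap+0x29f/frame 0xfffffe01067ce2e0
calltrap() at calltrap+0x8/frame 0xfffffe01067ce2e0
--- trap 0xc, rip = 0xffffffff82918c31, rsp = 0xfffffe01067ce3b0, rbp = 0xfffffe01067ce410 ---
pfr_update_stats() at pfr_update_stats+0x1a1/frame 0xfffffe01067ce410
pf_test() at pf_test+0xb29/frame 0xfffffe01067ce5c0
"""

print(faulting_frame(SAMPLE))
```

The frames above the marker (`trap`, `trap_fatal`, `panic`, ...) only describe how the kernel handled the fault, so they are skipped on purpose.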



Hello,

21.10.3 is still based on HBSD 12.1; from 22.4 (next month) the Business Edition will also be on FreeBSD 13.

Quite a bit has changed there around the function pfr_update_stats(), so I assume this has already been fixed.


Regards
Franco

Thanks for the quick reply. At the very beginning we were running the Community Edition (22.1), and we had problems with it too, which incidentally was one of the reasons for switching to the Business Edition right away. (We had originally planned that for later...) We no longer have any dumps from that time, but I would venture to claim that it is the same problem; at least the symptoms were the same.

In principle we would of course be open to trying the CE again with the experience we have gathered in the meantime, but honestly I don't have much hope there.

Can a CE installation later be converted to the BE through a corresponding update, once that is available?

Thanks and best regards
Thomas

Hello Thomas,

OK, thanks for the information.

The sidegrade between CE and BE still fails because of the 12.1/13 issue. As soon as 22.4 is available, it will work cleanly again between two releases on the same OS version (22.1 and 22.4).

Could it be that aliases with the statistics feature are in use? As far as we know, there were still FreeBSD bugs there.


Regards
Franco

Hello Franco,
Quote from: franco on March 17, 2022, 04:02:44 PM
Could it be that aliases with the statistics feature are in use? As far as we know, there were still FreeBSD bugs there.
Yes, there is indeed exactly one alias for which we had enabled this feature. Just to verify once more: you can recognize it by the line


<counters>1</counters>


in the configuration file.

If that turns out to do the trick, it would of course be really brilliant... In any case it already sounds promising!

Thanks and best regards
Thomas
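
For anyone who wants to run the same check without scrolling through the configuration by hand, here is a minimal sketch in Python. It rests on assumptions that are not confirmed in this thread: that each alias is stored as an `<alias>` element with a `<name>` child and an optional `<counters>` child, and that the configuration file lives at `/conf/config.xml`. The function name `aliases_with_counters` and the sample XML are invented for illustration.

```python
import xml.etree.ElementTree as ET


def aliases_with_counters(xml_text):
    """Return the names of all <alias> elements containing <counters>1</counters>.

    Assumption: aliases are <alias> elements with <name> and <counters> children.
    The search ignores where in the document tree the aliases are nested.
    """
    root = ET.fromstring(xml_text)
    names = []
    for alias in root.iter("alias"):
        counters = alias.findtext("counters", default="").strip()
        if counters == "1":
            names.append(alias.findtext("name", default="(unnamed)"))
    return names


# Hypothetical configuration fragment, not taken from a real system.
SAMPLE_CONFIG = """\
<opnsense>
  <OPNsense><Firewall><Alias><aliases>
    <alias uuid="a"><name>blocklist</name><counters>1</counters></alias>
    <alias uuid="b"><name>webservers</name><counters>0</counters></alias>
    <alias uuid="c"><name>mailservers</name></alias>
  </aliases></Alias></Firewall></OPNsense>
</opnsense>
"""

print(aliases_with_counters(SAMPLE_CONFIG))
```

On a real system the text of the configuration file (assumed to be `/conf/config.xml`) would be passed in instead of `SAMPLE_CONFIG`. Any alias name it reports is a candidate for switching the statistics option off.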

Hello Thomas,

Cool, yes, please try it. In any case, a patch is coming in 22.7 with FreeBSD 13.1.

It may happen, for example, when alias/firewall rules are saved while traffic is trying to update the counters. Apparently there have always been synchronization problems there:

https://reviews.freebsd.org/D34131


Regards
Franco

Hello Franco,

thank you very much for the hint: that did in fact help! Conversely, this means it is possible to crash your entire system without warning by ticking a checkbox in a place that at least feels "harmless". And since a certain amount of load is also required, it is extremely difficult to detect this beforehand in a test setup... Wild! But OK, we have learned our lesson for now. :)

Best regards
Thomas

Hello Thomas,

Okay, thanks for the feedback. We will run further tests with the 22.7 update so that we at least know in advance whether the problem is then solved or still exists.


Regards
Franco