Messages - mr_penguin

#1
Updated and new debug kernel installed. I'll PM you when I have a core dump to share.
#3
Well, it's hard to prove that a random crash has stopped, but we went from multiple crashes a day to 36 hours and counting of uptime with state sync disabled. It looks like you are onto something.
#4
Thanks for digging into this. I have disabled state sync on both nodes and will let you know the results.
#5
Sounds good to me. How do I get a debug kernel?
#6
Attached are 2 consecutive crash dumps, only minutes apart. At first glance, the stack traces are identical.
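For anyone who wants to check "identical" beyond a first glance, here is a minimal Python sketch that pulls the function names out of two textual backtraces and compares them. It assumes each frame line contains the common FreeBSD shape "at function+0xoffset"; the helper names extract_frames and same_trace are made up for illustration and are not part of any OPNsense or FreeBSD tool.

```python
import re

# Assumption: each frame line contains "at <function>+0x<offset>", which covers
# both the numbered panic backtrace and the ddb "func() at func+0x.../frame" form.
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+")


def extract_frames(text):
    """Return the function names found in a backtrace, in order of appearance."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
    return frames


def same_trace(text_a, text_b):
    """True when both texts contain at least one frame and the same frame sequence."""
    frames_a = extract_frames(text_a)
    frames_b = extract_frames(text_b)
    return bool(frames_a) and frames_a == frames_b
```

Reading each dump's text into a string and passing both to same_trace() compares only function names, so differing offsets or frame addresses between the two crashes are ignored.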
#7
I can grab the full backtrace the next time it happens. I have been submitting a bug report each time it crashes. This is on the latest 23.1.9, and it has been happening since at least the 22.1 series, possibly even longer.
#8
Hi,
I have been using OPNsense for several years now, and at some point in the last year or so I started to get random crashes.

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x0
fault code      = supervisor read data, page not present

The stack trace always ends at pf_test_state_icmp().
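A fault virtual address of 0x0 on a supervisor read usually indicates a NULL pointer dereference in the kernel. The following is a minimal Python sketch that parses the "key = value" lines of the trap header above and flags that case. The function names and the "address below one 4 KiB page" heuristic are assumptions made for illustration, not anything taken from the crash reporter.

```python
def parse_trap_fields(text):
    """Collect 'key = value' pairs from a trap header; ';' separates pairs on one line."""
    fields = {}
    for line in text.splitlines():
        for part in line.split(";"):
            if "=" in part:
                key, _, value = part.partition("=")
                fields[key.strip()] = value.strip()
    return fields


def looks_like_null_dereference(fields, page_size=4096):
    """Heuristic (assumption): a fault address inside the first page suggests a NULL pointer."""
    address = fields.get("fault virtual address")
    if address is None:
        return False
    return int(address, 16) < page_size


# The excerpt quoted in this post.
excerpt = """Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x0
fault code      = supervisor read data, page not present
"""

fields = parse_trap_fields(excerpt)
print(fields["fault code"])
print(looks_like_null_dereference(fields))
```

This only classifies the fault; it says nothing about which pointer inside pf_test_state_icmp() was NULL, which is what the full backtrace and core dump are needed for.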

I suspected hardware issues, so I bought a completely new system, installed OPNsense, and restored my config. Same issue. Seems to point to a software issue, but I can't figure out where to start looking.

I have an HA pair set up, with the backup instance on VMware. Notably, that one doesn't seem to have the crash problem.
The primary was a Qotom Q355-G4, and has been replaced with https://www.aliexpress.us/item/3256804355685285.html configured with 8GB RAM, Intel N6005. No hardware has been shared between them.

My plugins are:
os-acme-client
os-chrony
os-etpro-telemetry
os-mdns-repeater
os-smart
os-theme-vicuna
os-vnstat
os-wireguard

I have a pair of IPsec tunnels and a handful of WireGuard clients. I am using CARP on the WAN interface and on all of the internal interfaces. The interfaces are configured as LAGGs with only 1 interface each (to provide failover compatibility with the VMware instance).

I have Hybrid Outbound NAT configured to set the CARP WAN address as the source for my internal networks.

No unusual rules, no policy-based routing. I used to have a Dual WAN setup, but no longer do; that interface is disabled. I also used to have IPv6 configured, but no longer have IPv6 on my WAN. I have a he.net gif tunnel set up, but it is disabled.

The crashes happen randomly, with no pattern whatsoever. Sometimes it's 12 hours between crashes, sometimes it's 2. I'm at a loss as to where to look. The pf_test_state_icmp() frame is the only clue I have so far. I have no rules referencing ICMP, and all tunables are at their defaults; I cleared them just to be sure.