OPNsense Forum
»
Archive
»
23.1 Legacy Series
»
Random crashing with pf_test_state_icmp()
Topic: Random crashing with pf_test_state_icmp() (Read 2502 times)
mr_penguin
Newbie
Posts: 8
Karma: 3
Random crashing with pf_test_state_icmp()
«
on:
June 05, 2023, 04:23:01 pm »
Hi,
I have been using OPNsense for several years now, and at some point in the last year or so I started to get random crashes.
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x0
fault code = supervisor read data, page not present
The stack trace always ends at pf_test_state_icmp().
I suspected hardware issues, so I bought a completely new system, installed OPNsense, and restored my config. Same issue. Seems to point to a software issue, but I can't figure out where to start looking.
I have a HA pair setup, with the backup instance on VMware. Notably, that one doesn't seem to have the crash problem.
The primary was a Qotom Q355-G4, and has been replaced with
https://www.aliexpress.us/item/3256804355685285.html
configured with 8GB RAM, Intel N6005. No hardware has been shared between them.
My plugins are:
os-acme-client
os-chrony
os-etpro-telemetry
os-mdns-repeater
os-smart
os-theme-vicuna
os-vnstat
os-wireguard
I have a pair of IPsec tunnels, and a handful of Wireguard clients. I am using CARP on the WAN interface, and all of the internal interfaces. The interfaces are configured as LAGGs, with only 1 interface each (to provide failover compatibility with the VMware instance)
I have Hybrid Outbound NAT configured to set the CARP WAN address as the source for my internal networks
No unusual rules, no policy-based routing. I used to have a Dual WAN setup, but no longer do; that interface is disabled. I also used to have IPv6 configured, but no longer have IPv6 on my WAN. I have a he.net gif tunnel set up, but it is disabled.
The crashes happen randomly, no pattern whatsoever. Sometimes it's 12 hours, sometimes it's 2. I'm at a loss where to look. The pf_test_state_icmp() is the only clue I have so far. I have no rules referencing ICMP, all tunables are default. I cleared them just to be sure.
franco
Administrator
Hero Member
Posts: 17661
Karma: 1611
Re: Random crashing with pf_test_state_icmp()
«
Reply #1 on:
June 06, 2023, 12:54:47 pm »
Do you have a full backtrace? Is this on latest 23.1.8/9?
And this is since 23.1 or earlier? 22.1 added FreeBSD 13, perhaps since then this was the case...
Cheers,
Franco
mr_penguin
Newbie
Posts: 8
Karma: 3
Re: Random crashing with pf_test_state_icmp()
«
Reply #2 on:
June 06, 2023, 03:56:54 pm »
I can grab the full backtrace the next time it happens. I have been submitting bug reports as it happens. This is on the latest 23.1.9, and has been happening since at least the 22.1 series, possibly even longer.
mr_penguin
Newbie
Posts: 8
Karma: 3
Re: Random crashing with pf_test_state_icmp()
«
Reply #3 on:
June 06, 2023, 06:49:30 pm »
Attached are 2 consecutive crash dumps, only minutes apart. At first glance, the stack traces are identical.
franco
Administrator
Hero Member
Posts: 17661
Karma: 1611
Re: Random crashing with pf_test_state_icmp()
«
Reply #4 on:
June 07, 2023, 09:56:12 am »
Thanks! Didn't know about previous submissions.
So this is indeed IPv4 traffic, and I couldn't find a relevant issue within FreeBSD. There are two possibilities here: the crash is triggered by an ICMP message sent in response to a TCP/UDP packet, or by a plain ICMP ping. I'm leaning towards the former. Not sure how to proceed.
A debug kernel and a core dump might be the best option here.
Cheers,
Franco
mr_penguin
Newbie
Posts: 8
Karma: 3
Re: Random crashing with pf_test_state_icmp()
«
Reply #5 on:
June 07, 2023, 07:37:08 pm »
Sounds good to me. How do I get a debug kernel?
franco
Administrator
Hero Member
Posts: 17661
Karma: 1611
Re: Random crashing with pf_test_state_icmp()
«
Reply #6 on:
June 08, 2023, 12:31:55 pm »
I have built one now but need to test real quick how to get to the core dump. BRB
Cheers,
Franco
franco
Administrator
Hero Member
Posts: 17661
Karma: 1611
Re: Random crashing with pf_test_state_icmp()
«
Reply #7 on:
June 08, 2023, 01:12:25 pm »
So here is what to do:
1. Install debug kernel:
# opnsense-update -zkr dbg-23.1.8_5
2. Reboot to activate kernel.
3. Adjust action on panic after bootup:
# ddb script kdb.enter.default="bt; dump; reset"
You can test with the following to see that it picked it up:
# ddb scripts
4. Wait for panic. After a panic there will be a core file here:
# ls /var/crash/vmcore.[0-9]*
It's a mini dump of just over 200 MB. Perhaps you can send me a PM from where I can grab it.
Thanks in advance!
Cheers,
Franco
franco
Administrator
Hero Member
Posts: 17661
Karma: 1611
Re: Random crashing with pf_test_state_icmp()
«
Reply #8 on:
June 12, 2023, 09:28:21 am »
Hi,
I've been looking at the odd core dump... here is the null pointer dereference (this time UDP, not ICMP):
#17 0xffffffff8237ed0f in pf_test_state_udp (state=<optimized out>, state@entry=0xfffffe001099b828,
direction=<optimized out>, kif=<optimized out>, kif@entry=0xfffff800245b3a00, m=m@entry=0xfffff801e9409800,
off=20, h=<optimized out>, pd=pd@entry=0xfffffe001099b758) at
/usr/src/sys/netpfil/pf/pf.c:5086
5086 if (PF_ANEQ(pd->src, &nk->addr[pd->sidx], pd->af) ||
(kgdb) list
5081
5082 /* translate source/destination address, if necessary */
5083 if ((*state)->key[PF_SK_WIRE] != (*state)->key[PF_SK_STACK]) {
5084 struct pf_state_key *nk = (*state)->key[pd->didx];
5085
5086 if (PF_ANEQ(pd->src, &nk->addr[pd->sidx], pd->af) ||
5087 nk->port[pd->sidx] != uh->uh_sport)
5088 pf_change_ap(m, pd->src, &uh->uh_sport, pd->ip_sum,
5089 &uh->uh_sum, &nk->addr[pd->sidx],
5090 nk->port[pd->sidx], 1, pd->af);
(kgdb) p nk
$10 = (struct pf_state_key *) 0x0
> I have a HA pair setup
This caught my attention while skimming through the upper frames: the state sync via pfsync seems to be incomplete and has brought in a lot of dead pointers, which causes code paths to fail that should always have valid data attached.
I know it's a lot to ask, but if you disable state sync, does the crashing stop?
Cheers,
Franco
mr_penguin
Newbie
Posts: 8
Karma: 3
Re: Random crashing with pf_test_state_icmp()
«
Reply #9 on:
June 12, 2023, 02:46:43 pm »
Thanks for digging into this. I have disabled state sync on both nodes and will let you know the results.
mr_penguin
Newbie
Posts: 8
Karma: 3
Re: Random crashing with pf_test_state_icmp()
«
Reply #10 on:
June 13, 2023, 11:53:56 pm »
Well, it's hard to prove that a random crash has stopped, but we went from multiple crashes a day to 36 hours and counting of uptime with state sync disabled. It looks like you are onto something.
franco
Administrator
Hero Member
Posts: 17661
Karma: 1611
Re: Random crashing with pf_test_state_icmp()
«
Reply #11 on:
June 14, 2023, 08:39:51 am »
Ok, would you mind creating an issue over at
https://bugs.freebsd.org
for FreeBSD 13.1 and let me know which one you created? The pf/pfsync maintainer should take a look at this because I don't know what should be fixed.
Cheers,
Franco
mr_penguin
Newbie
Posts: 8
Karma: 3
Re: Random crashing with pf_test_state_icmp()
«
Reply #12 on:
June 23, 2023, 12:11:37 am »
Bug opened with FreeBSD:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272153
franco
Administrator
Hero Member
Posts: 17661
Karma: 1611
Re: Random crashing with pf_test_state_icmp()
«
Reply #13 on:
June 23, 2023, 08:27:31 am »
Thanks, I dropped another comment there. Let's see what happens.
Cheers,
Franco
franco
Administrator
Hero Member
Posts: 17661
Karma: 1611
Re: Random crashing with pf_test_state_icmp()
«
Reply #14 on:
June 23, 2023, 12:19:08 pm »
Ok, this is going to take long... If you want in for the ride:
First make sure to update to 23.1.10 and then install the 13.2 debug kernel:
# opnsense-update -zkr dbg-13.2
# opnsense-shell reboot
Restart pfsync and wait for panic. I've modified the crash reporter code so that vmcore files are automatically being emitted when booted from a debug kernel.
If you don't have time for this I understand. I think the upstream policy here is more of a deterrent than anything else.
Cheers,
Franco