23.1 Legacy Series / Random crashing with pf_test_state_icmp()
« on: June 05, 2023, 04:23:01 pm »
Hi,
I have been using OPNsense for several years now, and at some point in the last year or so I started to get random crashes.
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x0
fault code = supervisor read data, page not present
The stack trace always ends at pf_test_state_icmp().
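For anyone comparing several crash reports, here is a minimal Python sketch (not part of OPNsense) that pulls the fields out of the trap header above. A read fault at virtual address 0x0 generally indicates a NULL pointer dereference in the kernel. The helper names and the 4096-byte page threshold are assumptions made for illustration only.

```python
import re

# Trap header copied from the crash report above.
PANIC_TEXT = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x0
fault code = supervisor read data, page not present
"""


def parse_trap_header(text):
    """Collect the 'key = value' pairs from a FreeBSD-style trap header."""
    fields = {}
    for line in text.splitlines():
        # One line may carry several pairs separated by ';' (cpuid / apic id).
        for part in line.split(";"):
            match = re.match(r"\s*([^=]+?)\s*=\s*(.+?)\s*$", part)
            if match:
                fields[match.group(1)] = match.group(2)
    return fields


def is_null_dereference(fields, page_size=4096):
    """True if the faulting address lies in the first page (assumed 4096 bytes)."""
    addr = fields.get("fault virtual address")
    return addr is not None and int(addr, 16) < page_size


fields = parse_trap_header(PANIC_TEXT)
print(fields["cpuid"], fields["fault virtual address"], is_null_dereference(fields))
```

Running the same check over each saved crash report would show whether every panic is the same NULL dereference or whether the fault address varies.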
I suspected hardware issues, so I bought a completely new system, installed OPNsense, and restored my config. Same issue. Seems to point to a software issue, but I can't figure out where to start looking.
I have an HA pair set up, with the backup instance running on VMware. Notably, the backup doesn't seem to have the crash problem.
The primary was a Qotom Q355-G4, and has been replaced with https://www.aliexpress.us/item/3256804355685285.html configured with 8GB RAM, Intel N6005. No hardware has been shared between them.
My plugins are:
os-acme-client
os-chrony
os-etpro-telemetry
os-mdns-repeater
os-smart
os-theme-vicuna
os-vnstat
os-wireguard
I have a pair of IPsec tunnels and a handful of WireGuard clients. I am using CARP on the WAN interface and on all of the internal interfaces. The interfaces are configured as LAGGs with only one member interface each (to provide failover compatibility with the VMware instance).
I have Hybrid Outbound NAT configured to set the CARP WAN address as the source for my internal networks.
No unusual rules, no policy-based routing. I used to have a Dual WAN setup, but no longer do; the second WAN interface is disabled. I also used to have IPv6 configured, but no longer have IPv6 on my WAN. I have a he.net gif tunnel set up, but it is disabled.
The crashes happen randomly, with no pattern whatsoever. Sometimes the system runs for 12 hours between crashes, sometimes only 2. I'm at a loss where to look. The pf_test_state_icmp() frame is the only clue I have so far. I have no rules referencing ICMP, and all tunables are at their defaults. I cleared them just to be sure.