Firewall Frequently Locking Up, Requiring Hard Reboot

milkywaygoodfellas · August 14, 2022, 01:45:45 AM

Every so often, up to multiple times per day, my firewall appliance locks up and requires a hard reboot to restore services and internet connectivity.

So far, I have been unable to find any logs or crash dumps that would help me isolate the issue, apart from one crash report, which I submitted via the web interface.

I have no idea where to start. Can someone point me in the right direction to troubleshoot this issue? At this point I'm not sure if it's hardware or software.

I'm running it on a KingNovy fanless PC with 6x Intel I225-V, a Celeron N5105, 16 GB of RAM, and a 256 GB NVMe drive.

Demusman · August 14, 2022, 02:24:14 AM

The start would be connecting to the console when it's locked up and seeing what it says.

milkywaygoodfellas · August 14, 2022, 05:01:53 AM

I'd love to, but I can't even SSH into it when it happens.

axsdenied · August 14, 2022, 05:56:52 AM

I think he means locally on the device. Not remoting into it ;)

Demusman · August 14, 2022, 01:36:01 PM

Quote from: milkywaygoodfellas on August 14, 2022, 05:01:53 AM
I'd love to, but I can't even SSH into it when it happens.

Key word, "console"

milkywaygoodfellas · August 15, 2022, 05:13:59 PM

I managed to retrieve these crash dumps. Briefly going through them, I'm starting to suspect overheating or other hardware issues?

axsdenied · August 15, 2022, 05:29:10 PM

Looks like the panic was caused by "pfctl". Are you doing packet inspection of any kind? Perhaps choking on session states?

milkywaygoodfellas · August 15, 2022, 05:49:33 PM

Quote from: axsdenied on August 15, 2022, 05:29:10 PM
Looks like the panic was caused by "pfctl". Are you doing packet inspection of any kind? Perhaps choking on session states?

Just the defaults... IDS was enabled in IPS mode but with no rules downloaded. I did not modify any of those settings from the base install.

franco · August 15, 2022, 08:15:36 PM

For readability:

db:0:kdb.enter.default>  show pcpu
cpuid        = 0
dynamic pcpu = 0xfc0f40
curthread    = 0xfffffe0138c28720: pid 3489 tid 102014 critnest 1 "pfctl"
curpcb       = 0xfffffe0138c28c30
fpcurthread  = 0xfffffe0138c28720: pid 3489 "pfctl"
idlethread   = 0xfffffe00207933a0: tid 100003 "idle: cpu0"
self         = 0xffffffff82c10000
curpmap      = 0xfffffe011668f518
tssp         = 0xffffffff82c10384
rsp0         = 0xfffffe0118fea000
kcr3         = 0x351ae2000
ucr3         = 0x16fe6d000
scr3         = 0x16fe6d000
gs32p        = 0xffffffff82c10404
ldt          = 0xffffffff82c10444
tss          = 0xffffffff82c10434
curvnet      = 0xfffff80001202dc0
db:0:kdb.enter.default>  bt
Tracing pid 3489 tid 102014 td 0xfffffe0138c28720
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0118fe93c0
vpanic() at vpanic+0x1b0/frame 0xfffffe0118fe9410
panic() at panic+0x43/frame 0xfffffe0118fe9470
trap_fatal() at trap_fatal+0x385/frame 0xfffffe0118fe94d0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0118fe9530
calltrap() at calltrap+0x8/frame 0xfffffe0118fe9530
--- trap 0xc, rip = 0xffffffff80debe14, rsp = 0xfffffe0118fe9600, rbp = 0xfffffe0118fe9620 ---
rn_walktree() at rn_walktree+0x64/frame 0xfffffe0118fe9620
pfr_get_addrs() at pfr_get_addrs+0x219/frame 0xfffffe0118fe9680
pfioctl() at pfioctl+0x23be/frame 0xfffffe0118fe9b50
devfs_ioctl() at devfs_ioctl+0xc6/frame 0xfffffe0118fe9ba0
vn_ioctl() at vn_ioctl+0x1a4/frame 0xfffffe0118fe9cb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe0118fe9cd0
kern_ioctl() at kern_ioctl+0x25b/frame 0xfffffe0118fe9d40
sys_ioctl() at sys_ioctl+0xf1/frame 0xfffffe0118fe9e00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe0118fe9f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0118fe9f30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8012446da, rsp = 0x7fffffffdc38, rbp = 0x7fffffffe0d0 ---

I haven't seen this before but if it doesn't happen on 22.1 it should be easy to find the bad commit.

This is new for 22.7, right?

Cheers,
Franco
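Editor's sketch, not part of franco's post: the frames listed between the "--- trap" marker and the "--- syscall" marker are the kernel call path that was running when the fault hit. The snippet below pulls those frames out of a DDB `bt` listing. The function name `faulting_frames` and the regular expressions are illustrative assumptions based only on the line shapes visible in the trace above, not a general DDB parser.

```python
import re

# Matches DDB frame lines such as:
#   rn_walktree() at rn_walktree+0x64/frame 0xfffffe0118fe9620
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")
# Matches the trap marker, e.g. "--- trap 0xc, rip = ..."
TRAP_RE = re.compile(r"^--- trap (0x[0-9a-f]+),")


def faulting_frames(bt_text):
    """Return (trap number, frames below the trap marker, innermost first).

    Returns (None, []) if the listing has no trap marker.
    """
    trap = None
    frames = []
    for line in bt_text.splitlines():
        line = line.strip()
        m = TRAP_RE.match(line)
        if m:
            trap = m.group(1)
            continue
        if line.startswith("--- syscall"):
            break  # everything past this point is userland context
        m = FRAME_RE.match(line)
        if m and trap is not None:
            frames.append(m.group(1))
    return trap, frames


# Excerpt copied from the backtrace in this thread.
sample = """\
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0118fe9530
calltrap() at calltrap+0x8/frame 0xfffffe0118fe9530
--- trap 0xc, rip = 0xffffffff80debe14, rsp = 0xfffffe0118fe9600, rbp = 0xfffffe0118fe9620 ---
rn_walktree() at rn_walktree+0x64/frame 0xfffffe0118fe9620
pfr_get_addrs() at pfr_get_addrs+0x219/frame 0xfffffe0118fe9680
pfioctl() at pfioctl+0x23be/frame 0xfffffe0118fe9b50
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8012446da, rsp = 0x7fffffffdc38, rbp = 0x7fffffffe0d0 ---
"""

trap, frames = faulting_frames(sample)
print(trap, frames)
```

For this trace the innermost frame is rn_walktree, reached from pfr_get_addrs through pfioctl, which fits the observation that the panic happened while "pfctl" was issuing an ioctl.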

milkywaygoodfellas · August 15, 2022, 09:02:18 PM

Quote from: franco on August 15, 2022, 08:15:36 PM
For readability:

I haven't seen this before but if it doesn't happen on 22.1 it should be easy to find the bad commit.

This is new for 22.7, right?

Cheers,
Franco

Yeah, I never had this problem on 22.1. I disabled IDS/IPS entirely and it seems to have greatly helped stability: it was crashing multiple times a day today and yesterday, and since turning off Intrusion Detection under Services, it hasn't crashed again (yet).

milkywaygoodfellas · August 17, 2022, 03:30:30 PM

Just a quick update - since disabling IDS/IPS in my last post, the firewall has not crashed again as of this reply.

axsdenied · August 17, 2022, 06:04:55 PM

Did you have any hardware offloading enabled? i.e. CRC, TSO, LRO or VLAN?
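Editor's sketch, not part of the original post: one way to double-check the offload state outside the web interface is to look at the `options=` line that FreeBSD's `ifconfig` prints for an interface. The snippet below parses that line from captured text. The function name `enabled_offloads`, the set of capability names treated as "offloading", and both sample outputs (including the interface name `igc0` and the hex values) are assumptions made up for illustration, not output from the poster's machine.

```python
import re

# Capability names treated as hardware offloads here (an assumption;
# adjust to whatever the question is really about).
OFFLOAD_FLAGS = {
    "RXCSUM", "TXCSUM", "RXCSUM_IPV6", "TXCSUM_IPV6",
    "TSO4", "TSO6", "LRO",
    "VLAN_HWTAGGING", "VLAN_HWCSUM", "VLAN_HWFILTER", "VLAN_HWTSO",
}


def enabled_offloads(ifconfig_text):
    """Return the sorted offload capabilities found on the options= line."""
    m = re.search(r"^\s*options=[0-9a-f]+<([^>]*)>", ifconfig_text, re.MULTILINE)
    if not m:
        return []
    return sorted(f for f in m.group(1).split(",") if f in OFFLOAD_FLAGS)


# Hypothetical outputs, for illustration only.
sample_off = """\
igc0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\toptions=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
"""
sample_on = """\
igc0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\toptions=4802528<VLAN_MTU,JUMBO_MTU,TSO4,LRO,WOL_MAGIC,NOMAP>
"""

print(enabled_offloads(sample_off))
print(enabled_offloads(sample_on))
```

An empty list for every interface would be consistent with all offloading being disabled.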

milkywaygoodfellas · August 17, 2022, 06:17:30 PM (Last Edit: August 17, 2022, 06:32:04 PM)

Quote from: axsdenied on August 17, 2022, 06:04:55 PM
Did you have any hardware offloading enabled? i.e. CRC, TSO, LRO or VLAN?

Nope, all disabled.

And I spoke too soon... another crash dump some time yesterday apparently. This time, however, the firewall rebooted itself instead of staying locked up until I power cycled it.

Caused by PHP this time, apparently?

axsdenied · August 17, 2022, 07:55:19 PM

Given the change in behavior, this is feeling more like potentially a hardware issue, but it's still not remotely clear.

To rule that out, are you able to go back to 22.1 and test?

Otherwise, potentially check CPU temps, or set up alerts.
You could also, just for good measure, run a memtest on the box?

Historically, for me, it's rarely been memory issues; however, it WAS 1 out of the 99 times. And that 1 time drove me nuts in troubleshooting before I discovered the issue ;)
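Editor's sketch, not part of the original post: a simple temperature check can be built on the per-core readings that FreeBSD exposes through `sysctl`, which should appear as lines like `dev.cpu.0.temperature: 52.0C` when a CPU temperature driver such as coretemp(4) is loaded (an assumption about this box). The function name `hot_cores`, the 85 °C limit, and the sample readings below are all made up for illustration.

```python
import re

# Matches lines such as "dev.cpu.0.temperature: 52.0C"
TEMP_RE = re.compile(r"^dev\.cpu\.(\d+)\.temperature: (-?\d+(?:\.\d+)?)C$")


def hot_cores(sysctl_text, limit_c=85.0):
    """Return {core index: temperature} for cores at or above limit_c.

    limit_c is an arbitrary alert threshold, not a vendor specification.
    """
    hot = {}
    for line in sysctl_text.splitlines():
        m = TEMP_RE.match(line.strip())
        if m and float(m.group(2)) >= limit_c:
            hot[int(m.group(1))] = float(m.group(2))
    return hot


# Hypothetical readings, for illustration only.
sample = """\
dev.cpu.0.temperature: 61.0C
dev.cpu.1.temperature: 88.0C
dev.cpu.0.freq: 2000
"""

print(hot_cores(sample))
```

Run periodically against real `sysctl dev.cpu` output and logged to disk, something like this could show whether the lockups line up with temperature spikes.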

milkywaygoodfellas · August 18, 2022, 12:54:58 AM

Quote from: axsdenied on August 17, 2022, 07:55:19 PM
Given the change in behavior, this is feeling more like potentially a hardware issue, but it's still not remotely clear.

To rule that out, are you able to go back to 22.1 and test?

Otherwise, potentially check CPU temps, or set up alerts.
You could also, just for good measure, run a memtest on the box?

Historically, for me, it's rarely been memory issues; however, it WAS 1 out of the 99 times. And that 1 time drove me nuts in troubleshooting before I discovered the issue ;)

I can try a live disk of 22.1 to see, but I made some tweaks and it was running stable again so I turned IDS/IPS back on and it almost immediately locked up with no crash dump, same as before. Turned it back off and so far so good, but it's only been a couple of hours.
