Business Edition Kernel Panic

Started by opn_leo, January 19, 2025, 02:35:05 PM

I own an OPNsense DEC2750. I ran a previous version of OPNsense Business Edition for about a year without issues. I upgraded to 24.10.1 a couple of weeks ago and started getting daily kernel panics and crashes/reboots. Some of the crashes required power cycling the unit to recover.

After reading some forum posts, I tried reinstalling the kernel and syslog-ng. This seemed to fix the issue for about a week, but I just had another hard crash requiring a power cycle. I connected a serial console to monitor the DEC2750, and it shows a solid stream of these messages until I reboot.

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x1
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c0af1
stack pointer           = 0x28:0xfffffe000f6bc530
frame pointer           = 0x28:0xfffffe000f6bc630
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 31708 (python3.11)
kernel trap 12 with interrupts disabled

Does anyone have any suggestions? I sent the logs to OPNsense via the issue reporting dialog. This is the only firewall I use at home, and dropping internet every 12 hours (and having to power cycle) is a real bummer.
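
For anyone capturing the same kind of serial console stream to a file, here is a minimal sketch that pulls the few fields usually asked for in a report (trap type, fault address, fault code, current process) out of the first "Fatal trap" record and ignores the repeats. The function name and the log file path are hypothetical; nothing here is an OPNsense tool, it is plain awk over saved text.

```shell
#!/bin/sh
# Summarize the first "Fatal trap" record in a saved serial console capture.
# summarize_trap and the log path passed to it are hypothetical names for
# illustration; the field labels match the console output quoted above.
summarize_trap() {
    awk '
        # Start of a trap record; stop once a second record begins.
        /^Fatal trap/            { seen++; if (seen > 1) exit; print "trap:    " $0; next }
        # Skip everything before the first record.
        seen != 1                { next }
        # For the fields of interest, strip "label = " and print the value.
        /^fault virtual address/ { sub(/^[^=]*= */, ""); print "address: " $0 }
        /^fault code/            { sub(/^[^=]*= */, ""); print "code:    " $0 }
        /^current process/       { sub(/^[^=]*= */, ""); print "process: " $0 }
    ' "$1"
}

# Usage (path is an assumption): summarize_trap /tmp/console.log
```

Run against a capture like the one above, this should reduce the repeating stream to four lines, which is easier to paste into a forum post or ticket.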

I'm on the 670, on the latest business version, with 42 days of uptime.

Hopefully someone can help find the issue.

Same here, 24.10.1 on a DEC750 and at this moment 53 days uptime ...

Quote from: opn_leo on January 19, 2025, 02:35:05 PM.....
Does anyone have any suggestions? I sent the logs to OPNsense via the issue reporting dialog. This is the only firewall I use at home, and dropping internet every 12 hours (and having to power cycle) is a real bummer.

Can you post the output of this command please:

  ls -ltrh /var/crash

Quote from: newsense on January 20, 2025, 02:31:29 AM
Can you post the output of this command please:

  ls -ltrh /var/crash


That is an empty directory


Disk full? Otherwise, a kernel crash should leave something in that directory.
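
A quick way to check both points at once, as a rough sketch: report the free space on the filesystem that holds the crash directory and count the entries in it. The function name is hypothetical; `/var/crash` is the directory from this thread, and only standard `df`, `ls`, `awk`, and `wc` are used.

```shell
#!/bin/sh
# Report free space for a crash directory and how many entries it holds.
# check_crash_dir is a hypothetical helper name; /var/crash is the default
# because it is the directory discussed in this thread.
check_crash_dir() {
    dir=${1:-/var/crash}
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # df -P gives the POSIX column layout: field 4 is the available space
    # (in 1024-byte blocks with -k), field 5 is the percentage used.
    df -Pk "$dir" | awk 'NR == 2 { print "free_kb=" $4 " used=" $5 }'
    # ls -A lists everything except "." and "..", so 0 means truly empty.
    count=$(ls -A "$dir" | wc -l | tr -d ' ')
    echo "entries=$count"
}

# Usage: check_crash_dir            (checks /var/crash)
#        check_crash_dir /some/dir  (checks another directory)
```

If the filesystem has plenty of space and the directory is still empty, the next thing to look at is whether a dump device is configured at all. On FreeBSD-based systems `dumpon -l` lists it, though whether this appliance sets one up by default is not something this thread confirms.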


In what environment is this unit deployed? It's a rack mount unit so I imagine it's installed in a rack?

Is the grounding pin of the appliance connected to the rack?
Is there a UPS used?
Are there any temperature issues? Fan working?

Quote from: Monviech (Cedrik) on January 20, 2025, 03:49:36 PM
In what environment is this unit deployed? It's a rack mount unit so I imagine it's installed in a rack?

Rack mounted

Quote
Is the grounding pin of the appliance connected to the rack?

Yes, the grounding pin is connected, and the unit is powered by a properly grounded rack-mount PDU.

Quote
Is there a UPS used?

Yes, the PDU is plugged into a UPS. All other devices on the UPS are functioning fine.

Quote
Are there any temperature issues? Fan working?

Yes, the fan is working. There are no issues with the temperature of the DEC2750; I check the temperature health via the GUI and everything is very stable.