Upgrade from 23.1->23.1.3 kernel panic/crashing

Started by CrazyBebop, March 14, 2023, 12:45:28 PM

Hello!

I recently bought one of those mini PC firewalls off Amazon. I was able to install 23.1 and import my config, and it works perfectly until I attempt to upgrade to 23.1.3. After a successful upgrade message, the system reboots successfully.


After a couple of seconds, even after I can reach the OPNsense web GUI, a bunch of text flies down the little monitor I use for the firewall and the system immediately reboots into the "FreeBSD" screen, which boots the system back up. After a few seconds it crashes again, and the loop goes on and on.

I don't think it's a failing disk, because it's a brand-new NVMe drive. The drive could be faulty out of the box, but I doubt it: without the update, everything runs perfectly and has now done so for 7 hours. Everything breaks only when I update.

Does anyone have any suggestions?


Film it with your mobile phone and then post some evidence, i.e. the kernel panic message that is probably occurring right before the reboot.

Without more information it is simply impossible to tell.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: pmhausen on March 14, 2023, 10:42:32 PM
Film it with your mobile phone and then post some evidence, i.e. the kernel panic message that is probably occurring right before the reboot.

Without more information it is simply impossible to tell.

I'll do that tonight, thanks.

Hello, I am experiencing a very similar problem.
Mine behaves like this:


After updating to 23.1.3_4 (fresh install, only restored config from backup):

  • PPPoE does work.
  • Kernel panic every time I try to update or install anything (GUI and CLI); sometimes opening the System > Firmware > Status page is enough.
  • The system usually crashes after a couple of minutes, even without checking for updates.
  • Submitting a crash report has also caused the system to crash.
  • Switching to the dev branch causes an error but still works and even allows me to install plugins; the system still crashes irregularly or when searching for updates.
    • Error when updating to the dev branch: [Screenshot_20230315_194431.png]

It might be worth noting that this only happens when connected to WAN; it has never crashed on me when there was no Internet connection.
Proof Video: https://youtu.be/xvG1fJo8QVg
I'm currently running version 23.1, which seems to be stable but does not allow me to install updates (Update required).

Hi,

I have this too. It also happened on 23.1.2 (check for updates -> kernel crash and reboot). The box is headless, so it is hard to know more.
After a reboot followed by an immediate update, I could update to 23.1.3.
Now the reboots happen without interaction: I heard the "boot up sound" yesterday at 11pm and this morning around 5:45am.

I'll follow this thread and try to find a serial or other console to get more output if needed.

Ronny

March 16, 2023, 10:58:48 PM #6 Last Edit: March 16, 2023, 11:03:14 PM by silverspy18
My apologies, I posted to the wrong thread. Please delete if possible.

I have managed to find a workaround which stops the system from crashing until the next reboot.
The problem seems to be caused by driver issues with the new kernel (educated guess). Booting with the old kernel has completely solved my issue.

Workaround:


    • Turn on your machine.
    • Immediately hit space repeatedly to pause the boot process.
    • Press K to change the kernel to the old one. (The text at number 6 should turn blue.)
    • Press B to boot.
The system should now run stably.
This is not permanent: OPNsense will boot the newer kernel on the next reboot.
If anyone knows how to make this permanent, I would greatly appreciate a reply.


Other things I have tried (just want to mention for documentation purposes):

  • Running Memtest86+ for 20+h (37 passes) found 0 errors.
  • Switching out the hard drive for a brand-new one did not help.
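
On making the old-kernel workaround permanent: one possibility, untested here, is the FreeBSD loader variable `kernel`, which names the directory under /boot that the loader boots from. The sketch below assumes that the previous kernel lives in /boot/kernel.old (the entry the K key in the boot menu switches to) and that /boot/loader.conf.local is read by the loader and not overwritten by OPNsense. Both are assumptions, and `pin_old_kernel` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: pin the boot loader to the previous kernel.
# ASSUMPTIONS: the old kernel is in /boot/kernel.old, and
# /boot/loader.conf.local is honoured and left alone by firmware updates.
pin_old_kernel() {
    conf="${1:-/boot/loader.conf.local}"
    line='kernel="kernel.old"'
    # Append the setting only if it is not already present,
    # so running the helper twice does not duplicate the line.
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}
```

Run as root, `pin_old_kernel` with no argument would write to /boot/loader.conf.local. The line should be removed again once a fixed kernel is released, otherwise the system keeps booting the old one.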

Hi,

thanks for this information,

I also checked my SATA SSD, as I suspected that might be the cause, but it is healthy, with no issues to see. I use an APU2C4 with BIOS v4.17.0.3, if that is relevant.

For the moment I live with the reboots, as this is my second Internet link. I will have to move in the next weeks, and then I will have to check in detail.

@opnsense any hints or info for us here?

Ronny


March 21, 2023, 08:11:52 AM #9 Last Edit: March 21, 2023, 08:15:26 AM by meyergru
What is your common factor? I226-V as NIC? This does not apply for the APU2C4, so: PPPoE on WAN?

The I226-V FreeBSD drivers are fairly fresh, pfSense does not even support those yet.

And after several I225 generations ridden with problems, there is plenty of indication on several other platforms (Windows, Linux) right now that I226-V might be just as unstable hardware-wise, just google for "I226-V connection drop".
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

Quote from: meyergru on March 21, 2023, 08:11:52 AM
What is your common factor? I226-V as NIC? This does not apply for the APU2C4, so: PPPoE on WAN?

Yes, PPPoE on the WAN side for me. The network ports are "Intel(R) I210".

And this was stable as hell. For me it started with 23.1.2, every time I did an update check. Now with 23.1.3 it happens more often.

It's headless, so I really have to dig my serial cable out of my "big box" to see more. But it really is a crash with reboot, not a simple "connection drop".

Ronny

March 21, 2023, 10:08:55 AM #11 Last Edit: March 21, 2023, 10:10:34 AM by meyergru
What is a connection drop on one OS may as well manifest as a kernel crash on another, just saying.

However, there are few reports of kernel crashes just because of using PPPoE. I have three OPNsense 23.1.3 installations running over it and had no problems at all.

I wonder if other reports share a common factor in hardware where it is more likely to have crashes than with a user-level process like mpd5. For now, information about probable causes in this thread is scarce (e.g. "those mini PC firewalls off of Amazon" use either I210, I211, I225 or I226 or even Realtek) and only shows common symptoms (i.e. kernel crashes).

I use a USB-to-Ethernet adapter for my WAN interface.
I'm also using PPPoE on my WAN interface.

root@OPNsense:~ # sysctl -a | grep -E 'dev.(rgephy|em|ure).*.%desc:'
dev.rgephy.0.%desc: RTL8251/8153 1000BASE-T media interface
dev.ure.0.%desc: Realtek USB 10/100/1000 LAN, class 0/0, rev 3.00/31.00, addr 3
dev.em.0.%desc: Intel(R) I219-LM SPT-H(2)


The NIC inside the adapter seems to be made by Realtek.
My LAN interface is an Intel(R) I219-LM.
Prior to the update, it was also running 100% stable for me.

I'm not using one of those "mini PC firewalls off of Amazon"; I have a small form factor PC by HP (EliteDesk 800 G2; Intel i3-6100).
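
For anyone comparing hardware across reports, the sysctl query above can be turned into a small reusable filter. This is only a sketch: the driver list (em, igb, igc, re, ure, rgephy) is an assumption covering the NICs mentioned in this thread and is not exhaustive, and `nic_descriptions` is a made-up name. Unlike the original pattern, it escapes the dots so they match literally.

```shell
#!/bin/sh
# Hypothetical helper: read `sysctl -a` output on stdin and keep only
# the description lines of some common FreeBSD NIC drivers.
# ASSUMPTION: the driver list below is illustrative; extend it as needed.
nic_descriptions() {
    grep -E '^dev\.(em|igb|igc|re|ure|rgephy)\.[0-9]+\.%desc:'
}
```

On the firewall it would be used as `sysctl -a | nic_descriptions`.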

I am getting a similar reboot loop on OPNsense 23.1.4_1-amd64, which is running within Proxmox with NICs passed through (Mellanox ConnectX-3 and Intel I219). This system has been running fine for nearly a year, and I cannot see any obvious underlying hardware issues. The PC itself is an Intel i5-7600 (consumer PC, not an all-in-one).

When I observe the crash in the Proxmox console, it happens too fast to read any messages (it goes from console prompt to 'Guest not running'). I don't know OPNsense/FreeBSD well enough to find the relevant log files (/var/log/dmesg shows nothing of interest).

Happy to help debug, just tell me where to look (log files etc).
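
As a starting point for where to look: on FreeBSD, savecore normally stores kernel crash dumps in /var/crash, provided a dump device is configured. The following is a minimal sketch that assumes this default location also applies to the OPNsense install in question; `list_crash_files` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: list kernel crash dump files in a directory.
# ASSUMPTION: dumps were saved by savecore into /var/crash (FreeBSD default).
list_crash_files() {
    dir="${1:-/var/crash}"
    # info.N holds a short panic summary; textdump.tar.N or vmcore.N
    # hold the dump itself.
    for f in "$dir"/info.* "$dir"/textdump.tar.* "$dir"/vmcore.*; do
        # Unmatched glob patterns stay literal, so skip anything that
        # is not an existing file.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

If `list_crash_files` prints nothing, no dump was saved, and a serial console or a video of the screen remains the only way to catch the panic message.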

April 06, 2023, 12:27:54 AM #14 Last Edit: April 06, 2023, 12:37:03 AM by steely.wing
I have this issue too. I installed OPNsense on a mini PC; after setting up the WAN using a VLAN and waiting several minutes, OPNsense crashes and reboots, and several minutes after the reboot it reboots again.
If I bring the WAN interface down, it does not crash for several hours.
I have run a memtest and a disk check; there are no issues.
I can't find a report of this issue on GitHub; maybe we should open a GitHub issue?