Hi,
I upgraded today from 24.7 latest to 25.1.1. However now after I upgraded I do now have crashes of the firewall (sudden reboot).
There is no indicator of what is going on, it just crashes.
Log entries:
2025-02-13T23:31:05 Notice kernel Copyright (c) 1992-2023 The FreeBSD Project.
2025-02-13T23:31:05 Notice kernel ---<<BOOT>>---
2025-02-13T23:31:05 Notice syslog-ng syslog-ng starting up; version='4.8.1'
2025-02-13T23:11:10 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '91096''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 91096: No such process'
So you can see that for quite some time nothing happend and then we see the BOOT mesage. There is no power outage or instability or anything.
This is running on a Protectly VP4650 on coreboot with 8GB RAM. I have no RAM shortage as other than the firewall and a UPNP plugin nothing is running on it.
Anyone else experiencing instabilities with either 25.1.1 or with this hardware in particular ?
What is the output for these two commands ?
ls -ltrh /var/crash && df -hT
Nothing to see there ..
root@firewall:/home/clinden # ls -ltrh /var/crash
total 1
-rw-r--r-- 1 root wheel 5B Feb 10 14:48 minfree
root@firewall:/home/clinden # df -hT
Filesystem Type Size Used Avail Capacity Mounted on
zroot/ROOT/default zfs 106G 1.3G 105G 1% /
devfs devfs 1.0K 0B 1.0K 0% /dev
/dev/gpt/efiboot0 msdosfs 260M 1.8M 258M 1% /boot/efi
zroot/tmp zfs 105G 2.2M 105G 0% /tmp
zroot/var/log zfs 105G 664M 105G 1% /var/log
zroot zfs 105G 96K 105G 0% /zroot
zroot/usr/ports zfs 105G 96K 105G 0% /usr/ports
zroot/usr/home zfs 105G 96K 105G 0% /usr/home
zroot/var/crash zfs 105G 96K 105G 0% /var/crash
zroot/var/audit zfs 105G 96K 105G 0% /var/audit
zroot/var/mail zfs 105G 96K 105G 0% /var/mail
zroot/var/tmp zfs 105G 96K 105G 0% /var/tmp
zroot/usr/src zfs 105G 96K 105G 0% /usr/src
devfs devfs 1.0K 0B 1.0K 0% /var/dhcpd/dev
devfs devfs 1.0K 0B 1.0K 0% /var/unbound/dev
/usr/local/lib/python3.11 nullfs 106G 1.3G 105G 1% /var/unbound/usr/local/lib/python3.11
/lib nullfs 106G 1.3G 105G 1% /var/unbound/lib
root@firewall:/home/clinden #
However after the last reboot yesteday it is now quiet. Not sure if that was because of the upgrades?
I did upgrade to 25.1 and then update to 25.1.1 in the next steps.
From the system log:
grep -e "---<<BOOT>>---" system.log
2025-02-13T23:31:05 Notice kernel ---<<BOOT>>---
2025-02-13T23:01:48 Notice kernel ---<<BOOT>>---
2025-02-13T22:57:29 Notice kernel ---<<BOOT>>---
2025-02-13T22:48:22 Notice kernel ---<<BOOT>>---
2025-02-13T22:47:34 Notice kernel ---<<BOOT>>---
So the first three boots are the upgrade and the update (two for the upgrade and one for the update). However then the next two are unexpected, I did not do anything (update or anything) that would run the reboot and also the reset thing stopped after the last one (23:31:05).
I not really knowledgeable about FreeBSD - worked with Linuxes all my life. So I am not sure where even to beginn my search for what happened.
So it was not a kernel panic, as this would have created at least a kernel panic message in any log. So it was either graceful shutdown (should that not be logged anywhere, and if so where?), or just a hardware crash with reboot (kind of a cold start).
Okay, the reboot @23:01:48 was graceful:
2025-02-13T23:01:48 Notice kernel Copyright (c) 1992-2023 The FreeBSD Project.
2025-02-13T23:01:48 Notice kernel ---<<BOOT>>---
2025-02-13T23:01:48 Notice kernel Uptime: 3m59s
2025-02-13T23:01:48 Notice kernel All buffers synced.
2025-02-13T23:01:48 Notice kernel Syncing disks, vnodes remaining... 0 0 0 0 0 done
2025-02-13T23:01:48 Notice kernel Waiting (max 60 seconds) for system process `syncer' to stop...
2025-02-13T23:01:48 Notice kernel Waiting (max 60 seconds) for system process `vnlru' to stop... done
2025-02-13T23:01:48 Notice syslog-ng syslog-ng starting up; version='4.8.1'
2025-02-13T23:01:08 Notice syslog-ng syslog-ng shutting down; version='4.8.1'
2025-02-13T23:00:11 Notice syslog-ng syslog-ng starting up; version='4.8.1'
2025-02-13T23:00:11 Notice syslog-ng syslog-ng shutting down; version='4.8.1'
However the last one and the one at 22:57:29 were not reboots.
But both had a reconfiguration task for unbound (I did this) before the reboot ...
2025-02-13T22:57:29 Notice kernel ---<<BOOT>>---
2025-02-13T22:57:29 Notice syslog-ng syslog-ng starting up; version='4.8.1'
2025-02-13T22:55:35 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T22:55:35 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T22:55:31 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '87465''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 87465: No such process'
2025-02-13T22:55:31 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T22:55:31 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T22:55:30 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_stop (execute task : unbound_service_stop(1))
2025-02-13T22:55:30 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_stop (1)
2025-02-13T22:55:28 Notice configctl event @ 1739483728.20 exec: system event config_changed response: OK
2025-02-13T22:55:28 Notice configctl event @ 1739483728.20 msg: Feb 13 22:55:28 firewall.home.local config[70202]: config-event: new_config /conf/backup/config-1739483728.2031.xml
2025-02-13T22:55:09 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T22:55:09 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T22:54:57 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '4089''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 4089: No such process'
2025-02-13T22:54:57 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T22:54:57 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T22:54:57 Notice configctl event @ 1739483696.93 exec: system event config_changed response: OK
2025-02-13T22:54:57 Notice configctl event @ 1739483696.93 msg: Feb 13 22:54:56 firewall.home.local config[71163]: config-event: new_config /conf/backup/config-1739483696.927.xml
2025-02-13T22:54:56 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_stop (execute task : unbound_service_stop(1))
2025-02-13T22:54:56 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_stop (1)
2025-02-13T22:54:29 Notice configctl event @ 1739483669.34 exec: system event config_changed response: OK
2025-02-13T22:54:29 Notice configctl event @ 1739483669.34 msg: Feb 13 22:54:29 firewall.home.local config[71500]: config-event: new_config /conf/backup/config-1739483669.343.xml
The exact same lines before the crash we have @23:31:05 ...
2025-02-13T23:31:05 Notice kernel ---<<BOOT>>---
2025-02-13T23:31:05 Notice syslog-ng syslog-ng starting up; version='4.8.1'
2025-02-13T23:11:10 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '91096''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 91096: No such process'
2025-02-13T23:11:10 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T23:11:10 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T23:11:09 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_stop (execute task : unbound_service_stop(1))
2025-02-13T23:11:09 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_stop (1)
2025-02-13T23:11:08 Notice configctl event @ 1739484668.00 exec: system event config_changed response: OK
2025-02-13T23:11:08 Notice configctl event @ 1739484668.00 msg: Feb 13 23:11:08 firewall.home.local config[47402]: config-event: new_config /conf/backup/config-1739484668.0032.xml
2025-02-13T23:10:05 Notice configctl event @ 1739484605.24 exec: system event config_changed response: OK
2025-02-13T23:10:05 Notice configctl event @ 1739484605.24 msg: Feb 13 23:10:05 firewall.home.local config[47402]: config-event: new_config /conf/backup/config-1739484605.2388.xml
Okay, I was enjoying the fun too early.
Firewall crashed again.
<13>1 2025-02-14T11:32:22+01:00 firewall.home.local dhcp6c 95245 - [meta sequenceId="1"] dhcp6c_script: RENEW on pppoe0 executing
<85>1 2025-02-14T14:30:40+01:00 firewall.home.local sudo 49661 - [meta sequenceId="1"] clinden : TTY=pts/0 ; PWD=/home/clinden ; USER=root ; COMMAND=/bin/tcsh
<45>1 2025-02-14T16:41:29+01:00 firewall.home.local syslog-ng 12360 - [meta sequenceId="1"] syslog-ng starting up; version='4.8.1'
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="2"] ---<<BOOT>>---
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="3"] Copyright (c) 1992-2023 The FreeBSD Project.
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="4"] Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="5"] The Regents of the University of California. All rights reserved.
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="6"] FreeBSD is a registered trademark of The FreeBSD Foundation.
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="7"] FreeBSD 14.2-RELEASE-p1 stable/25.1-n269632-cc316253c68 SMP amd64
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="8"] FreeBSD clang version 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2e05e67)
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="9"] VT(vga): resolution 640x480
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="10"] CPU: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz (2100.00-MHz K8-class CPU)
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="11"] Origin="GenuineIntel" Id=0xa0661 Family=0x6 Model=0xa6 Stepping=1
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="12"] Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="13"] Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="14"] AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="15"] AMD Features2=0x121<LAHF,ABM,Prefetch>
This time, no indication nothing.
root@firewall:/var/log/system # ls -ltr /var/crash
total 1
-rw-r--r-- 1 root wheel 5 Feb 10 14:48 minfree
root@firewall:/var/log/system #
root@firewall:/var/log/system # df -hT
Filesystem Type Size Used Avail Capacity Mounted on
zroot/ROOT/default zfs 106G 1.3G 105G 1% /
devfs devfs 1.0K 0B 1.0K 0% /dev
/dev/gpt/efiboot0 msdosfs 260M 1.8M 258M 1% /boot/efi
zroot/tmp zfs 105G 1.3M 105G 0% /tmp
zroot/var/log zfs 105G 667M 105G 1% /var/log
zroot zfs 105G 96K 105G 0% /zroot
zroot/var/audit zfs 105G 96K 105G 0% /var/audit
zroot/usr/home zfs 105G 96K 105G 0% /usr/home
zroot/var/crash zfs 105G 96K 105G 0% /var/crash
zroot/var/mail zfs 105G 96K 105G 0% /var/mail
zroot/usr/ports zfs 105G 96K 105G 0% /usr/ports
zroot/var/tmp zfs 105G 96K 105G 0% /var/tmp
zroot/usr/src zfs 105G 96K 105G 0% /usr/src
devfs devfs 1.0K 0B 1.0K 0% /var/dhcpd/dev
devfs devfs 1.0K 0B 1.0K 0% /var/unbound/dev
/usr/local/lib/python3.11 nullfs 106G 1.3G 105G 1% /var/unbound/usr/local/lib/python3.11
/lib nullfs 106G 1.3G 105G 1% /var/unbound/lib
root@firewall:/var/log/system #
So no new core dump, no filesystem at all in any issue, no panic, just restart - like if someone pushed the reset button (no it is sitting right next to me) or pulled the power (no even not that).
So I really struggle to understand on how to troubleshoot the system now.
Also I am unclear on what changed, and if the update could have anything to do with a firmware requirement or some kind of reset.
And it crashed within minutes again. I switched it back to an older device I have (good to have a backup). But now I am struggeling on how to really run a burn-in test on the firewall hardware (not with opnsense). Any good advise?
Quote from: SerErris on February 14, 2025, 05:08:19 PMAnd it crashed within minutes again. I switched it back to an older device I have (good to have a backup). But now I am struggeling on how to really run a burn-in test on the firewall hardware (not with opnsense). Any good advise?
That's easy. PassMark's MemTest86 (https://www.memtest86.com/) and GIMPS (mprime/Prime95) (https://www.mersenne.org/). I don't know of a (recent) boot image with mprime - I boot System Rescue (which I always have flying around) and run mprime off of a FAT32 stick (which I also always have flying around, for firmware images). You could also boot the FreeBSD 14.2 installer and run mprime under that for a potentially (if not likely) more applicable test. Be aware that mprime writes statistics to (its) disk by default - R/W media is required (I haven't looked into suppressing this behavior). Edit: Pointed out because I always sync the filesystems before shutting down the OS. Don't want to mess up my stick.
Thanks for the tips.
I researched it a little bit and now went to stress-ng. It actually hammers the system pretty much (memory and CPU) and that is great.
So now I let it run for 24 hours and see.
BTW: I have replaced memory that was in the system (Kingston CBD26D4S9S8K1C) and SSD for now. Both were delivered as part of the box, but I bought some compatible GSkill memory from the Protectli supported list some month ago and now put it in, as well as a samsung 980 Pro instead of the noname protectli SSD... I would assume that if I would get disk errors, I would get kernel panics or something to see ... bot nut just a reboot. So I assume the issue anyhow anywhere between CPU and Memory.
What I know already is that the system is pretty cool under OPNsense (35°C on the CPU). So heat is probably not an issue. Lets see if the system survives 24 hours of stress-NG memory hammering and CPU hammering, then I think it is okay. Then lets run OPNSense on it again.
So will report back here.
BTW: Protectli is of no help. This is just a pretty expensive Firewall build that is actually making problems on the hardware end.
Okay I ran the stress-ng for 24 hours which had all the memory and all the swap space in use as well as all 4 cores/8 threads. So the system ran through without any issue.
I am really struggeling to understand why it crashes with OPNsense and not with linux Ubuntu desktop under full load.
Did you install the CPU microcode updates?
No, I did not. Not actively - maybe as part of the upgrade or part of ubuntu automatically, but nothing I did myself (btw never did on any machine in my life).
I switched now both - memory and SSD. The original SSD from protectly is "flimsy" at least and the RAM is Kingston (from support list).
I now have a trusted (reused) Samsung 980 Pro 1TB nvme SSD in place as well as GSkill 32GB RAM. A little bit of overkill, however it is now quiet ... so press keep fingers crossed, that this is now stable.
Quote from: SerErris on February 16, 2025, 05:49:28 PMNo, I did not. Not actively - maybe as part of the upgrade or part of ubuntu automatically, but nothing I did myself (btw never did on any machine in my life).
These updates are not persistent but installed to the CPU at every reboot. I recommend installing the matching plugin (Intel or AMD) depending on your CPU. That and one reboot is all that is necessary.
HTH,
Patrick
Install this from CLI or find it in the GUI, then reboot.
pkg install os-cpu-microcode-intel-1.1
Okay, thanks will check this. However if we would run into a CPU issue I would expect a kernal panic message at least. Also I do not really understand what the difference is if it is installed on the fly or persistent.
Unfortunately, we have the same problem.
Our OPNsense appliance runs on physical hardware in an HA construct.
Since the upgrade to 25.1.x, both devices reboot sporadically.
The primary appliance has already been completely reinstalled to rule this out as the cause.
ZenArmor and Suricata have also been deactivated to rule out this as the cause of the reboots.
/var/crash/info.0:
Dump header from device: /dev/ada0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 155648
Blocksize: 512
Compression: none
Dumptime: 2025-03-20 09:56:08 +0100
Hostname: FW221201.karcoma.net
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.2-RELEASE-p2 stable/25.1-n269701-7c59d89f8cd SMP
Panic String: page fault
Dump Parity: 1955918881
Bounds: 0
Dump Status: good
I hope someone can help.
Kind regards,
Dominique
Again, same question: Microcode updates applied?
Quote from: meyergru on March 20, 2025, 10:38:34 AMAgain, same question: Microcode updates applied?
Yes, cpu-microcode-intel is installed.