After Update to 25.1.1 unexpected firewall restarts without any log entry

Started by SerErris, February 13, 2025, 11:44:54 PM

Previous topic - Next topic
Hi,

I upgraded today from 24.7 latest to 25.1.1. However now after I upgraded I do now have crashes of the firewall (sudden reboot).

There is no indicator of what is going on, it just crashes.

Log entries:
2025-02-13T23:31:05   Notice   kernel   Copyright (c) 1992-2023 The FreeBSD Project.   
2025-02-13T23:31:05   Notice   kernel   ---<<BOOT>>---   
2025-02-13T23:31:05   Notice   syslog-ng   syslog-ng starting up; version='4.8.1'   
2025-02-13T23:11:10   Error   opnsense   /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '91096''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 91096: No such process'

So you can see that for quite some time nothing happend and then we see the BOOT mesage. There is no power outage or instability or anything.

This is running on a Protectly VP4650 on coreboot with 8GB RAM. I have no RAM shortage as other than the firewall and a UPNP plugin nothing is running on it.

Anyone else experiencing instabilities with either 25.1.1 or with this hardware in particular ?

What is the output for these two commands ?

ls -ltrh /var/crash && df -hT

Nothing to see there ..

root@firewall:/home/clinden # ls -ltrh /var/crash
total 1
-rw-r--r--  1 root wheel    5B Feb 10 14:48 minfree


root@firewall:/home/clinden # df -hT
Filesystem                Type      Size    Used  Avail Capacity  Mounted on
zroot/ROOT/default        zfs        106G    1.3G    105G    1%    /
devfs                      devfs      1.0K      0B    1.0K    0%    /dev
/dev/gpt/efiboot0          msdosfs    260M    1.8M    258M    1%    /boot/efi
zroot/tmp                  zfs        105G    2.2M    105G    0%    /tmp
zroot/var/log              zfs        105G    664M    105G    1%    /var/log
zroot                      zfs        105G    96K    105G    0%    /zroot
zroot/usr/ports            zfs        105G    96K    105G    0%    /usr/ports
zroot/usr/home            zfs        105G    96K    105G    0%    /usr/home
zroot/var/crash            zfs        105G    96K    105G    0%    /var/crash
zroot/var/audit            zfs        105G    96K    105G    0%    /var/audit
zroot/var/mail            zfs        105G    96K    105G    0%    /var/mail
zroot/var/tmp              zfs        105G    96K    105G    0%    /var/tmp
zroot/usr/src              zfs        105G    96K    105G    0%    /usr/src
devfs                      devfs      1.0K      0B    1.0K    0%    /var/dhcpd/dev
devfs                      devfs      1.0K      0B    1.0K    0%    /var/unbound/dev
/usr/local/lib/python3.11  nullfs    106G    1.3G    105G    1%    /var/unbound/usr/local/lib/python3.11
/lib                      nullfs    106G    1.3G    105G    1%    /var/unbound/lib
root@firewall:/home/clinden #

However after the last reboot yesteday it is now quiet. Not sure if that was because of the upgrades?

I did upgrade to 25.1 and then update to 25.1.1 in the next steps.

From the system log:
grep -e "---<<BOOT>>---" system.log
2025-02-13T23:31:05     Notice  kernel   ---<<BOOT>>---
2025-02-13T23:01:48     Notice  kernel   ---<<BOOT>>---
2025-02-13T22:57:29     Notice  kernel   ---<<BOOT>>---
2025-02-13T22:48:22     Notice  kernel   ---<<BOOT>>---
2025-02-13T22:47:34     Notice  kernel   ---<<BOOT>>---

So the first three boots are the upgrade and the update (two for the upgrade and one for the update). However then the next two are unexpected, I did not do anything (update or anything) that would run the reboot and also the reset thing stopped after the last one (23:31:05).

I not really knowledgeable about FreeBSD - worked with Linuxes all my life. So I am not sure where even to beginn my search for what happened.

So it was not a kernel panic, as this would have created at least a kernel panic message in any log. So it was either graceful shutdown (should that not be logged anywhere, and if so where?), or just a hardware crash with reboot (kind of a cold start).

Okay, the reboot @23:01:48 was graceful:
2025-02-13T23:01:48     Notice  kernel   Copyright (c) 1992-2023 The FreeBSD Project.
2025-02-13T23:01:48     Notice  kernel   ---<<BOOT>>---
2025-02-13T23:01:48     Notice  kernel   Uptime: 3m59s
2025-02-13T23:01:48     Notice  kernel   All buffers synced.
2025-02-13T23:01:48     Notice  kernel   Syncing disks, vnodes remaining... 0 0 0 0 0 done
2025-02-13T23:01:48     Notice  kernel   Waiting (max 60 seconds) for system process `syncer' to stop...
2025-02-13T23:01:48     Notice  kernel   Waiting (max 60 seconds) for system process `vnlru' to stop... done
2025-02-13T23:01:48     Notice  syslog-ng        syslog-ng starting up; version='4.8.1'
2025-02-13T23:01:08     Notice  syslog-ng        syslog-ng shutting down; version='4.8.1'
2025-02-13T23:00:11     Notice  syslog-ng        syslog-ng starting up; version='4.8.1'
2025-02-13T23:00:11     Notice  syslog-ng        syslog-ng shutting down; version='4.8.1'

However the last one and the one at 22:57:29 were not reboots.

But both had a reconfiguration task for unbound (I did this) before the reboot ...

2025-02-13T22:57:29     Notice  kernel   ---<<BOOT>>---
2025-02-13T22:57:29     Notice  syslog-ng        syslog-ng starting up; version='4.8.1'
2025-02-13T22:55:35     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T22:55:35     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T22:55:31     Error   opnsense         /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '87465''(pid:/var/run/unbound.pid)  returned exit code '1', the output was 'kill: 87465: No such process'
2025-02-13T22:55:31     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T22:55:31     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T22:55:30     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_stop (execute task : unbound_service_stop(1))
2025-02-13T22:55:30     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_stop (1)
2025-02-13T22:55:28     Notice  configctl        event @ 1739483728.20 exec: system event config_changed response: OK
2025-02-13T22:55:28     Notice  configctl        event @ 1739483728.20 msg: Feb 13 22:55:28 firewall.home.local config[70202]: config-event: new_config /conf/backup/config-1739483728.2031.xml
2025-02-13T22:55:09     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T22:55:09     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T22:54:57     Error   opnsense         /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '4089''(pid:/var/run/unbound.pid)  returned exit code '1', the output was 'kill: 4089: No such process'
2025-02-13T22:54:57     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T22:54:57     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T22:54:57     Notice  configctl        event @ 1739483696.93 exec: system event config_changed response: OK
2025-02-13T22:54:57     Notice  configctl        event @ 1739483696.93 msg: Feb 13 22:54:56 firewall.home.local config[71163]: config-event: new_config /conf/backup/config-1739483696.927.xml
2025-02-13T22:54:56     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_stop (execute task : unbound_service_stop(1))
2025-02-13T22:54:56     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_stop (1)
2025-02-13T22:54:29     Notice  configctl        event @ 1739483669.34 exec: system event config_changed response: OK
2025-02-13T22:54:29     Notice  configctl        event @ 1739483669.34 msg: Feb 13 22:54:29 firewall.home.local config[71500]: config-event: new_config /conf/backup/config-1739483669.343.xml

The exact same lines before the crash we have @23:31:05 ...
2025-02-13T23:31:05     Notice  kernel   ---<<BOOT>>---
2025-02-13T23:31:05     Notice  syslog-ng        syslog-ng starting up; version='4.8.1'
2025-02-13T23:11:10     Error   opnsense         /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '91096''(pid:/var/run/unbound.pid)  returned exit code '1', the output was 'kill: 91096: No such process'
2025-02-13T23:11:10     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
2025-02-13T23:11:10     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
2025-02-13T23:11:09     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_stop (execute task : unbound_service_stop(1))
2025-02-13T23:11:09     Notice  opnsense         /usr/local/sbin/pluginctl: plugins_configure unbound_stop (1)
2025-02-13T23:11:08     Notice  configctl        event @ 1739484668.00 exec: system event config_changed response: OK
2025-02-13T23:11:08     Notice  configctl        event @ 1739484668.00 msg: Feb 13 23:11:08 firewall.home.local config[47402]: config-event: new_config /conf/backup/config-1739484668.0032.xml
2025-02-13T23:10:05     Notice  configctl        event @ 1739484605.24 exec: system event config_changed response: OK
2025-02-13T23:10:05     Notice  configctl        event @ 1739484605.24 msg: Feb 13 23:10:05 firewall.home.local config[47402]: config-event: new_config /conf/backup/config-1739484605.2388.xml

Okay, I was enjoying the fun too early.

Firewall crashed again.


<13>1 2025-02-14T11:32:22+01:00 firewall.home.local dhcp6c 95245 - [meta sequenceId="1"] dhcp6c_script: RENEW on pppoe0 executing
<85>1 2025-02-14T14:30:40+01:00 firewall.home.local sudo 49661 - [meta sequenceId="1"]  clinden : TTY=pts/0 ; PWD=/home/clinden ; USER=root ; COMMAND=/bin/tcsh
<45>1 2025-02-14T16:41:29+01:00 firewall.home.local syslog-ng 12360 - [meta sequenceId="1"] syslog-ng starting up; version='4.8.1'
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="2"] ---<<BOOT>>---
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="3"] Copyright (c) 1992-2023 The FreeBSD Project.
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="4"] Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="5"]    The Regents of the University of California. All rights reserved.
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="6"] FreeBSD is a registered trademark of The FreeBSD Foundation.
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="7"] FreeBSD 14.2-RELEASE-p1 stable/25.1-n269632-cc316253c68 SMP amd64
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="8"] FreeBSD clang version 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2e05e67)
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="9"] VT(vga): resolution 640x480
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="10"] CPU: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz (2100.00-MHz K8-class CPU)
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="11"]   Origin="GenuineIntel"  Id=0xa0661  Family=0x6  Model=0xa6  Stepping=1
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="12"]   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="13"]   Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="14"]   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
<13>1 2025-02-14T16:41:29+01:00 firewall.home.local kernel - - [meta sequenceId="15"]   AMD Features2=0x121<LAHF,ABM,Prefetch>

This time, no indication nothing.
root@firewall:/var/log/system # ls -ltr /var/crash
total 1
-rw-r--r--  1 root wheel 5 Feb 10 14:48 minfree
root@firewall:/var/log/system #

root@firewall:/var/log/system # df -hT
Filesystem                 Type       Size    Used   Avail Capacity  Mounted on
zroot/ROOT/default         zfs        106G    1.3G    105G     1%    /
devfs                      devfs      1.0K      0B    1.0K     0%    /dev
/dev/gpt/efiboot0          msdosfs    260M    1.8M    258M     1%    /boot/efi
zroot/tmp                  zfs        105G    1.3M    105G     0%    /tmp
zroot/var/log              zfs        105G    667M    105G     1%    /var/log
zroot                      zfs        105G     96K    105G     0%    /zroot
zroot/var/audit            zfs        105G     96K    105G     0%    /var/audit
zroot/usr/home             zfs        105G     96K    105G     0%    /usr/home
zroot/var/crash            zfs        105G     96K    105G     0%    /var/crash
zroot/var/mail             zfs        105G     96K    105G     0%    /var/mail
zroot/usr/ports            zfs        105G     96K    105G     0%    /usr/ports
zroot/var/tmp              zfs        105G     96K    105G     0%    /var/tmp
zroot/usr/src              zfs        105G     96K    105G     0%    /usr/src
devfs                      devfs      1.0K      0B    1.0K     0%    /var/dhcpd/dev
devfs                      devfs      1.0K      0B    1.0K     0%    /var/unbound/dev
/usr/local/lib/python3.11  nullfs     106G    1.3G    105G     1%    /var/unbound/usr/local/lib/python3.11
/lib                       nullfs     106G    1.3G    105G     1%    /var/unbound/lib
root@firewall:/var/log/system #

So no new core dump, no filesystem at all in any issue, no panic, just restart  - like if someone pushed the reset button (no it is sitting right next to me) or pulled the power (no even not that).

So I really struggle to understand on how to troubleshoot the system now.

Also I am unclear on what changed, and if the update could have anything to do with a firmware requirement or some kind of reset.


And it crashed within minutes again. I switched it back to an older device I have (good to have a backup). But now I am struggeling on how to really run a burn-in test on the firewall hardware (not with opnsense). Any good advise?

February 14, 2025, 05:33:35 PM #6 Last Edit: February 14, 2025, 05:37:31 PM by pfry Reason: Clarification
Quote from: SerErris on February 14, 2025, 05:08:19 PMAnd it crashed within minutes again. I switched it back to an older device I have (good to have a backup). But now I am struggeling on how to really run a burn-in test on the firewall hardware (not with opnsense). Any good advise?

That's easy. PassMark's MemTest86 and GIMPS (mprime/Prime95). I don't know of a (recent) boot image with mprime - I boot System Rescue (which I always have flying around) and run mprime off of a FAT32 stick (which I also always have flying around, for firmware images). You could also boot the FreeBSD 14.2 installer and run mprime under that for a potentially (if not likely) more applicable test. Be aware that mprime writes statistics to (its) disk by default - R/W media is required (I haven't looked into suppressing this behavior). Edit: Pointed out because I always sync the filesystems before shutting down the OS. Don't want to mess up my stick.

Thanks for the tips.

I researched it a little bit and now went to stress-ng. It actually hammers the system pretty much (memory and CPU) and that is great.

So now I let it run for 24 hours and see.

BTW: I have replaced memory that was in the system (Kingston CBD26D4S9S8K1C) and SSD for now. Both were delivered as part of the box, but I bought some compatible GSkill memory from the Protectli supported list some month ago and now put it in, as well as a samsung 980 Pro instead of the noname protectli SSD... I would assume that if I would get disk errors, I would get kernel panics or something to see ... bot nut just a reboot. So I assume the issue anyhow anywhere between CPU and Memory.

What I know already is that the system is pretty cool under OPNsense (35°C on the CPU). So heat is probably not an issue. Lets see if the system survives 24 hours of stress-NG memory hammering and CPU hammering, then I think it is okay. Then lets run OPNSense on it again.

So will report back here.

BTW: Protectli is of no help. This is just a pretty expensive Firewall build that is actually making problems on the hardware end.

Okay I ran the stress-ng for 24 hours which had all the memory and all the swap space in use as well as all 4 cores/8 threads. So the system ran through without any issue.

I am really struggeling to understand why it crashes with OPNsense and not with linux Ubuntu desktop under full load.

Did you install the CPU microcode updates?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

No, I did not. Not actively - maybe as part of the upgrade or part of ubuntu automatically, but nothing I did myself (btw never did on any machine in my life).

I switched now both - memory and SSD. The original SSD from protectly is "flimsy" at least and the RAM is Kingston (from support list).

I now have a trusted (reused) Samsung 980 Pro 1TB nvme SSD in place as well as GSkill 32GB RAM. A little bit of overkill, however it is now quiet ... so press keep fingers crossed, that this is now stable.

Quote from: SerErris on February 16, 2025, 05:49:28 PMNo, I did not. Not actively - maybe as part of the upgrade or part of ubuntu automatically, but nothing I did myself (btw never did on any machine in my life).

These updates are not persistent but installed to the CPU at every reboot. I recommend installing the matching plugin (Intel or AMD) depending on your CPU. That and one reboot is all that is necessary.

HTH,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Install this from CLI or find it in the GUI, then reboot.

pkg install os-cpu-microcode-intel-1.1

Okay, thanks will check this. However if we would run into a CPU issue I would expect a kernal panic message at least. Also I do not really understand what the difference is if it is installed on the fly or persistent.