OPNsense reboots multiple times per day

jones · August 20, 2024, 12:59:49 AM

For some reason, my OPNsense router reboots multiple times per day, and I can't understand why. Looking into the "General" logs, I don't see anything special, except that it does reboot indeed (see `---<<BOOT>>---`) below:

```
2024-08-20T00:48:08 Notice kernel FreeBSD clang version 18.1.5 (https://github.com/llvm/llvm-project.git llvmorg-18.1.5-0-g617a15a9eac9)
2024-08-20T00:48:08 Notice kernel FreeBSD 14.1-RELEASE-p3 stable/24.7-n267778-bb2c86773c1b SMP amd64
2024-08-20T00:48:08 Notice kernel FreeBSD is a registered trademark of The FreeBSD Foundation.
2024-08-20T00:48:08 Notice kernel The Regents of the University of California. All rights reserved.
2024-08-20T00:48:08 Notice kernel Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
2024-08-20T00:48:08 Notice kernel Copyright (c) 1992-2023 The FreeBSD Project.
2024-08-20T00:48:08 Notice kernel ---<<BOOT>>---
2024-08-20T00:48:08 Notice syslog-ng syslog-ng starting up; version='4.8.0'
2024-08-20T00:38:49 Warning radvd prefix length should be 64 for igb1
2024-08-20T00:38:33 Warning radvd prefix length should be 64 for igb1
2024-08-20T00:38:22 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt1(igb2)
2024-08-20T00:38:17 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-08-20T00:38:17 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : dnsmasq_configure_do())
2024-08-20T00:38:17 Warning radvd prefix length should be 64 for igb1
2024-08-20T00:38:17 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns ()
2024-08-20T00:38:17 Warning radvd prefix length should be 64 for igb1
2024-08-20T00:38:16 Error opnsense /usr/local/etc/rc.linkup: The command '/usr/sbin/daemon -f -p '/var/run/dhcpleases6.pid' '/usr/local/opnsense/scripts/dhcp/prefixes.sh'' returned exit code '3', the output was 'daemon: process already running, pid: 9786'
2024-08-20T00:38:15 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure())
2024-08-20T00:38:15 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp ()
2024-08-20T00:38:15 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (execute task : ipsec_configure_do(,opt1))
2024-08-20T00:38:15 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (,opt1)
2024-08-20T00:38:15 Notice opnsense /usr/local/etc/rc.linkup: ROUTING: entering configure using 'opt1'
2024-08-20T00:38:14 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for opt1(igb2)
2024-08-20T00:38:13 Notice kernel <6>igb2: link state changed to DOWN
2024-08-20T00:38:13 Notice kernel <6>igb2: link state changed to UP
2024-08-20T00:36:11 Notice dhcp6c dhcp6c_script: RENEW on igb0_vlan10 executing
2024-08-20T00:35:49 Warning radvd prefix length should be 64 for igb1
2024-08-20T00:30:30 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-20T00:30:30 Notice dhclient dhclient-script: Reason RENEW on igb0_vlan10 executing
2024-08-20T00:28:17 Warning radvd prefix length should be 64 for igb1
2024-08-20T00:19:54 Warning radvd prefix length should be 64 for igb1
2024-08-20T00:12:11 Notice dhcp6c dhcp6c_script: RENEW on igb0_vlan10 executing
```

In the attached system(1).log, it can be seen that it happens regularly.

Why is that?

sammycda · August 20, 2024, 04:47:26 AM

No idea if this is even remotely related but I was having the same issue and it turns out I had bad ram in the appliance. Replaced it and am living happily ever after.

Good luck diagnosing and fixing your problem.

meyergru · August 20, 2024, 10:50:38 AM

As can be gleaned from the log, your device seems to be a PCENGINES APU.2E2 with only 2 GByte of soldered DDR3 RAM, so I would think this is pre-tested RAM.

According to OpnSense hardware requirements, 2 GByte is at the bare minimum for systems that "run all OPNsense standard features that do not need disk writes".

When I tried an OpnSense VM and gave it only 2 GByte of memory, I experienced hangups, which in your case, may or may not trigger a watchdog. I think, depending on what settings you use, this size of RAM might be on the edge or even insufficient and maybe the situation has gotten worse with the advent of FreeBSD 14.1 kernel in OpnSense.

jones · August 20, 2024, 11:57:46 AM

Thanks for the answers!

I actually have an APU.6B4 [1] with 4 GB DDR3-1333 DRAM.
Shouldn't that be enough?

Also is there a way to run diagnostics with OPNsense, in case my RAM is faulty?

[1]: https://www.pcengines.ch/apu6b4.htm

franco · August 20, 2024, 12:03:02 PM

Without a swap partition to capture kernel crash reports it's impossible to say what the issue could be.

Cheers,
Franco

jones · August 20, 2024, 12:13:00 PM

> Without a swap partition to capture kernel crash reports it's impossible to say what the issue could be.

Oh, let me enable the swap partition!

Will the crash reports then appear in the same log file?

EDIT: I did enable the swap partition in System > Settings > Miscellaneous and rebooted the router, and found a widget on the dashboard that shows it. Somehow it shows 10GB, which seems off (in "Miscellaneous" it was mentioned 2GB).

EDIT 2: Actually, `$ top` says that there is now 10GB of swap as well:

```
CPU: 5.6% user, 0.0% nice, 2.4% system, 0.0% interrupt, 92.0% idle
Mem: 148M Active, 141M Inact, 547M Wired, 56K Buf, 3071M Free
ARC: 219M Total, 70M MFU, 109M MRU, 3288K Anon, 1915K Header, 35M Other
134M Compressed, 360M Uncompressed, 2.68:1 Ratio
Swap: 10G Total, 10G Free
```

I guess now I just have to wait for the next crash and find a way to extract the kernel crash reports!

franco · August 20, 2024, 12:59:37 PM

Ok swap partition is there then, but swap file don't work for dumps.

Nothing in the crash reporter? If not it's a power issue and the box just turns off/reboots.

Cheers,
Franco

jones · August 20, 2024, 01:10:41 PM

> Ok swap partition is there then

Maybe I was not clear. I just enabled it, and now it seems to be there! So I need to wait for a new reboot (which should happen today anyway) :-).

> but swap file don't work for dumps.

And the system.log I attached above is a "dump", so it wouldn't be attached there? Is that what you mean?

> Nothing in the crash reporter?

I'll go read about the "crash reporter" so that I find it when it reboots again. I'll keep you posted!

franco · August 20, 2024, 01:13:51 PM

Sorry to be clear:

swap partition -> can do crash dumps
swap file option -> cannot do crash dumps

But I think the swap file is 2 gb so the partition should be there at 8 gb too. Fingers crossed. :)

Cheers,
Franco

jones · August 20, 2024, 01:18:43 PM

Oh I see! So I probably already had a swap partition, and I did not need to enable the swap file. Still learning, sorry about that :-).

So there was something in the Crash Reporter, but I could not easily download it as a file so I submitted it. And now it disappeared! I thought I would get a copy on my email address, but I did not.

So I guess next time there is a crash I will go and manually copy-paste all the boxes of the crash dump, or something like that ::)

franco · August 20, 2024, 01:22:49 PM

If you submitted it please PM the email you used to submit. I can't find it at first glance.

Cheers,
Franco

jones · August 20, 2024, 02:24:00 PM

> If you submitted it please PM the email you used to submit. I can't find it at first glance.

I did, thanks!

In the meantime, it just rebooted again, but this time there is nothing in the crash reporter. The CPU temperature has been stable the whole morning at around 60 degrees Celcius, the RAM is stable around 15% of usage, the disk at 7%.

A power issue has been mentioned multiple times... should I try to buy a new power supply and see if that's better? Or is there a better way to check that?

franco · August 20, 2024, 02:26:45 PM

Observing the crash in a console first might help, but it sure sounds like it's just restarting.

Cheers,
Franco

jones · August 20, 2024, 02:31:23 PM

> Observing the crash in a console first might help

So connecting over serial to the router, staying logged in without doing anything in particular, and hoping that a message appears before it reboots?

Or did you mean something more elaborate? I have an easy serial access so I can do that :D

franco · August 20, 2024, 02:31:48 PM

Ok, found them:

Code Select

Tracing pid 4 tid 100060 td 0xfffffe0008d8f3a0
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0062c9eb90
vpanic() at vpanic+0x182/frame 0xfffffe0062c9ebe0
panic() at panic+0x43/frame 0xfffffe0062c9ec40
trap_fatal() at trap_fatal+0x387/frame 0xfffffe0062c9eca0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0062c9ed00
calltrap() at calltrap+0x8/frame 0xfffffe0062c9ed00
--- trap 0xc, rip = 0xffffffff80cf2170, rsp = 0xfffffe0062c9edd0, rbp = 0xfffffe0062c9ede0 ---
turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0062c9ede0
__mtx_unlock_sleep() at __mtx_unlock_sleep+0x7a/frame 0xfffffe0062c9ee10
unlock_mtx() at unlock_mtx+0x30/frame 0xfffffe0062c9ee20
_sleep() at _sleep+0x141/frame 0xfffffe0062c9eea0
xpt_done_td() at xpt_done_td+0x8b/frame 0xfffffe0062c9eef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0062c9ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0062c9ef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Code Select

Tracing pid 51170 tid 103454 td 0xfffffe00b16cd740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00afda7680
vpanic() at vpanic+0x182/frame 0xfffffe00afda76d0
panic() at panic+0x43/frame 0xfffffe00afda7730
pmap_remove_pages() at pmap_remove_pages+0x944/frame 0xfffffe00afda7880
exec_new_vmspace() at exec_new_vmspace+0x207/frame 0xfffffe00afda78e0
exec_elf64_imgact() at exec_elf64_imgact+0xb1c/frame 0xfffffe00afda79e0
kern_execve() at kern_execve+0x7dd/frame 0xfffffe00afda7d70
sys_execve() at sys_execve+0x5a/frame 0xfffffe00afda7e00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00afda7f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00afda7f30
--- syscall (59, FreeBSD ELF64, execve), rip = 0x335ff689626a, rsp = 0x335ff3ef3508, rbp = 0x335ff3ef3650 ---

Code Select

Tracing pid 4 tid 100060 td 0xfffffe0008d8f3a0
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0062c9ebb0
vpanic() at vpanic+0x182/frame 0xfffffe0062c9ec00
panic() at panic+0x43/frame 0xfffffe0062c9ec60
trap_fatal() at trap_fatal+0x387/frame 0xfffffe0062c9ecc0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0062c9ed20
calltrap() at calltrap+0x8/frame 0xfffffe0062c9ed20
--- trap 0xc, rip = 0xffffffff80ce834e, rsp = 0xfffffe0062c9edf0, rbp = 0xfffffe0062c9ee20 ---
sleepq_add() at sleepq_add+0x13e/frame 0xfffffe0062c9ee20
_sleep() at _sleep+0x15d/frame 0xfffffe0062c9eea0
xpt_done_td() at xpt_done_td+0x8b/frame 0xfffffe0062c9eef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0062c9ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0062c9ef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Looks like memory corruptions to me (all are different and very generic).

Cheers,
Franco

OPNsense reboots multiple times per day

jones

August 20, 2024, 12:59:49 AM

sammycda

August 20, 2024, 04:47:26 AM #1

meyergru

August 20, 2024, 10:50:38 AM #2

jones

August 20, 2024, 11:57:46 AM #3

franco

August 20, 2024, 12:03:02 PM #4

jones

August 20, 2024, 12:13:00 PM #5 Last Edit: August 20, 2024, 12:50:25 PM by jones

franco

August 20, 2024, 12:59:37 PM #6

jones

August 20, 2024, 01:10:41 PM #7

franco

August 20, 2024, 01:13:51 PM #8

jones

August 20, 2024, 01:18:43 PM #9

franco

August 20, 2024, 01:22:49 PM #10

jones

August 20, 2024, 02:24:00 PM #11

franco

August 20, 2024, 02:26:45 PM #12

jones

August 20, 2024, 02:31:23 PM #13

franco

August 20, 2024, 02:31:48 PM #14