Messages - bikemike

#1
I am having the same issue with random reboots.  There are many threads about this on the forums, spanning previous versions through the current one, with no real solution.  I posted this thread, but it has not seen much movement:

https://forum.opnsense.org/index.php?topic=33583.msg162367#msg162367

I was going to try updating the BIOS on my APU1D4, but I am not hopeful since others in this thread have not seen improvement afterwards.  I wish the OPNsense developers would put some attention on this.  I did, however, reach the four-day mark before things crashed this morning (twice).  The output looks very similar to the YT video someone posted in a previous comment, but it is not related to update checks.  Coming from pfSense, where I had literally zero stability issues, to random reboots nearly every other day is super frustrating.
#2
Thanks for the reply newsense!  I am hoping the firmware update will take care of the issue (and new power supply).  I was actually going to use the following for the update:

https://github.com/pcengines/apu2-documentation/blob/master/scripts/apu_fw_updater_opnsense.sh

Looks like the flashrom command is a bit different than what you provided though.
#3
I have been running OPNsense for several weeks now after coming from pfSense.  The platform is great and I am really happy about making the switch.  However, I cannot keep OPNsense up for more than a day or two without crashing. 

Hardware: PC Engines APU1D4 running BIOS Build 9/8/2014 (beta, reduced "spew level")
Processor: AMD G-T40E Processor
NIC: Realtek RTL8111E
Drive: Transcend 32GB SATA III 6Gb/s MSA370S mSATA
OPNsense Version: OPNsense 23.1.5_4-amd64

Few things I have tried:

  • hw.ibrs_disable = 1 (Spectre V2 mitigation)
  • vm.pmap.pti = 0 (Meltdown mitigation)
  • Install os-realtek-re plugin
  • Enable AMD thermal sensor (not related, but allows me to see CPU temps which are good)
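
For anyone wanting to double-check that the two tunables above actually took effect after a reboot, here is a minimal sketch.  It assumes the usual FreeBSD `name: value` output format of `sysctl`; the helper names (`parse_sysctl`, `mismatches`) and the `EXPECTED` table are made up for illustration and are not part of OPNsense.

```python
# Hypothetical helper: compare the tunables listed above against the
# values reported by `sysctl hw.ibrs_disable vm.pmap.pti`.
EXPECTED = {
    "hw.ibrs_disable": "1",  # Spectre V2 mitigation off
    "vm.pmap.pti": "0",      # Meltdown mitigation (PTI) off
}


def parse_sysctl(text):
    """Parse sysctl output lines of the form 'name: value' into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def mismatches(expected, actual):
    """Return {name: (wanted, found)} for tunables that differ or are missing."""
    return {
        name: (wanted, actual.get(name))
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }
```

Paste the output of `sysctl hw.ibrs_disable vm.pmap.pti` into `parse_sysctl()` and pass the result to `mismatches(EXPECTED, ...)`; an empty dict means both settings are in place.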

I tried enabling powerd, but this processor does not support it under the current firmware.  When the crash does occur, I get the following:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff80cc3160
stack pointer         = 0x28:0xfffffe000798ab60
frame pointer         = 0x28:0xfffffe000798abc0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu0)
trap number = 9
panic: general protection fault
cpuid = 0
time = 1681634405
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000798a980
vpanic() at vpanic+0x17f/frame 0xfffffe000798a9d0
panic() at panic+0x43/frame 0xfffffe000798aa30
trap_fatal() at trap_fatal+0x385/frame 0xfffffe000798aa90
calltrap() at calltrap+0x8/frame 0xfffffe000798aa90
--- trap 0x9, rip = 0xffffffff80cc3160, rsp = 0xfffffe000798ab60, rbp = 0xfffffe000798abc0 ---
callout_process() at callout_process+0x180/frame 0xfffffe000798abc0
handleevents() at handleevents+0x188/frame 0xfffffe000798ac00
timercb() at timercb+0x24e/frame 0xfffffe000798ac50
hpet_intr_single() at hpet_intr_single+0x1b3/frame 0xfffffe000798ac80
intr_event_handle() at intr_event_handle+0x92/frame 0xfffffe000798acd0
intr_execute_handlers() at intr_execute_handlers+0x4b/frame 0xfffffe000798ad00
Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe000798ad00
--- interrupt, rip = 0xffffffff8111b0a6, rsp = 0xfffffe000798add0, rbp = 0xfffffe000798add0 ---
acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe000798add0
acpi_cpu_idle() at acpi_cpu_idle+0x2ef/frame 0xfffffe000798ae10
cpu_idle_acpi() at cpu_idle_acpi+0x3e/frame 0xfffffe000798ae30
cpu_idle() at cpu_idle+0x9f/frame 0xfffffe000798ae50
sched_idletd() at sched_idletd+0x4e1/frame 0xfffffe000798aef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000798af30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000798af30
--- trap 0x7a140b8, rip = 0xffffffff80c30e8f, rsp = 0, rbp = 0xffffffff8133a258 ---
mi_startup() at mi_startup+0xdf/frame 0xffffffff8133a258
KDB: enter: panic
panic.txt: general protection fault
version.txt: FreeBSD 13.1-RELEASE-p7 stable/23.1-n250411-85724e9ce22 SMP
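
When comparing several crash reports, it can help to reduce each trace to its timestamp and the list of function names, so that identical call paths stand out.  A rough sketch, assuming the trace is formatted like the one above; `summarize_panic` and the two regexes are my own names, not anything shipped with OPNsense or FreeBSD:

```python
import re
from datetime import datetime, timezone

# Matches frame lines such as:
#   vpanic() at vpanic+0x17f/frame 0xfffffe000798a9d0
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+")
# Matches the panic timestamp line, e.g. "time = 1681634405"
TIME_RE = re.compile(r"^time = (\d+)$")


def summarize_panic(text):
    """Return (crash time in UTC or None, list of function names in the backtrace)."""
    when = None
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        m = FRAME_RE.match(line)
        if m:
            frames.append(m.group(1))
            continue
        m = TIME_RE.match(line)
        if m:
            # The value is seconds since the Unix epoch.
            when = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return when, frames
```

For the trace above, the `time` field decodes to 2023-04-16 08:40:05 UTC, and the first frame after the trap marker is `callout_process`, reached from the HPET timer interrupt while cpu0 was idle.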


Load does not seem to be a factor.  In many cases, it's the middle of the night when the crash occurs.  I have submitted many crash reports, but I am not sure where to go next.  I am considering updating the APU firmware/BIOS to v4.17.0.3 next.  One crash apparently corrupted something and required a fresh install, as OPNsense would not fully boot.  Obviously, this is not sustainable.

In the many, many years of running pfSense, I never once had a crash.  So, not sure why OPNsense is having issues.  Obviously, two different systems, but this kinda sucks.  Any help or insight would be greatly appreciated.

I should note, it seems many others are seeing this as well:

https://forum.opnsense.org/index.php?topic=28302.0
Set net.inet.tcp.sack.enable to 0, but this was supposed to be fixed in 22.7.5.

https://forum.opnsense.org/index.php?topic=27211.0
No resolution...

https://forum.opnsense.org/index.php?topic=31965.0
I am actually going to try a new power supply since I had a similar issue before.

https://forum.opnsense.org/index.php?topic=33239.0

https://forum.opnsense.org/index.php?topic=20599

[Mega thread on the issue but dated and maybe not relevant]
https://forum.opnsense.org/index.php?topic=11419.0

[4/22 Update]  Replaced the power supply two days ago.  System appeared stable and went nearly two days, then another crash.  Likely going to move forward with the BIOS/firmware update next.

[4/23 Update] Today everything came unglued and started core dumping.  The web interface was returning a 500, but traffic was still flowing.  Ended up having to pull the power on the OPNsense device to recover...  See attached screenshot for details.

[4/28 Update] I removed the WireGuard plugin a few days ago and OPNsense has been up since.  Pushing nearly four days now.  I had installed the plugin but never fully configured or brought up the interface.  Wondering if that was causing issues.  If several more days go by, it could be suspect.  I did have a PHP component crash which was preventing graphs on the Dashboard from loading.  Restarted all the services which brought that back.

[5/9 Update] System went four days without a crash.  Updated my other APU board to the latest BIOS, but I still need to swap it in and put it to use.  Keep submitting the crash reports :-(

Switched to my other APU1D4 board with the latest BIOS.  Let's see how this goes...

[5/11 Update]  So far the old board with the new BIOS is holding strong.  Needs to exceed 4+ days for me to feel comfortable things are stable.  Starting to wonder if maybe the other/new board has memory issues.  Was going to run it through an extensive memory check to rule that out.

[5/16 Update] System has been up over six days now, which is a record.  Thinking we're stable at this point.  Either the old BIOS on the new board was the issue, or it was something else on that new board.  The old board with the new BIOS is good.  I am tempted to upgrade the new board to the new BIOS, then put it back into use and see what happens.  If that holds stable, it was definitely the old BIOS.  Otherwise, I think the issue is resolved at this point.

[5/20 Update] System has been up for over 16 days now.  Going to consider the system stable and the issue resolved.