Fatal trap 12 and Fatal trap 9 for some weeks now

Started by brot1337, December 19, 2024, 10:04:18 AM

Hey guys,

My firewall crashes daily with one of the above trap error messages.

Unfortunately, I have already submitted the latest crash report via the bug tracker, so I can't post any details here.

I've already seen the other thread https://forum.opnsense.org/index.php?topic=44695.0, where installing a debug kernel is suggested.

However, I'm using a newer version (24.7.11) and couldn't find a corresponding debug kernel.

root@opnsense:~ # opnsense-update -zkr dbg-24.7.11
Fetching kernel-dbg-24.7.11-amd64.txz: ..[fetch: https://mirror.dns-root.de/opnsense/FreeBSD:14:amd64/snapshots/sets/kernel-dbg-24.7.11-amd64.txz.sig: Not Found] failed, no signature found

Is there a newer version available? Or is this kernel compatible with 24.7.11?

Thank you very much!

If your stack trace looks the same, it might be better to join that existing discussion, rather than fragment into another one about the same topic.

Regarding the debug kernel - you could try the exact one referenced in that other discussion, if debug data is still needed by the developers. The debug kernel is not going to fix the problem - only potentially help the developers find it.

There is no 24.7.11 kernel...

# opnsense-update -zkr 24.7.10_2 is the hotfixed debug kernel that belongs to 24.7.10.



Cheers,
Franco

Quote from: franco on December 19, 2024, 10:54:06 AM
There is no 24.7.11 kernel...

# opnsense-update -zkr 24.7.10_2 is the hotfixed debug kernel that belongs to 24.7.10.



Cheers,
Franco

But does this kernel also work with OPNsense 24.7.11?

Quote from: dseven on December 19, 2024, 10:22:40 AM
If your stack trace looks the same, it might be better to join that existing discussion, rather than fragment into another one about the same topic.

I also get Fatal trap 12: page fault while in kernel mode errors, but the rest of the trace seems to be different.

Quote from: dseven on December 19, 2024, 10:22:40 AM
The debug kernel is not going to fix the problem - only potentially help the developers find it.

I know, but it helps with troubleshooting :)


Cheers


Quote from: brot1337 on December 19, 2024, 11:18:32 AM
But does this kernel also work with OPNsense 24.7.11?

If you go to System -> Firmware -> Packages and search for "kernel", you should see version 24.7.10 even though OPNsense is on 24.7.11, so yes, I believe it should work.

> But does this kernel also work with OPNsense 24.7.11?

It is the kernel of 24.7.11.

Thank you very much! :)

I've just noticed the following error message in the log of the backend component. Is it related?

[117b382c-2e02-4b43-bec3-a92cbc20f5cb] Script action failed with Command '/usr/local/opnsense/scripts/system/sysctl.py --gather' returned non-zero exit status 1. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 78, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/system/sysctl.py --gather' returned non-zero exit status 1.

When I run the sysctl.py script in a shell, I get the following error:

root@opnsense:~ # /usr/local/opnsense/scripts/system/sysctl.py --gather
Traceback (most recent call last):
  File "/usr/local/opnsense/scripts/system/sysctl.py", line 65, in <module>
    sp = subprocess.run(['/sbin/sysctl', '-a'], capture_output=True, text=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/subprocess.py", line 550, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/subprocess.py", line 1209, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/subprocess.py", line 2153, in _communicate
    stdout = self._translate_newlines(stdout,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/subprocess.py", line 1086, in _translate_newlines
    data = data.decode(encoding, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 63443: invalid start byte


I've already searched for the UnicodeDecodeError, but I couldn't find any related issue on GitHub.

Could it be caused by my sysctl parameters?
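
The traceback above shows sysctl.py calling subprocess.run(['/sbin/sysctl', '-a'], capture_output=True, text=True), so the whole output is decoded as UTF-8 and a single stray byte (0x82 here) aborts the script. Below is a minimal sketch, not part of OPNsense, for finding which sysctl line carries such a byte: it captures the output as raw bytes and tries to decode each line separately. The names find_undecodable_lines and gather_sysctl_raw are hypothetical helpers made up for this illustration.

```python
import subprocess


def find_undecodable_lines(raw: bytes, encoding: str = "utf-8"):
    """Return (line_number, byte_offset_in_line, tolerant_text) for every
    line of `raw` that cannot be decoded strictly with `encoding`."""
    bad = []
    for lineno, line in enumerate(raw.split(b"\n"), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as exc:
            # Re-decode tolerantly so the bad byte shows up as \xNN in the text.
            text = line.decode(encoding, errors="backslashreplace")
            bad.append((lineno, exc.start, text))
    return bad


def gather_sysctl_raw() -> bytes:
    """Run `sysctl -a` and return stdout as bytes (no text=True, so no decoding)."""
    proc = subprocess.run(["/sbin/sysctl", "-a"], capture_output=True, check=False)
    return proc.stdout
```

Run on the firewall, find_undecodable_lines(gather_sysctl_raw()) should list the offending line or lines with the bad byte shown as \x82. Which OID that turns out to be cannot be told from this thread.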

Do you have one of those Intel hardware things that really really need microcode updates in order to work properly?


Cheers,
Franco

December 19, 2024, 04:40:51 PM #8 Last Edit: December 19, 2024, 04:44:19 PM by brot1337
Yeah, I'm using that one: https://www.amazon.de/gp/product/B0CJ5L7SH6/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&th=1



root@opnsense:~ #  sysctl hw.model hw.machine hw.ncpu
hw.model: Intel(R) N100
hw.machine: amd64
hw.ncpu: 4


   
vendor     = 'Intel Corporation'
device     = 'Ethernet Controller I226-V'

I've not edited or enabled anything for microcode updates so far. So I should probably install the corresponding plugin? https://docs.opnsense.org/manual/cpu-microcode.html

Yes, please install os-microcode-intel and reboot. It should be fine after.

Thank you very much for your efforts. I've just installed the latest microcode update and will keep this thread updated.

If you get fatal trap 12 during PPPoE reconnects, you are most likely another victim of a kernel bug which was fixed upstream but has not been implemented in OPNsense so far. Take a look at one of these threads.

What is the actual commit in question for this guerrilla cross-post?

Also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276294 seems to be open still.

25.1-BETA is out... just see if FreeBSD 14.2 fixed this.  I'm not so sure.


Cheers,
Franco

So I checked and found:

https://reviews.freebsd.org/D42635 not in FreeBSD 14 as per FreeBSD decision
https://reviews.freebsd.org/D42636 already in 14.1 / 24.7
https://reviews.freebsd.org/D42637 already in 14.1 / 24.7

I'm very willing to cherry-pick fixes that are missing, but we need to talk commits, not feelings.


Cheers,
Franco

I was merely pointing out connections to an annoying bug that has been open for years; 272319 gave the impression that it had finally been fixed upstream.
Kind reminder: this goes back to 2019 and happens almost daily for me.