Odroid H2+ kernel panic (page fault)

Started by MartB, November 11, 2021, 09:26:41 PM

November 11, 2021, 09:26:41 PM Last Edit: November 13, 2021, 02:40:07 AM by MartB
Hey there,

The test upgrade worked fine, but as soon as the network interfaces get some traffic the system crashes:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x4
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80e74823
stack pointer         = 0x0:0xfffffe00c6179cb0
frame pointer         = 0x0:0xfffffe00c6179d60
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi1: netisr 0)
trap number = 12
panic: page fault
cpuid = 0
time = 1636661506
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c6179950
vpanic() at vpanic+0x187/frame 0xfffffe00c61799b0
panic() at panic+0x43/frame 0xfffffe00c6179a10
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00c6179a70
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c6179ad0
trap() at trap+0x26a/frame 0xfffffe00c6179be0
calltrap() at calltrap+0x8/frame 0xfffffe00c6179be0
--- trap 0xc, rip = 0xffffffff80e74823, rsp = 0xfffffe00c6179cb0, rbp = 0xfffffe00c6179d60 ---
ip_tryforward() at ip_tryforward+0x213/frame 0xfffffe00c6179d60
ip_input() at ip_input+0x382/frame 0xfffffe00c6179df0
swi_net() at swi_net+0x12b/frame 0xfffffe00c6179e60
ithread_loop() at ithread_loop+0x25a/frame 0xfffffe00c6179ef0
fork_exit() at fork_exit+0x8a/frame 0xfffffe00c6179f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c6179f30
--- trap 0x80386000, rip = 0xffffffff80c4944f, rsp = 0, rbp = 0x2475000 ---
mi_startup() at mi_startup+0xdf/frame 0x2475000
KDB: enter: panic


Was the realtek-vendor-kmod if_re.ko compiled against the correct kernel sources, or are there any incompatible changes?

If I use any other (USB) network interface it works just fine.
Never mind, it just happened with a USB network interface too, so something else must be broken here.

I first thought it was related to me having RSS enabled, but I disabled the sysctl in the config.xml and it still happens.

The textdump.tar file is attached to this post.
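
For anyone reading a backtrace like this one: the function that faulted is the first frame listed after the `--- trap 0xc` marker (here ip_tryforward). A minimal sketch of pulling that out of a saved backtrace with plain sh and awk; the helper name faulting_frame is made up for illustration, and which file inside the textdump holds the backtrace is an assumption to check on your own system:

```shell
#!/bin/sh
# Print the function that was executing when the page fault fired.
# In a KDB/DDB backtrace this is the first frame after "--- trap 0xc".
# Reads the named file(s), or stdin when no file is given.
faulting_frame() {
    awk '
        /^--- trap 0xc,/ { seen = 1; next }   # page fault marker line
        seen {
            sub(/\(\).*/, "", $1)             # "ip_tryforward()" -> "ip_tryforward"
            print $1
            exit
        }
    ' "$@"
}
```

Usage would be something like `faulting_frame msgbuf.txt` after unpacking the textdump, or piping a pasted backtrace into `faulting_frame`.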

HWPROBE for 22.1 on odroid-h2+: https://bsd-hardware.info/?probe=9268d91d01

Was reported on IRC yesterday too. It's part of the shared forwarding changes (ip_tryforward) and you can disable shared forwarding to avoid the panic for now.

I'll be looking at this closer today.


Cheers,
Franco

This should be it: https://github.com/opnsense/src/commit/730eb40ce9

You can try the snapshot kernel like so:

# opnsense-update -zkr 22.1.b1_57

Isn't the Odroid using the 2.5G Realteks? Are you using the os-realtek-re plugin for that?

Thanks for testing the beta!


Cheers,
Franco

Quote from: franco on November 12, 2021, 12:41:42 PM
This should be it: https://github.com/opnsense/src/commit/730eb40ce9

You can try the snapshot kernel like so:

# opnsense-update -zkr 22.1.b1_57

Isn't the Odroid using the 2.5G Realteks? Are you using the os-realtek-re plugin for that?

Thanks for testing the beta!


Cheers,
Franco

Thanks for looking into this.
I'll try to update later tonight and see if it's fixed.

Yeah, I have to use the vendor kmod from the package (RTL8125B).
The interfaces worked fine from the start, but right after the shared forwarding panic the system somehow rebooted into a "plain" FreeBSD (the boot logo had changed). That boot did not have the kmod loaded, which killed the interface assignments from the config. So that's probably something that should be noted. It could have been a fluke, but maybe some recovery logic needs to be adjusted.
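
A quick way to check after a reboot whether the vendor module actually got loaded is to look for it in the `kldstat` listing. This is only a sketch: the helper name has_kmod is hypothetical, and it assumes the module file is called if_re.ko as mentioned earlier in this thread.

```shell
#!/bin/sh
# Succeed (exit 0) if the given module file name appears in the Name
# column (last field) of `kldstat` output read from stdin.
has_kmod() {
    awk -v m="$1" '
        NR > 1 && $NF == m { found = 1 }   # skip the header line
        END { exit !found }
    '
}
```

For example: `kldstat | has_kmod if_re.ko && echo "if_re.ko is loaded"`.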

Thanks again!

November 13, 2021, 02:01:37 AM #4 Last Edit: November 13, 2021, 02:25:00 AM by MartB
@Franco
This seems to work now after updating the kernel and enabling shared forwarding again, great job!

However, there is a new kernel panic in the fq_pie module if traffic shaping is configured to use it:

db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100040 td 0xfffffe00205e6ac0
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00c61b4a20
vpanic() at vpanic+0x1b8/frame 0xfffffe00c61b4a80
panic() at panic+0x43/frame 0xfffffe00c61b4ae0
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00c61b4b40
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c61b4ba0
trap() at trap+0x26a/frame 0xfffffe00c61b4cb0
calltrap() at calltrap+0x8/frame 0xfffffe00c61b4cb0
--- trap 0xc, rip = 0xffffffff827a5b59, rsp = 0xfffffe00c61b4d80, rbp = 0xfffffe00c61b4da0 ---
fqpie_callout_cleanup() at fqpie_callout_cleanup+0x59/frame 0xfffffe00c61b4da0
softclock_call_cc() at softclock_call_cc+0x155/frame 0xfffffe00c61b4e40
softclock() at softclock+0x79/frame 0xfffffe00c61b4e60
ithread_loop() at ithread_loop+0x25a/frame 0xfffffe00c61b4ef0
fork_exit() at fork_exit+0x8a/frame 0xfffffe00c61b4f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c61b4f30
--- trap 0x82557fe0, rip = 0xffffffff80c4944f, rsp = 0, rbp = 0x2554000 ---
mi_startup() at mi_startup+0xdf/frame 0x2554000
db:0:kdb.enter.default>  ps


If you need the textdump hit me up.

Traffic shaper settings:

I tried to reproduce this today but was unable to do so. Are there any other prerequisites to this or something like forcing an interface apply or changing rules? I tried forcing a reboot in the middle of a download but maybe also my traffic wasn't as diverse as it should be to trigger this in fqpie_callout_cleanup().

The only relevant change here I see is https://github.com/opnsense/src/commit/c011422b2d77cc290

Happy for any pointers.


Cheers,
Franco

I can't revert to fq_pie at the moment, as I need this to run somewhat stable.
But I will give this a shot next week and see if I can get any more info out of it.

Other than that, have you run into other issues? Odroid H2+ here as well, but not doing any shaping.

Everything fine so far, did not get to test pie again due to xmas movies 😂

Next year maybe!

I can reproduce this now... yay...  BETA3 is already wrapped up but maybe we can replace the kernel at the end of the week with a fix for this. Still not sure what causes it.


Cheers,
Franco


To me it looks like this does the trick: https://github.com/opnsense/src/commit/85b720c1ce

Since I have to amend more in BETA3 I'll make sure it'll be in there.


Cheers,
Franco

Pie works on the latest version now, great job Franco!

Nice, thanks. There is some discussion in FreeBSD on how to fix this properly but for now this seems like a safe bet indeed.


Cheers,
Franco

Has anyone tested the new driver version 197.00?

I am trying to get it to work on aarch64 NanoPi-R5S.
I have three Ethernet ports, two of them RTL8125BG.

Does anyone have experience with the RTL8125BG specifically?
I see link up at 2.5GbE, but no packets get through.