Crashes when reconnecting PPPoE repeatedly

Started by craig, October 23, 2023, 12:00:33 PM

I was a bit afraid of that. Building with INVARIANTS in a release still crashes it pretty reliably in unrelated places. I'm not even sure I can do the debug thing without it due to other build requirements.


Cheers,
Franco

I have just had a PPPoE crash (typical), and do have a 1.96GB vmcore.0 crash file from the "production kernel" if it would help?

Yes please. Do you have somewhere to stash it?


Cheers,
Franco

PS: Compressing it should help with size a lot.

I've popped it on WeTransfer - https://we.tl/t-QYw1eSa4pj - let me know if there are any problems.

Got it, thanks. Taking a look right away.


Cheers,
Franco

Just to also give a quick update: I had no crash on reconnect for the last three days and I don't want to provoke one so as not to change the conditions leading to the crash.
As said, sometimes the crashes happen for several days in a row and sometimes nothing happens for a week.  :o

Unfortunately I'm running into this gdb issue:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257036

I've checked all the gdb versions we had down to 22.1 and all exhibit the same behaviour, which means either the core file or the debug kernel file has a persistent issue. It could be the size of the core file, but at first glance I wouldn't call that file size problematic. :(


Cheers,
Franco

Is there an info.0 file still on your end? I might need that, but not sure.

I can't get useful information out of the core, e.g.:

# dmesg -M vmcore.0
dmesg: _amd64_minidump_vatop: virtual address 0x0 not minidumped
dmesg: kvm_read: invalid address (0x0)

# ps -M vmcore.0
ps: invalid address (0xffffffff82d10000)

etc.


Cheers,
Franco

I do - I backed up the entire folder

Dump header from device: /dev/gpt/swapfs
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 1956237312
  Blocksize: 512
  Compression: none
  Dumptime: 2023-10-31 10:41:57 +0000
  Hostname: OPNsense.home
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 13.2-RELEASE-p3 stable/23.7-n254818-f155405f505 SMP
  Panic String: page fault
  Dump Parity: 2194897932
  Bounds: 0
  Dump Status: good
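
For anyone wanting to check their own info.N file before uploading a dump, here is a minimal Python sketch that reads a header like the one above into a dictionary and prints the fields that matter most here. The function name `parse_dump_header` and the embedded sample text are assumptions made for illustration; this is not an OPNsense or FreeBSD tool.

```python
def parse_dump_header(text):
    """Split 'Key: value' lines of an info.N style header into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ": " only, so timestamps keep their colons.
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value
    return fields


# Sample copied from the header posted above (shortened).
header_text = """Dump header from device: /dev/gpt/swapfs
  Architecture: amd64
  Dump Length: 1956237312
  Compression: none
  Dumptime: 2023-10-31 10:41:57 +0000
  Panic String: page fault
  Dump Status: good"""

header = parse_dump_header(header_text)
size_gb = int(header["Dump Length"]) / 1e9  # decimal gigabytes

print(f"status: {header['Dump Status']}")
print(f"panic:  {header['Panic String']}")
print(f"size:   {size_gb:.2f} GB")
```

With the values above, the size comes out at 1.96 GB, which matches the vmcore.0 size mentioned earlier in the thread.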

And we have a winner.  ;)
After 6 days it finally crashed again.

@franco: I sent a PM regarding the dump files.

This morning it crashed again (was still on 23.7.7_3) with this well-known error:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x10
fault code = supervisor read data, page not present


I've already submitted the full crash log.

Today I had another crash with the same error.  >:(

@franco: any insights on the debug logs yet?

The past few days had daily crashes and reboots BTW.

This is also still happening for me - I've been working through disabling functionality (shaper, jumbo frames etc) to try and figure it out, but it's a slow process.

It does look like IPv6 is going to be my next target though, as `ip6_tryforward()` is mentioned in the trace.

Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address = 0x10
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ea3764
stack pointer         = 0x28:0xfffffe00e013eca0

frame pointer         = 0x28:0xfffffe00e013ed10

Fatal trap 12: page fault while in kernel mode
cpuid = 5; code segment = base 0x0, limit 0xfffff, type 0x1b
apic id = 05
fault virtual address = 0x10
fault code = supervisor read data, page not present
= DPL 0, pres 1, long 1, def32 0, gran 1
instruction pointer = 0x20:0xffffffff80ea3764
processor eflags = interrupt enabled, resume, stack pointer         = 0x28:0xfffffe00e0143ca0
IOPL = 0
current process = 12 (swi1: netisr 6)
trap number = 12
frame pointer         = 0x28:0xfffffe00e0143d10
code segment = base 0x0, limit 0xfffff, type 0x1b
panic: page fault
cpuid = 6
time = 1700089902
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e013ea60
vpanic() at vpanic+0x151/frame 0xfffffe00e013eab0
panic() at panic+0x43/frame 0xfffffe00e013eb10
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00e013eb70
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00e013ebd0
calltrap() at calltrap+0x8/frame 0xfffffe00e013ebd0
--- trap 0xc, rip = 0xffffffff80ea3764, rsp = 0xfffffe00e013eca0, rbp = 0xfffffe00e013ed10 ---
ip6_tryforward() at ip6_tryforward+0x274/frame 0xfffffe00e013ed10
ip6_input() at ip6_input+0x5e4/frame 0xfffffe00e013edf0
swi_net() at swi_net+0x12b/frame 0xfffffe00e013ee60
ithread_loop() at ithread_loop+0x25a/frame 0xfffffe00e013eef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00e013ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e013ef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
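
To compare several of these panics quickly, a small Python sketch like the following can pull the faulting function out of a KDB backtrace: it looks for the first "--- trap 0x..." marker and reports the frame printed right after it. The helper name `find_faulting_frame` and the regular expressions are assumptions based only on the line format shown in this post, so other kernel versions may need adjustments.

```python
import re

# Matches e.g. "ip6_input() at ip6_input+0x5e4/frame 0xfffffe00e013edf0"
FRAME_RE = re.compile(
    r"^(?P<func>\w+)\(\) at (?P=func)\+0x(?P<offset>[0-9a-f]+)/frame 0x[0-9a-f]+$"
)
# Matches e.g. "--- trap 0xc, rip = 0xffffffff80ea3764, ..."
TRAP_RE = re.compile(r"^--- trap 0x(?P<trap>[0-9a-f]+), rip = 0x[0-9a-f]+")


def find_faulting_frame(lines):
    """Return trap number, function and offset of the first frame after the trap marker."""
    trap = None
    for raw in lines:
        line = raw.strip()
        if trap is None:
            m = TRAP_RE.match(line)
            if m:
                trap = int(m.group("trap"), 16)
            continue
        m = FRAME_RE.match(line)
        if m:
            return {
                "trap": trap,
                "function": m.group("func"),
                "offset": int(m.group("offset"), 16),
            }
    return None


# Lines copied from the backtrace above.
backtrace = """calltrap() at calltrap+0x8/frame 0xfffffe00e013ebd0
--- trap 0xc, rip = 0xffffffff80ea3764, rsp = 0xfffffe00e013eca0, rbp = 0xfffffe00e013ed10 ---
ip6_tryforward() at ip6_tryforward+0x274/frame 0xfffffe00e013ed10
ip6_input() at ip6_input+0x5e4/frame 0xfffffe00e013edf0
swi_net() at swi_net+0x12b/frame 0xfffffe00e013ee60""".splitlines()

fault = find_faulting_frame(backtrace)
print(f"trap {fault['trap']} in {fault['function']}+{fault['offset']:#x}")
```

Run against the trace in this post it reports trap 12 in ip6_tryforward+0x274, the same function named above.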