OPNsense Forum

Archive => 23.7 Legacy Series => Topic started by: craig on October 23, 2023, 12:00:33 PM

Title: Crashes when reconnecting PPPoE repeatedly
Post by: craig on October 23, 2023, 12:00:33 PM
Sometimes I need to repeatedly reconnect my PPPoE connection because my ISP doesn't weight their gateways properly and I end up on one on the other side of the country.

Recently (I think since 23.7), when I do this a few times in a row, OPNsense locks up completely and restarts. I've submitted a few crash reports, but wanted to check whether anyone else here is able to reproduce this.

I have the crash log, which I can upload if it would help anyone. (Is there anything other than IPs to remove from the logs?)
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on October 24, 2023, 08:19:55 PM
Welcome to the club.  :(
I stayed on 22.7 until recently precisely because of this dreaded crash-and-reboot-on-PPPoE-reconnect bug. I had been running OPNsense for several years without any problem. Right after I finally dared to upgrade to 23.7.1, because problem reports had ceased to show up, I was promptly hit by the bug I had managed to avoid for so long.
Reading past reports, I understand that the developers have a hard time fixing this as none of them has an ISP with PPPoE.
Like you, I've sent lots of bug reports lately.
The weird thing is that sometimes my daily PPPoE reconnect by cron succeeds for several days in a row, and then out of the blue it crashes and reboots on each of the next few days. Naturally, nothing changed in my environment in the meantime.
I found a FreeBSD kernel bug (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272319) which seems to be the culprit. I'm not sure, though, whether OPNsense can do something about it in the MPD5 daemon or whether we have to wait patiently until this gets fixed upstream.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: Monviech (Cedrik) on October 24, 2023, 08:46:30 PM
To test PPPoE you don't really need an ISP with PPPoE. You can set up a fully functional PPPoE server on Linux or FreeBSD and configure it like an ISP's. Then you can connect the PPPoE client on OPNsense to it and run all kinds of tests. I have already done that, though to prove something different.

https://github.com/opnsense/core/issues/6650#issuecomment-1635663267

Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on October 24, 2023, 10:12:14 PM
@Monviech: You are right. I was merely quoting @franco's statement (https://forum.opnsense.org/index.php?topic=12828.msg60201#msg60201).  ::)
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: Patrick M. Hausen on October 26, 2023, 11:01:07 AM
@thatso if I read that FreeBSD issue correctly it should only concern users of FreeBSD 14?

I also found this mailing list discussion that might relate to your problem:
https://lists.freebsd.org/archives/freebsd-net/2023-October/004104.html

Are you using IPv6 with PPPoE?

Kind regards,
Patrick
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: craig on October 26, 2023, 03:12:35 PM
Yes, I am using IPv6  :)

edit: I've uploaded the textdump file to my original post
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: Monviech (Cedrik) on October 26, 2023, 04:14:28 PM
In the kernel panic I can see that it's caused by a CPU page fault.

The functions involved seem to be:
- ether_demux (demultiplexes Ethernet packets, looks into them, sees that they're IPv6 for example, and passes them to ip6_input)
- ip6_input (receives IPv6 packets and handles them, passing them to ip6_tryforward for example)
- ip6_tryforward (forwards IPv6 packets along the best path to their destination)

I could also see a lot of "fq-codel" messages, which show that you use traffic shaping. Maybe try to deactivate traffic shaping pipes for a while and see how it goes.

At first glance it doesn't look like PPPoE itself crashed the kernel. It's one of the functions above that crashes, so maybe PPPoE calls them incorrectly and that crashes the kernel. But somebody else might know better.

---------------------------------------------------------

(I use PPPoE at home on a hardware OPNsense box and haven't experienced crashes yet, also with IPv6, but I have static prefixes, so my setup isn't comparable. I will try to reconnect it a few times later to see if I can make it crash.  ;D)
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: Patrick M. Hausen on October 26, 2023, 05:26:27 PM
Read the mailing list.  ;)

If an IPv6 interface goes away - like when PPPoE disconnects - a certain data structure is deallocated, while occasionally another thread tries to use it to forward a queued packet. Which of course causes a crash.
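
A toy illustration of that ordering, not kernel code: the names below (`Interface`, `Inet6Data`, `forward_unchecked`, `forward_checked`) are invented for this sketch and do not correspond to FreeBSD structures. In Python the stale access surfaces as an exception; in the kernel the equivalent read through a freed or NULL pointer is a page fault.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inet6Data:
    """Hypothetical stand-in for per-interface IPv6 state."""
    mtu: int = 1492


@dataclass
class Interface:
    name: str
    inet6: Optional[Inet6Data] = None

    def detach(self) -> None:
        # Models the PPPoE disconnect: the IPv6 state is torn down.
        self.inet6 = None


def forward_unchecked(ifp: Interface, packet: bytes) -> int:
    # Dereferences the per-interface state without checking it still exists.
    return min(len(packet), ifp.inet6.mtu)


def forward_checked(ifp: Interface, packet: bytes) -> Optional[int]:
    # Takes a local snapshot and drops the packet if the state is gone.
    state = ifp.inet6
    if state is None:
        return None
    return min(len(packet), state.mtu)


pppoe0 = Interface("pppoe0", Inet6Data())
queue = deque([(pppoe0, b"x" * 100)])  # packet queued while the link is up
pppoe0.detach()                        # link drops before the queue drains

ifp, pkt = queue.popleft()
try:
    forward_unchecked(ifp, pkt)
    crashed = False
except AttributeError:
    crashed = True                     # analogue of the kernel page fault

dropped = forward_checked(ifp, pkt) is None
print(f"unchecked crashed: {crashed}, checked dropped packet: {dropped}")
```

In a real multithreaded kernel a plain check like this would presumably not be enough on its own, since the state can disappear between the check and the use; the teardown and the forwarding path need proper synchronization.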

Looks like this is precisely the bug hitting our OP.

Kristof Provost and friends are currently discussing how to best fix it.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on October 26, 2023, 08:31:30 PM
@Patrick: your explanation seems to be right on the spot.
BTW: like @craig, I use PPPoE with IPv4 and an additional IPv6 /56 prefix.
The FreeBSD bug tracker says that kernels 12.0 - 13.2 (the current OPNsense kernel version) are affected and the main cause is the MPD5 daemon.
There was a lengthy discussion (https://forum.opnsense.org/index.php?topic=12828.0) about the same problem in the past. Unfortunately it ended without a final solution, apart from @schnipp preventing the crash with a modified script (https://forum.opnsense.org/index.php?topic=12828.msg69564#msg69564) of his own.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 26, 2023, 09:00:59 PM
I'm not sure the current discussion on the mailing list is about this bug (I was half hoping it only affected FreeBSD 14), but ip6_tryforward() is at least suspicious enough to take a closer look. Let me prepare a debug kernel tomorrow so we can get a core dump.

Backtrace for easier reference:

Tracing pid 0 tid 100025 td 0xfffffe00387fc020
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00e003f5f0
vpanic() at vpanic+0x182/frame 0xfffffe00e003f640
panic() at panic+0x43/frame 0xfffffe00e003f6a0
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00e003f700
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00e003f760
calltrap() at calltrap+0x8/frame 0xfffffe00e003f760
--- trap 0xc, rip = 0xffffffff80ea3574, rsp = 0xfffffe00e003f830, rbp = 0xfffffe00e003f8a0 ---
ip6_tryforward() at ip6_tryforward+0x274/frame 0xfffffe00e003f8a0
ip6_input() at ip6_input+0x5e4/frame 0xfffffe00e003f980
netisr_dispatch_src() at netisr_dispatch_src+0x295/frame 0xfffffe00e003f9d0
ether_demux() at ether_demux+0x159/frame 0xfffffe00e003fa00
ng_ether_rcv_upper() at ng_ether_rcv_upper+0x8c/frame 0xfffffe00e003fa20
ng_apply_item() at ng_apply_item+0x2bf/frame 0xfffffe00e003fab0
ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe00e003faf0
ng_apply_item() at ng_apply_item+0x2bf/frame 0xfffffe00e003fb80
ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe00e003fbc0
ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe00e003fbf0
ether_nh_input() at ether_nh_input+0x1f2/frame 0xfffffe00e003fc50
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00e003fca0
ether_input() at ether_input+0x69/frame 0xfffffe00e003fd00
iflib_rxeof() at iflib_rxeof+0xbcb/frame 0xfffffe00e003fe00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00e003fe40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe00e003fec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc3/frame 0xfffffe00e003fef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00e003ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e003ff30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
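
For anyone comparing their own crash report against this one, here is a minimal sketch of pulling the faulting frame out of such a trace. It assumes the line layout shown above (`func() at func+0xNN/frame 0x...` and `--- trap N, rip = ...`); the helper name `parse_backtrace` is made up for this example, and the sample is a shortened excerpt of the trace above.

```python
import re

# One stack frame, e.g. "ip6_input() at ip6_input+0x5e4/frame 0xfffffe00e003f980"
FRAME_RE = re.compile(
    r"^(?P<func>\w+)\(\) at (?P=func)\+(?P<off>0x[0-9a-f]+)/frame 0x[0-9a-f]+$"
)
# Trap marker, e.g. "--- trap 0xc, rip = 0xffffffff80ea3574, ..."
TRAP_RE = re.compile(r"^--- trap (?P<trap>0x[0-9a-f]+|\d+), rip = ")


def parse_backtrace(text):
    """Return (frames, faulting), where each frame is (function, offset)."""
    frames = []
    faulting = None
    after_trap = False
    for raw in text.splitlines():
        line = raw.strip()
        trap = TRAP_RE.match(line)
        if trap:
            # A non-zero trap number marks the point where the fault was taken.
            after_trap = int(trap.group("trap"), 0) != 0
            continue
        match = FRAME_RE.match(line)
        if not match:
            continue
        frame = (match.group("func"), int(match.group("off"), 16))
        frames.append(frame)
        if after_trap and faulting is None:
            faulting = frame
        after_trap = False
    return frames, faulting


SAMPLE = """\
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00e003f760
calltrap() at calltrap+0x8/frame 0xfffffe00e003f760
--- trap 0xc, rip = 0xffffffff80ea3574, rsp = 0xfffffe00e003f830, rbp = 0xfffffe00e003f8a0 ---
ip6_tryforward() at ip6_tryforward+0x274/frame 0xfffffe00e003f8a0
ip6_input() at ip6_input+0x5e4/frame 0xfffffe00e003f980
netisr_dispatch_src() at netisr_dispatch_src+0x295/frame 0xfffffe00e003f9d0
ether_demux() at ether_demux+0x159/frame 0xfffffe00e003fa00
"""

frames, faulting = parse_backtrace(SAMPLE)
print(f"{faulting[0]}+{faulting[1]:#x}")
```

The frames listed before the trap marker belong to the panic handling itself; the first frame after it is where the page fault was taken.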


Cheers,
Franco
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 27, 2023, 11:32:30 AM
Promised kernel (make sure you are on 23.7.7 beforehand):

# opnsense-update -zkr dbg-23.7.7
# opnsense-shell reboot

After the crash and automatic reboot there will be vmcore.* files under /var/crash that I'd need to look at.

The debug kernel is a little more trigger-happy with panics; for example, opening system: tunables will crash the system. Try to avoid using the GUI as much as possible until the crash happens, and then go back to the regular kernel:

# opnsense-update -kf


Cheers,
Franco
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on October 28, 2023, 07:28:11 PM
Installed and waiting for the next crash. Naturally, today's reconnect did not crash.  ???
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 28, 2023, 08:10:28 PM
Fingers crossed. Thanks a lot for the help!


Cheers,
Franco
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 28, 2023, 08:11:43 PM
PS: @thatso I can't directly confirm that your panic is the same as the OP's, so I just wanted to mention that to level expectations.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: craig on October 31, 2023, 11:49:34 AM
Sorry I've been away for a few days.

I installed the debug kernel last night, but after doing so OPNsense panics on boot.

I had to get things back up and running, so used the console port to select the previous kernel - I'll try and get the panic this evening to see if we can work around it.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 31, 2023, 11:50:59 AM
I was a bit afraid of that. Building with INVARIANTS in a release still crashes it pretty reliably in unrelated places. I'm not even sure I can do the debug thing without it due to other build requirements.


Cheers,
Franco
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: craig on October 31, 2023, 11:54:23 AM
I have just had a PPPoE crash (typical), and do have a 1.96GB vmcore.0 crash file from the "production kernel" if it would help?
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 31, 2023, 11:58:05 AM
Yes please. Do you have somewhere to stash it?


Cheers,
Franco
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 31, 2023, 11:58:30 AM
PS: Compressing it should help with size a lot.
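
A minimal sketch of one way to do that from Python, streaming the file so a dump of nearly 2 GB is never loaded into memory at once. The helper name `compress_dump` is made up for this example, and the path in the usage note is taken from the posts above; plain `gzip` on the command line would presumably do the same job.

```python
import gzip
import shutil
from pathlib import Path


def compress_dump(src: Path) -> Path:
    """Write <src>.gz next to src and return the new path."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        # Copy in 1 MiB chunks instead of reading the whole dump into RAM.
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
    return dst
```

Usage would look like `compress_dump(Path("/var/crash/vmcore.0"))`, which leaves the original file in place.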
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: craig on October 31, 2023, 12:12:31 PM
I've popped it on WeTransfer - https://we.tl/t-QYw1eSa4pj - let me know if there are any problems.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 31, 2023, 01:12:06 PM
Got it, thanks. Taking a look right away.


Cheers,
Franco
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on October 31, 2023, 02:20:50 PM
Just to also give a quick update: I had no crash on reconnect for the last three days and I don't want to provoke one so as not to change the conditions leading to the crash.
As said, sometimes the crashes happen for several days in a row and sometimes nothing happens for a week.  :o
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 31, 2023, 02:42:37 PM
Unfortunately I'm running into this gdb issue:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257036

I've checked all the gdb versions we had down to 22.1 and all exhibit the same behaviour, which means either the core file or the debug kernel file has a persistent issue. It could be the size of the core file, but I wouldn't call that file size problematic at first glance. :(


Cheers,
Franco
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: franco on October 31, 2023, 03:00:20 PM
Is there an info.0 file still on your end? I might need that, but not sure.

I can't get useful information out of the core, e.g.:

# dmesg -M vmcore.0
dmesg: _amd64_minidump_vatop: virtual address 0x0 not minidumped
dmesg: kvm_read: invalid address (0x0)

# ps -M vmcore.0
ps: invalid address (0xffffffff82d10000)

etc.


Cheers,
Franco
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: craig on October 31, 2023, 04:37:59 PM
I do - I backed up the entire folder

Dump header from device: /dev/gpt/swapfs
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 1956237312
  Blocksize: 512
  Compression: none
  Dumptime: 2023-10-31 10:41:57 +0000
  Hostname: OPNsense.home
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 13.2-RELEASE-p3 stable/23.7-n254818-f155405f505 SMP
  Panic String: page fault
  Dump Parity: 2194897932
  Bounds: 0
  Dump Status: good
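
A quick way to sanity-check such a header before uploading a dump is to split it into key/value pairs. This is only a sketch that assumes the `Key: value` layout shown above; `parse_dump_header` is a made-up helper name and the sample is an excerpt of the header above.

```python
def parse_dump_header(text: str) -> dict:
    """Split 'Key: value' lines of an info.N file into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


SAMPLE = """\
Dump header from device: /dev/gpt/swapfs
  Architecture: amd64
  Dump Length: 1956237312
  Dumptime: 2023-10-31 10:41:57 +0000
  Version String: FreeBSD 13.2-RELEASE-p3 stable/23.7-n254818-f155405f505 SMP
  Panic String: page fault
  Dump Status: good
"""

header = parse_dump_header(SAMPLE)
size_gb = int(header["Dump Length"]) / 1e9
print(header["Dump Status"], header["Panic String"], f"{size_gb:.2f} GB")
```

The dump length works out to about 1.96 GB, which matches the size of the vmcore.0 file mentioned earlier.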
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on November 04, 2023, 08:50:18 AM
And we have a winner.  ;)
After 6 days it finally crashed again.

@franco: I sent a PM regarding the dump files.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on November 09, 2023, 08:06:49 PM
This morning it crashed again (was still on 23.7.7_3) with this well known error:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x10
fault code = supervisor read data, page not present
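
A fault address as small as 0x10 is commonly a sign of a read through a NULL pointer plus a small member offset. A rough sketch of that check follows; treating any address inside the first 4096-byte page as a likely NULL dereference is a heuristic assumed here, and `parse_trap` is a made-up helper name.

```python
PAGE_SIZE = 4096  # page size on amd64


def parse_trap(text: str) -> dict:
    """Collect 'key = value' pairs; several may share a line, separated by ';'."""
    fields = {}
    for line in text.splitlines():
        for part in line.split(";"):
            key, sep, value = part.partition("=")
            if sep:
                fields[key.strip()] = value.strip()
    return fields


SAMPLE = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x10
fault code = supervisor read data, page not present
"""

trap = parse_trap(SAMPLE)
fault_addr = int(trap["fault virtual address"], 16)
likely_null_deref = fault_addr < PAGE_SIZE
print(hex(fault_addr), likely_null_deref)
```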


I've already submitted the full crash log.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on November 11, 2023, 07:26:27 PM
Today I had another crash with the same error.  >:(

@franco: any insights on the debug logs yet?
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on November 16, 2023, 06:25:47 PM
The past few days had daily crashes and reboots BTW.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: craig on November 20, 2023, 11:25:17 AM
This is also still happening for me - I've been working through disabling functionality (shaper, jumbo frames etc) to try and figure it out, but it's a slow process.

It does look like IPv6 is going to be my next target though, as `ip6_tryforward()` is mentioned in the trace.

Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address = 0x10
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ea3764
stack pointer         = 0x28:0xfffffe00e013eca0

frame pointer         = 0x28:0xfffffe00e013ed10

Fatal trap 12: page fault while in kernel mode
cpuid = 5; code segment = base 0x0, limit 0xfffff, type 0x1b
apic id = 05
fault virtual address = 0x10
fault code = supervisor read data, page not present
= DPL 0, pres 1, long 1, def32 0, gran 1
instruction pointer = 0x20:0xffffffff80ea3764
processor eflags = interrupt enabled, resume, stack pointer         = 0x28:0xfffffe00e0143ca0
IOPL = 0
current process = 12 (swi1: netisr 6)
trap number = 12
frame pointer         = 0x28:0xfffffe00e0143d10
code segment = base 0x0, limit 0xfffff, type 0x1b
panic: page fault
cpuid = 6
time = 1700089902
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e013ea60
vpanic() at vpanic+0x151/frame 0xfffffe00e013eab0
panic() at panic+0x43/frame 0xfffffe00e013eb10
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00e013eb70
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00e013ebd0
calltrap() at calltrap+0x8/frame 0xfffffe00e013ebd0
--- trap 0xc, rip = 0xffffffff80ea3764, rsp = 0xfffffe00e013eca0, rbp = 0xfffffe00e013ed10 ---
ip6_tryforward() at ip6_tryforward+0x274/frame 0xfffffe00e013ed10
ip6_input() at ip6_input+0x5e4/frame 0xfffffe00e013edf0
swi_net() at swi_net+0x12b/frame 0xfffffe00e013ee60
ithread_loop() at ithread_loop+0x25a/frame 0xfffffe00e013eef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00e013ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e013ef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
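
Both traces in this thread fault at the same spot, ip6_tryforward+0x274, even though the instruction pointers differ. A small worked example using only the values from the two traces: subtracting the offset from rip gives the address where the function starts in each kernel. That the difference comes from two different kernel builds is an assumption, not something the traces state.

```python
def symbol_base(rip: int, offset: int) -> int:
    """Address where the function starts, given rip and the symbol offset."""
    return rip - offset


OFFSET = 0x274  # ip6_tryforward+0x274 in both traces

base_first = symbol_base(0xFFFFFFFF80EA3574, OFFSET)   # earlier trace in this thread
base_second = symbol_base(0xFFFFFFFF80EA3764, OFFSET)  # trace directly above

print(hex(base_first), hex(base_second), hex(base_second - base_first))
```

The function start moves by 0x1f0 bytes between the two traces while the faulting offset stays the same, which suggests the same instruction is failing in both.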
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on January 06, 2024, 09:12:42 PM
Almost two months later ... any findings on this issue or on the crash logs I've sent @franco?
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: thatso on January 13, 2024, 12:07:17 AM
It seems this bug was finally fixed upstream (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272319) for the v14 kernel. Any chance that this will be included in OPNsense 24.1?
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: Drohne on May 08, 2024, 10:51:30 PM
The problem is still persistent in

FreeBSD 14.1-STABLE #2 stable/14-n267607-7e10c2d27a53: Sat May  4 08:33:15 CEST 2024 amd64

and I guess OPNsense will hit the fan when reaching the base of FreeBSD 14.
Title: Re: Crashes when reconnecting PPPoE repeatedly
Post by: Patrick M. Hausen on May 08, 2024, 11:02:13 PM
There's a current issue in the upstream bugtracker (from January):

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276294