OPNsense
  • Home
  • Help
  • Search
  • Login
  • Register

  • OPNsense Forum »
  • Archive »
  • 17.7 Legacy Series »
  • [SOLVED][Fix included in 17.7.1] PPPOE Crash
« previous next »
  • Print
Pages: 1 [2] 3 4

Author Topic: [SOLVED][Fix included in 17.7.1] PPPOE Crash  (Read 41296 times)

jwe

  • Newbie
  • *
  • Posts: 30
  • Karma: 4
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #15 on: August 17, 2017, 09:52:59 pm »
Here is the output from the test1-image, imported config from 17.1:


I also have the latest BIOS version.
Logged

odites999

  • Newbie
  • *
  • Posts: 35
  • Karma: 4
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #16 on: August 19, 2017, 12:17:56 pm »
With the Test1 image, the system also crashes (same error than in images posted by jwe).
With the Test2 image, the system does not crash.


Regards,
Logged

franco

  • Administrator
  • Hero Member
  • *****
  • Posts: 17706
  • Karma: 1618
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #17 on: August 20, 2017, 11:16:18 am »
Hi guys,

Alright, that confirms it's not something we did in our code per se... It's from the import of the host-uniq patch from here:

https://reviews.freebsd.org/D9270

Can you tell me how your "service" field is filled out? If it is not filled out, what happens when you set a bogus string like "foo"?

We'd also be looking at a "|" symbol use or anything odd / non-standard about that.


Thanks so far,
Franco
Logged

odites999

  • Newbie
  • *
  • Posts: 35
  • Karma: 4
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #18 on: August 20, 2017, 04:50:45 pm »
Hi franco,

My service name is (and has always been) empty. I've tried changing it to different lengths with no problem. The connection stays alive.


Regards,
« Last Edit: August 20, 2017, 07:23:50 pm by odites999 »
Logged

jwe

  • Newbie
  • *
  • Posts: 30
  • Karma: 4
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #19 on: August 20, 2017, 11:00:32 pm »
Quote from: odites999 on August 20, 2017, 04:50:45 pm
Hi franco,

My service name is (and has always been) empty. I've tried changing it to different lengths with no problem. The connection stays alive.


Regards,

I set it to "mnet" in the 17.1 install.
PPPoE then reconnects and is working as before.

But when booting from the 17.7 usb stick (Test Image 1) then i get the same error as in my screenshots after configuration import and getting the network devices ready.
Logged

odites999

  • Newbie
  • *
  • Posts: 35
  • Karma: 4
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #20 on: August 21, 2017, 07:22:01 am »
Hi,

My test with system names has been with the Test2 image, the only one that does not crash, in my case (empty system name).


Regards,
Logged

franco

  • Administrator
  • Hero Member
  • *****
  • Posts: 17706
  • Karma: 1618
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #21 on: August 21, 2017, 06:21:21 pm »
Ok, so we need to roll back on that patch and start asking the authors for help in solving this mystery.

The older kernel will work for 17.7 if you can install it:

# opnsense-update -ikr 17.1.9 -n "17.1\/sets"
# /usr/local/etc/rc.reboot

The working image is also ok to install. The kernels are the same and upgrades will work as soon as 17.7.1 is out.


Cheers,
Franco
Logged

jwe

  • Newbie
  • *
  • Posts: 30
  • Karma: 4
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #22 on: August 21, 2017, 10:00:07 pm »
Quote from: franco on August 21, 2017, 06:21:21 pm
Ok, so we need to roll back on that patch and start asking the authors for help in solving this mystery.

The older kernel will work for 17.7 if you can install it:

# opnsense-update -ikr 17.1.9 -n "17.1\/sets"
# /usr/local/etc/rc.reboot

The working image is also ok to install. The kernels are the same and upgrades will work as soon as 17.7.1 is out.


Cheers,
Franco

Okay.. what are the next steps now?
Are you contacting the author?

Edit:
I could reproduce the problem on my main PC within a hyper-v VM running pfsense 17.7

I also captured the packages with wireshark if it could help futher identify the problem.
But this makes the problem hardware-independent, as my main pc has an intel nic, mobo from gigabyte

Edit2:

I did some research on the patch at https://reviews.freebsd.org/D9270 (as far as my knowledge is going...)-

It seems like it not only patched/added the host-uniq feature but also added some additional parsing for the PADM messages.

As i can see in ng_pppoe_rcvdata_ether(), the way how a NULL PADI tag gets treated changed dramatically, before they called
Code: [Select]
CTR1(KTR_NET, "%20s: PADI w/o Service-Name",__func__);
LEAVE(ENETUNREACH);

but now they just to some new indroduced
Code: [Select]
if (tag==NULL)tag=&sntag;
which i would assume to be the problem.

« Last Edit: August 21, 2017, 11:12:41 pm by jwe »
Logged

franco

  • Administrator
  • Hero Member
  • *****
  • Posts: 17706
  • Karma: 1618
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #23 on: August 22, 2017, 06:32:41 am »
Good morning,

The PCAP of the trace would certainly help. Something went wrong in the ng_pppoe_rcvdata_ether() function, but from review it looked like there was nothing conceptually wrong with the &sntag change which just wants to force a match of the service to the first service it finds. At least there was no obvious problem with it in the lines changes (that other reviewers could have caught as well).

You are right it's not hardware independent, that was ruled out with the test1/test2 images. It looks provider-specific and/or PPPoE server specific (setup).

We can try to revert parts of the patch to find the issue of course, I will involve the author from that review later today. :)


Cheers,
Franco
Logged

odites999

  • Newbie
  • *
  • Posts: 35
  • Karma: 4
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #24 on: August 23, 2017, 07:55:38 am »
Hi!,

I have installed the test2 image (I made all the tests in "live mode" and, so far, everything is working OK.


Regards,
« Last Edit: August 23, 2017, 12:23:15 pm by odites999 »
Logged

jwe

  • Newbie
  • *
  • Posts: 30
  • Karma: 4
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #25 on: August 23, 2017, 11:24:46 pm »
Ok,
i think i have identified the problem.

What i have seen in the packetcapture:
Router sends PADI (Offer) | 585 in image
AC(1) sends PADO to Router |586
Router sends PADR to AC(1)|587
AC(1) send PADS to Router |588
=====Session is established, i also already have an ip via pppoe device=====
AC(2) sends PADO to router|589
=====Router crashes====


Thats what i have seen in the wireshark log.

AC(1) and AC(2) are sending the PADR's but the one from AC(2) is coming really late... as far as i could seen always after i have an established session with AC(1)

I can differntiate both AC's by their MAC-address.

So i blocked the MAC-address from the second AC on my switch.

Et voilà, problem solved.

I have attached a screenshot of the relevant captures.

so. by the RFC, if there are multiple AC's the client should be able to switch between them.
Possible workarounds for opnsense are, in my view:
1. a diagnostic 'pppoe -A' which sends the PADI to the ether and list the possible AC's
2. a possibility to select the AC to use in the configurator (for pppoe its parameter -C)

both commands are from https://www.freebsd.org/cgi/man.cgi?query=pppoe

hope that helps.




Edit:
I also noticed that the AC(2) (Cisco whatever by its MAC) has some more Vendor Specific PPPoE Tags(Cirguit ID and Remote ID, the Remote ID also contains my name in its value... 0o)
« Last Edit: August 23, 2017, 11:28:49 pm by jwe »
Logged

franco

  • Administrator
  • Hero Member
  • *****
  • Posts: 17706
  • Karma: 1618
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #26 on: August 24, 2017, 08:37:45 am »
Hi jwe,

Thanks for this, I will push it to the FreeBSD bug tracker today.

We don't use the "pppoe" command, but from the looks of it you should be able to prevent the crash by filling out the appropriate AC value that you see being advertised on the wire for AC(1). That should also make the MAC block obsolete.

In the provider field, simply fill with "ac4.nue3\", the trailing backslash is important. The Host-Uniq field must be empty.


Cheers,
Franco
« Last Edit: August 24, 2017, 08:55:54 am by franco »
Logged

alexdupre

  • Newbie
  • *
  • Posts: 3
  • Karma: 1
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #27 on: August 24, 2017, 08:42:20 am »
Hi, I'm the author of the patch.

It'd be indeed useful to have the full ethernet dump. I've read multiple hypothesis about the cause of the issue, but few facts supporting them. The last one seems to be multiple PADO answers from different Access Concentrators. I don't exclude at 100% that this may be the issue (it's not a common scenario), but in theory it should be handled correctly (second PADO is ignored, /* Multiple PADO is OK. */ in the code) and my patch doesn't touch that part, so I'd expect the same behavior on a clean FreeBSD installation.

Instead of blocking the AC via its MAC address, you can configure the PPPoE connection to just accept a specific AC-Name. Since blocking one AC seems to have fixed the issue, it'd be interesting to see if blocking the another one produces the same result. And also to know the behavior with the same conditions (multiple PADO) on a clean pppoe module.
Logged

alexdupre

  • Newbie
  • *
  • Posts: 3
  • Karma: 1
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #28 on: August 24, 2017, 09:34:23 am »
I think to have found the issue. Try changing the following code in pppoe_finduniq function:

Code: [Select]
        /* Cycle through all known hooks. */
        LIST_FOREACH(hook, &node->nd_hooks, hk_hooks) {
                /* Skip any nonsession hook. */
                if (NG_HOOK_PRIVATE(hook) == NULL)
                        continue;
                sp = NG_HOOK_PRIVATE(hook);
                if (sp->neg->host_uniq_len == ntohs(tag->tag_len) &&
                    bcmp(sp->neg->host_uniq.data, (const char *)(tag + 1),
                     sp->neg->host_uniq_len) == 0)
                        break;
        }

with

Code: [Select]
        /* Cycle through all known hooks. */
        LIST_FOREACH(hook, &node->nd_hooks, hk_hooks) {
                /* Skip any nonsession hook. */
                if (NG_HOOK_PRIVATE(hook) == NULL)
                        continue;
                sp = NG_HOOK_PRIVATE(hook);
                /* Skip already connected sessions. */
                if (sp->neg == NULL)
                        continue;
                if (sp->neg->host_uniq_len == ntohs(tag->tag_len) &&
                    bcmp(sp->neg->host_uniq.data, (const char *)(tag + 1),
                     sp->neg->host_uniq_len) == 0)
                        break;
        }

Franco, can you create an image or a module with the above change so that jwe can test it, please?
Logged

franco

  • Administrator
  • Hero Member
  • *****
  • Posts: 17706
  • Karma: 1618
    • View Profile
Re: [NOT-SOLVED,WTF!?] PPPOE Crash
« Reply #29 on: August 25, 2017, 08:44:28 am »
Hi Alex,

Great to see you here, thank you. I'll prepare an image tonight and we let you know how that goes. :)


Cheers,
Franco
Logged

  • Print
Pages: 1 [2] 3 4
« previous next »
  • OPNsense Forum »
  • Archive »
  • 17.7 Legacy Series »
  • [SOLVED][Fix included in 17.7.1] PPPOE Crash
 

OPNsense is an OSS project © Deciso B.V. 2015 - 2024 All rights reserved
  • SMF 2.0.19 | SMF © 2021, Simple Machines
    Privacy Policy
    | XHTML | RSS | WAP2