OPNsense Forum

Archive => 18.1 Legacy Series => Topic started by: schnipp on February 11, 2018, 02:46:04 pm

Title: [isolated: see #91] PPPoE reconnect loop
Post by: schnipp on February 11, 2018, 02:46:04 pm
Hi all,

I am new to this forum. First, I would like to thank everyone involved in releasing OPNsense 18.1. You did a really great job. Unfortunately, I have a problem with my PPPoE connection on the WAN interface. I have read a lot trying to solve it, but have found no solution yet :'(. Similar problems are discussed both in the OPNsense forum and in some FreeBSD forums, but I am not sure whether they are the same issue.

After booting the system, the PPPoE WAN connection works fine. But after the connection is interrupted (e.g. by the forced disconnect my ISP performs every 24h, or by other failures), the system does not get a new stable PPPoE connection.

I tracked this issue down with packet captures. Normally my ISP sends "PPP LCP echo request" packets every 10 seconds, which OPNsense answers immediately. But after the PPPoE connection has been interrupted and re-established, OPNsense no longer answers incoming "PPP LCP echo request" packets. After the fifth unanswered echo request my ISP drops the PPPoE connection. OPNsense then tries to reconnect and ends up in an infinite reconnection loop.
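
For anyone who wants to compare their own captures: a minimal Python sketch of what a correct keep-alive answer looks like on the wire. It assumes the standard PPPoE session framing (EtherType 0x8864), PPP protocol 0xC021 for LCP, and LCP codes 9 (Echo-Request) and 10 (Echo-Reply). The function names, MAC addresses and magic numbers are made up for illustration; this is not how mpd5 implements it.

```python
import struct

ETH_PPPOE_SESSION = 0x8864  # EtherType of PPPoE session frames
PPP_LCP = 0xC021            # PPP protocol number of LCP
LCP_ECHO_REQUEST = 9
LCP_ECHO_REPLY = 10


def build_lcp_frame(dst, src, session_id, code, ident, magic, data=b""):
    """Build Ethernet + PPPoE session + PPP + LCP echo packet as bytes."""
    # LCP echo packet: code, identifier, length, magic number, optional data
    lcp = struct.pack("!BBHI", code, ident, 8 + len(data), magic) + data
    ppp = struct.pack("!H", PPP_LCP) + lcp
    # PPPoE header: version/type 0x11, code 0x00 (session data), id, length
    pppoe = struct.pack("!BBHH", 0x11, 0x00, session_id, len(ppp))
    return dst + src + struct.pack("!H", ETH_PPPOE_SESSION) + pppoe + ppp


def parse_lcp(frame):
    """Return the LCP fields of a PPPoE session frame, or None if not LCP."""
    if len(frame) < 26:  # 14 Ethernet + 6 PPPoE + 2 PPP + 4 LCP header
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    _ver_type, pppoe_code, session_id, _len = struct.unpack("!BBHH", frame[14:20])
    (protocol,) = struct.unpack("!H", frame[20:22])
    if ethertype != ETH_PPPOE_SESSION or pppoe_code != 0 or protocol != PPP_LCP:
        return None
    code, ident, lcp_len = struct.unpack("!BBH", frame[22:26])
    return {
        "dst": frame[0:6],
        "src": frame[6:12],
        "session_id": session_id,
        "code": code,
        "ident": ident,
        "body": frame[26:22 + lcp_len],  # magic number followed by data
    }


def echo_reply_for(frame, our_magic):
    """Build the Echo-Reply for an Echo-Request frame, else return None."""
    pkt = parse_lcp(frame)
    if pkt is None or pkt["code"] != LCP_ECHO_REQUEST:
        return None
    data = pkt["body"][4:]  # drop the peer's magic number, keep any payload
    # Same session and identifier, addresses swapped, our own magic number
    return build_lcp_frame(pkt["src"], pkt["dst"], pkt["session_id"],
                           LCP_ECHO_REPLY, pkt["ident"], our_magic, data)
```

In a healthy session every request with identifier N should be followed shortly by a reply with the same identifier and session ID going the other way.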

The only solution known to me is to reboot the system.


Title: Re: PPPoE reconnect loop
Post by: kug1977 on February 11, 2018, 03:17:11 pm
Hi,

I'm on OPNsense 18.1.2 and I'm facing the same problem. My workaround so far is a daily reboot via cron.

Kind regards,
Kay-Uwe
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 12, 2018, 09:39:19 pm
Hi Kay-Uwe,

can you tell me what kind of configuration (e.g. modem) you have on your WAN side?

I have a Fritzbox 7412 in native bridge encapsulation mode and allow PPPoE passthrough. So OPNsense sends PPPoE packets to the Fritzbox, which encapsulates them in VLAN 7 before forwarding them to the BRAS.

Title: Re: PPPoE reconnect loop
Post by: franco on February 13, 2018, 03:04:08 pm
Do you guys have IPS enabled?


Cheers,
Franco
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 13, 2018, 04:07:52 pm
Heh :)
Just saw this.
@Franco, I guess I'm not alone :)
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 13, 2018, 05:10:37 pm
>Do you guys have IPS enabled?

Hi Franco,

Currently, I still have a basic setup without IDS/IPS installed. Details of my setup…

Furthermore, I have to use a self-compiled NIC driver from Intel, because the NICs of my Board (Intel Atom C3000 Series; Supermicro A2SDi-4C-HLN4F) are not supported by FreeBSD 11.1. But the NICs themselves run seamlessly.

The next steps I want to try are…

Kind regards,
schnipp
Title: Re: PPPoE reconnect loop
Post by: franco on February 13, 2018, 05:36:30 pm
I was talking to another user today and we're handing out a custom kernel tomorrow to test a theory as we have one change that has not made it to FreeBSD yet...

https://reviews.freebsd.org/D9270


Cheers,
Franco
Title: Re: PPPoE reconnect loop
Post by: glasi on February 13, 2018, 09:47:14 pm
Hello all,

I just started with my own OPNsense setup and wanted to give some feedback.

So far everything works like a charm. Luckily, there are also no problems with PPPoE. On the WAN side I use the fiber optic bridge modem provided by my ISP Deutsche Telekom.

Regards,
glasi

Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 13, 2018, 10:44:34 pm
Good to know that some ISPs' PPPoE links are OK while others are struggling :)
Thanks and welcome to OPNsense!
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 13, 2018, 10:45:37 pm
Btw, if you disconnect the WAN network cable and reconnect after a few seconds, is the PPPoE working again? If not, reboot to fix :)
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 14, 2018, 09:36:24 pm
Today at midday my Internet stopped working and my Opnsense entered the infinite reconnection loop. Maybe my ISP initiated the termination of the connection (24 hour reconnection?)

Anyhow, it was a good moment to do some more investigation in that situation. So I did some packet captures in parallel on three network interfaces:


Afterwards I rebooted the machine and in parallel started packet capturing of the two interfaces of the DSL-Modem.

Analyzing the capture files shows nothing really strange. The PPPoE packet forwarding between the modem interfaces looks fine. Only one reconfiguration request following a termination notice from the ISP is not forwarded back to OPNsense. This is OK, because there is no valid PPP session at that moment.

The only difference in communication (1. connection after reboot and 2. reconnection loop) I can see is that in the second case OPNsense is not answering the "LCP echo request" packets sent by the ISP.

Presumably, there could be a timing issue or race condition somewhere in the PPPoE stack.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 14, 2018, 11:38:28 pm
Perhaps this helps you as well, until this is fixed:
https://forum.opnsense.org/index.php?topic=7316.0
Title: Re: PPPoE reconnect loop
Post by: kug1977 on February 15, 2018, 11:45:36 am
Hi,

>Do you guys have IPS enabled?
No, not enabled.

>Btw, if you disconnect the WAN network cable and reconnect after a few seconds, is the PPPoE working again?
No, it needs a restart after.

Setup is:
Deutsche Telekom VDSL 50
Draytek Vigor 130 in PPPoE pass-through / bridge mode
APU1D with OPNsense 18.1.2 on eth0

Kind regards,
Kay-Uwe
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 16, 2018, 12:13:48 am
Well... I had a few adventures today with this. It has something to do with FreeBSD 11.1, because with OPNsense 17.7.12 if PPPoE is lost and then reconnected without a reboot, I get a new IPv4 - but IPv6 is lost.

With OPNsense 18.1 which is based on FreeBSD 11.1, only if I restart configd I get reconnected, but for some reason only for a few seconds.. then I get disconnected again and that's it, c'est fini until the next reboot.

So.. my automated method to reboot is the only thing that permanently brings back the PPPoE connection (so far).
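
A reboot watchdog of this kind can be sketched roughly as follows. This is not the script from the linked topic, only a minimal Python illustration of the idea "reboot after N failed connectivity checks in a row". The names `run_watchdog`, `probe` and `reboot` are hypothetical; how to probe the link and how to reboot are left to the caller.

```python
def run_watchdog(probe, reboot, max_failures=3, max_checks=None, wait=lambda: None):
    """Call reboot() once probe() has failed max_failures times in a row.

    probe:      callable returning True while the WAN link works
    reboot:     callable performing the recovery action
    wait:       callable that pauses between checks (e.g. a sleep wrapper)
    max_checks: stop after this many checks (None means run forever)
    Returns True if reboot() was called, False if max_checks ran out first.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0  # any successful check resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                reboot()
                return True
        wait()
    return False
```

Requiring several consecutive failures avoids rebooting on a single lost ping while the link is merely renegotiating.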
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 16, 2018, 08:33:26 am
>With OPNsense 18.1 which is based on FreeBSD 11.1, only if I restart configd I get reconnected, but for some reason only for a few seconds.. then I get disconnected again and that's it, c'est fini until the next reboot.

Yet it does not affect all PPPoE users, me for example.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 16, 2018, 09:59:23 am
I suppose.. otherwise more reports would have been filed about this.
But if RDS (the ISP) is affected, at least half of Romania is :)
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 16, 2018, 10:01:13 am
Also I've just been checking, no pppoe issues across the road have been reported, well not for some time.
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 16, 2018, 10:04:12 am
OK, so what are they doing that no-one else appears to be doing, apart from breaking opnsense.

Is it possible to get a wireshark log of opnsense and one of an ISP supplied router?

EDIT:

I can see @schnip posted the differences... even stranger.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 16, 2018, 01:19:18 pm
It's possible for the OPNsense box, of course, no way for the ISP's GPON device. That piece of crap is only good as a fiber to ethernet interface. Anything else is considered rocket science by the firmware.. and it doesn't understand rocket science. And to make things worse, there's no way you, as an end user, upgrade the firmware of these devices, otherwise (maybe - big maybe) you could have a chance to look at the code...

But.. there's no need to actually do any of that, because with OPNsense 17.7.12 this is working relatively fine (better anyways), so someone with appropriate skills and setup could compare/debug the PPPoE code. I have the setup and am willing to give access to my OPNsense box if there's somebody with the skills to debug...
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 16, 2018, 06:22:28 pm
>OK, so what are they doing that no-one else appears to be doing, apart from breaking opnsense.
>
>Is it possible to get a wireshark log of opnsense and one of an ISP supplied router?
>
>EDIT:
>
>I can see @schnip posted the differences... even stranger.

Hi all,

yes, I have some logs. When I am back in my control center :D (beginning of next week) I can post the wireshark logs of both scenarios (reconnect issue and fresh reboot) I have taken so far.

Title: Re: PPPoE reconnect loop
Post by: schnipp on February 16, 2018, 06:30:42 pm
>[...] so someone with appropriate skills and setup could compare/debug the PPPoE code. I have the setup and willing to give access to my OPNsense box if there's somebody with skills to debug...

I hope the issue resides in the user-space daemon. That would be much easier to debug. Actually, I am not familiar with the whole architecture the PPPoE stack relies on. But we can keep debugging the PPPoE stack in mind for later, maybe after log review or library call tracing!?
Title: Re: PPPoE reconnect loop
Post by: mbosner on February 17, 2018, 09:45:30 pm
Seems that I have the same problem. I have a PPPoE session but I do not get an IP address. The problem appeared today after a configuration change by my provider. With my Fritzbox everything works well. I will try to debug that tomorrow.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 17, 2018, 09:57:54 pm
What country are you guys from and who's your ISP?
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 17, 2018, 11:36:29 pm
>Seems that I have the same problem. I have a PPPoE session but I do not get an IP address. The problem appeared today after a configuration change by my provider. With my Fritzbox everything works well. I will try to debug that tomorrow.

Can you post your ppps.log? It's in /var/log.

If you are concerned, check it first and blank out any IP addresses, auth names and passwords that might be in there before you post it.
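
A small helper for the blanking-out step, as a minimal Python sketch. It masks anything that looks like an IPv4 address and replaces strings you list yourself (user name, password). The function name `redact` and the sample line in the usage note are made up; the line is not real mpd5 output, and IPv6 addresses are not handled.

```python
import re

# Four dot-separated groups of 1-3 digits; good enough for log scrubbing,
# it does not validate that each group is <= 255.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text, secrets=()):
    """Mask IPv4 addresses and any caller-supplied secret strings."""
    text = IPV4_RE.sub("x.x.x.x", text)
    for secret in secrets:
        if secret:  # skip empty strings, they would match everywhere
            text = text.replace(secret, "[redacted]")
    return text
```

For example, `redact("addr 203.0.113.7 user alice@isp", ["alice@isp"])` gives `addr x.x.x.x user [redacted]`. Version strings such as 18.1.2 are left alone because they have only three groups.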
Title: Re: PPPoE reconnect loop
Post by: mbosner on February 18, 2018, 02:11:27 am
Germany
https://www.wilhelm-tel.de/privatkunden/

I will debug that tomorrow and paste the logfile.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 07:38:47 am
Here's mine when things are working (after a reboot).
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 07:59:27 am
And here's one when I disconnect the PPPoE interface from the GUI and then try to reconnect - which triggers the loop.
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 18, 2018, 10:51:18 am
And can we have hardware specs and any other stuff running, IDS etc.

Looking back at the forums across the road similar problems to this WERE reported, but not with FreeBSD 11.1.

This 'caught fatal signal TERM' is weird as everything looks fine - apart from a NAK and a resend - up to that point.



Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 11:17:01 am
HW specs are in my signature :)

Interfaces: WAN (PPPoE on em0 - 82574), LAN1 (igb0 - I221), LAN2 (em1 - I219-V), VPN (ovpns1)
Unbound -> ON
DNS server: LAN client, advertised by DHCPv4 to DHCP clients or by AD DC
Aliases: URL Table (IPs), Host(s)
IDS+ IPS -> ON
OpenVPN -> 1 server
DHCPv4: LAN1, LAN2
DHCPv6: none
IPv6 conf: https://forum.opnsense.org/index.php?topic=7267.0
No proxies

And a few "utility class" plugins (LE client, monit, stuff like this)

Did I miss something relevant?
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 18, 2018, 11:51:07 am
OK, if you've not already done this, then begin the process of elimination. This is what I would do now.

Back up the config and create a very simple OPNsense setup, no IDS, no IPS, no VPN etc., as simple as you can make it, then try that and see if it has the same issue. Then start building up again from there until the problem reappears.
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 18, 2018, 11:57:47 am
Just to rule something out: are you running in a VM?
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 12:13:21 pm
Nope, OPNsense runs on a physical machine.
On that particular machine, I can't do a clean setup for a number of reasons. What I could try is to set up another physical machine, but I'm uncertain if the results will be relevant (different hw, different drivers)...
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 12:49:28 pm
Not one of my finest work, but this will be the test setup :-)
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 18, 2018, 02:24:37 pm
Nah... looks pretty reasonable to me.  8)

We're only doing what your tag suggests.

Now, what I can do: I have a Qotom i5 based unit which is fine, and I also have two APUs, one APU1 and one APU2. I will configure them and try them with my ISP using pppoe dhcp6 rather than static and see if I get any issues. It might take a day or two, as I have to do it when my better half is not working and online.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 02:38:59 pm
So.. new setup, clean install 18.1, updated to 18.1.2, no import of backups.

Here's what I found so far:

1. WAN IPv4 only
- no reconnect loops after multiple disconnects
- LAN client has internet connectivity

2.  WAN IPv4 + IPv6 (happened once)
- IPv6 exactly like here https://forum.opnsense.org/index.php?topic=7267.0
- no reconnect loops, but my only LAN client, which is a laptop directly connected to the LAN interface of the OPNsense box, will lose internet connectivity
- even so, pinging and stuff work on the OPNsense box and I do receive IPv4 + IPv6 from the ISP

3. WAN IPv4 + IPv6 (happened most of the time)
- no reconnect loops
- LAN client has internet connectivity
- WAN loses IPv6

4.  WAN IPv4 + IPv6
- IDS + IPS enabled (with all rules set to 'alert' only)
- no reconnect loops
- LAN client has internet connectivity
- WAN loses IPv6

5.  WAN IPv4 + IPv6
- IDS + IPS enabled (with all rules set to 'drop')
- no reconnect loops
- LAN client has internet connectivity
- WAN loses IPv6

6. WAN IPv4 + IPv6
- OpenVPN settings imported from backup
- no reconnect loops
- LAN client has internet connectivity
- WAN loses IPv6
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 18, 2018, 03:26:47 pm
So what we are seeing now with IDS & IPS is NO loops but loss of IPv6?

Well that's different, not sure how to account for that. However, look at the dhcp6c logs and see if anything strange is happening there. Do you have 'Prevent Release' and 'Use IPv4 Connectivity' set?
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 03:31:34 pm
(I'll keep updating my previous post with my findings)

Use IPv4 is checked (WAN won't receive an IPv6 otherwise), and prevent release is right now unchecked.

But I had the IPv6 loss with my other box as well, it's not new.. not in my case at least.
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 18, 2018, 03:44:46 pm
OK, what's appearing in dhcp6c logs, is it being signalled to exit?
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 05:03:14 pm
Ok, finally nailed it!

The culprit for both the PPPoE reconnect loop and the IPv6 loss was (in my case):

- custom MTU (1492)
- custom MSS (1452)

... both configured on the WAN of course.

Although both are correct for PPPoE (as far as I know), manually configuring them caused the PPPoE loop and the IPv6 loss. After deleting both, the loop is gone and IPv6 is also back :)
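
Where the two numbers come from, as a minimal Python sketch. It assumes a standard 1500-byte Ethernet MTU, 8 bytes of PPPoE/PPP overhead (6-byte PPPoE header plus 2-byte PPP protocol field) and IP and TCP headers without options. The function names are made up for illustration.

```python
ETHERNET_MTU = 1500
PPPOE_HEADER = 6         # version/type, code, session ID, length
PPP_PROTOCOL_FIELD = 2   # e.g. 0x0021 for IPv4
IPV4_HEADER = 20         # without options
IPV6_HEADER = 40         # fixed header only
TCP_HEADER = 20          # without options


def pppoe_mtu(ethernet_mtu=ETHERNET_MTU):
    """Largest IP packet that fits into one PPPoE frame."""
    return ethernet_mtu - PPPOE_HEADER - PPP_PROTOCOL_FIELD


def tcp_mss(mtu, ipv6=False):
    """Largest TCP payload per segment for a given link MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


print(pppoe_mtu())               # 1492
print(tcp_mss(1492))             # 1452
print(tcp_mss(1492, ipv6=True))  # 1432
```

This only shows that 1492 and 1452 are the textbook values for PPPoE; it does not explain why setting them manually triggered the loop on this box.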

I'm not sure if this is a bug or a misconfiguration.

Thank you for all your help marjohn56!
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 18, 2018, 05:05:41 pm
Excellent!! :)

Never touched my MTU or MSS, I let the system work it out.

Now let's see if it's the same thing for others too.

Title: Re: PPPoE reconnect loop
Post by: mimugmail on February 18, 2018, 05:12:53 pm
A fixed value could lead to a CONFNAK from the provider's LNS when the client insists on the value
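
A toy model of that situation in Python, for illustration only. It assumes the peer answers a Configure-Request whose MRU is larger than it accepts with a Configure-Nak carrying its preferred value, and that negotiation is abandoned after a fixed number of rounds. The function name, the round limit and the acceptance rule are assumptions; which values a real LNS would actually NAK is not known from this thread.

```python
def negotiate_mru(requested, peer_max, insist, max_rounds=5):
    """Simulate LCP MRU negotiation; return (agreed MRU or None, message log)."""
    log = []
    mru = requested
    for _ in range(max_rounds):
        log.append(("Configure-Request", mru))
        if mru <= peer_max:
            log.append(("Configure-Ack", mru))
            return mru, log
        # Peer refuses the value and suggests its own
        log.append(("Configure-Nak", peer_max))
        if not insist:
            mru = peer_max  # a flexible client adopts the suggestion
        # an insisting client simply repeats the same request
    return None, log
```

With `insist=False` a request for 1500 against a peer limit of 1492 converges on 1492 in the second round; with `insist=True` the same request never converges and the link never comes up.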
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 18, 2018, 05:59:03 pm
>A fixed value could lead to a CONFNAK from the provider's LNS when the client insists on the value

So if on PPPoE best to leave it to work it out itself?
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 06:24:20 pm
>A fixed value could lead to a CONFNAK from the provider's LNS when the client insists on the value

Even if the fixed values are the ones the ISP would require anyways?
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 18, 2018, 06:47:20 pm
You don't want to believe what the ISP says.  :)
Title: Re: PPPoE reconnect loop
Post by: mimugmail on February 18, 2018, 08:18:52 pm
>>A fixed value could lead to a CONFNAK from the provider's LNS when the client insists on the value
>
>Even if the fixed values are the ones the ISP would require anyways?

If I remember correctly I had the same issue with a Cisco router as LNS with an early 15.0 release, and the client was also a Cisco, so this might be unrelated to OPNsense
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 18, 2018, 08:26:22 pm
Well, good to know anyway that custom MTU/MSS with certain PPPoE links are not a match made in heaven :P
Hopefully, the others with the same loops have similar configs and clearing them will fix the issues for their link as well.
Title: Re: PPPoE reconnect loop
Post by: nasq on February 19, 2018, 07:50:25 am
Unfortunately, no.

I never set custom values for MTU/MSS and I regularly face those problems.

Since 17.7 my IPv6 connection broke after 24h reconnect. I worked around it setting an automated reboot. But now, since a clean 18.1 install, I also get those infinite pppoe loops.
And often, out of the blue my pppoe connection stops working.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 19, 2018, 12:20:07 pm
Hmm... sorry to hear this.
Do you happen to remember what custom settings you have set for the WAN?
Maybe you could try reconfiguring PPPoE, this is how I found out about my problem. It was painfully slow, but I managed to find the issue. I set things up step by step and after each step, I tried to reconnect.
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 19, 2018, 10:33:36 pm
>yes, I have some logs. When I am back in my control center :D (beginning of next week) I can post the wireshark logs of both scenarios (reconnect issue and fresh reboot) I have taken so far.

Here are my wireshark logs. I monitored the DSL interface of my DSL modem with PPPoE packet forwarding. In this context that is nearly the same as a DSL modem in bridge mode.

1. Wireshark log after rebooting the machine (pppoe_dial_after_reboot_vcc0.jpg)

After rebooting the machine everything works fine with a stable DSL connection. However, we can see some strange behaviour (black colored packets) in this trace. OPNsense sends multiple interleaved PADI requests even though the ISP has already sent an offer (PADO). This situation should not occur, but could be caused by multiple instances of the PPPoE daemon (mpd5) running at the same time or by some issue in the daemon's FST. But once configuration has finished everything works fine and OPNsense immediately replies to every "LCP Echo Request" (packet no. 30 ff.).
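
To spot this pattern in a capture without reading it packet by packet, a minimal Python sketch that classifies PPPoE discovery frames and counts PADIs sent after a PADO has already arrived. It assumes raw Ethernet frames as bytes and the standard discovery codes (PADI 0x09, PADO 0x07, PADR 0x19, PADS 0x65, PADT 0xA7) on EtherType 0x8863. The function names and the `discovery_frame` helper are made up for illustration.

```python
import struct

ETH_PPPOE_DISCOVERY = 0x8863
DISCOVERY_CODES = {0x09: "PADI", 0x07: "PADO", 0x19: "PADR",
                   0x65: "PADS", 0xA7: "PADT"}


def discovery_frame(code):
    """Build a bare discovery frame (no tags) for testing purposes."""
    eth = b"\xff" * 6 + b"\x00" * 6 + struct.pack("!H", ETH_PPPOE_DISCOVERY)
    return eth + struct.pack("!BBHH", 0x11, code, 0, 0)


def discovery_code(frame):
    """Return 'PADI', 'PADO', ... for a discovery frame, else None."""
    if len(frame) < 20:  # 14 Ethernet + 6 PPPoE header
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_PPPOE_DISCOVERY:
        return None
    return DISCOVERY_CODES.get(frame[15])  # byte 14 is version/type


def count_padi_after_pado(frames):
    """Count PADI frames seen after an offer (PADO) was already received."""
    seen_pado = False
    extra = 0
    for frame in frames:
        name = discovery_code(frame)
        if name == "PADO":
            seen_pado = True
        elif name == "PADI" and seen_pado:
            extra += 1
        elif name in ("PADS", "PADT"):
            seen_pado = False  # session confirmed or torn down, start over
    return extra
```

A clean dial-up (PADI, PADO, PADR, PADS) yields 0; the trace described above would yield a value greater than 0.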


2. Wireshark log after connection dropped (pppoe_dial_after_reconnect_vcc0.jpg)

After the connection drops and the system tries to reconnect to the ISP, the whole configuration process looks the same as in the scenario above, but without the strange black colored packets. The strange behaviour that leads to the reconnection loops is shown in line 27. The ISP again sends "LCP Echo Request" packets, which are not answered by OPNsense. After the third lost packet the ISP assumes the PPPoE client is no longer alive and sends a termination request, which, oddly, OPNsense does answer. After the termination procedure has finished, the system tries to reconnect again (PADI request) and so forth.

Maybe the reason could be a timing issue (caused by a race condition?) or other issues in the FST. But this is only an uncertain assumption.
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 19, 2018, 10:40:01 pm
>Perhaps this helps you as well, until this is fixed:
>https://forum.opnsense.org/index.php?topic=7316.0

Thanks, I will check this out in the next few days. In the other thread you mentioned that the system will reboot if the PPPoE interface goes down and never gets a new IPv4 address. In the reconnection loop of my system I do get a new IPv4 address, but only for almost 30 seconds.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 20, 2018, 07:19:03 am
You're welcome :)
I've updated the script, had some design flaws :P
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 21, 2018, 02:52:00 pm
I did some more investigation into this topic and increased the logging of the mpd daemon to get more information about the ISP's LCP echo probing. I found out that the daemon processes echo request packets and itself claims to send out corresponding reply packets. Unfortunately, the echo reply packets are not seen on the WAN interface :-(
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on February 21, 2018, 03:25:55 pm
Interesting that elektroinside's problems disappeared when he rebuilt his system from scratch.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on February 21, 2018, 04:59:46 pm
>Interesting that elektroinside's problems disappeared when he rebuilt his system from scratch.

It wasn't the rebuild that fixed my loop, it just helped me find its source :)
I started adding major features to the new build and I had no loops while doing that. I lost IPv6, but no loops. Then I imported my previous backup and the loops reappeared. This made me wonder what in that backup triggered the loop. Then I started deleting/disabling/uninstalling stuff until I found that in my case, the custom MTU/MSS was the triggering factor... I don't necessarily think that this is the only trigger, as I had those custom MTU/MSS values while rebuilding the box (if I remember correctly), but I had no loops, not until the import. So I think that the MTU/MSS is just one part of a combination of factors that eventually causes the loop. Eliminating this one factor was enough for me, but might not be for others..
Title: Re: PPPoE reconnect loop
Post by: mimugmail on February 21, 2018, 06:16:55 pm
>I did some more investigation into this topic and increased the logging of the mpd daemon to get more information about the ISP's LCP echo probing. I found out that the daemon processes echo request packets and itself claims to send out corresponding reply packets. Unfortunately, the echo reply packets are not seen on the WAN interface :-(

IPS enabled?
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 21, 2018, 07:58:50 pm

>IPS enabled?

Currently, I am running a nearly plain OPNsense system in testing mode. IPS is not yet installed. Only a few plugins like Dyndns and Arp-scan are used. IDS/IPS and web proxy filtering are tasks for the future, once the system is stable.
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 21, 2018, 08:37:22 pm
People affected by this issue, could you please post your NIC model and driver (incl. its version)? Thanks.
Title: Re: PPPoE reconnect loop
Post by: schnipp on February 28, 2018, 08:47:48 pm
I investigated a little more to figure out the reason for the reconnection loops. What we already know: in some cases LCP echo request packets sent by the ISP do not seem to be answered by OPNsense. After three unanswered request packets my ISP assumes the PPP endpoint is no longer alive and drops the PPPoE connection.

Initially after a reboot everything works fine, but after an interruption of the connection (e.g. the 24h reconnect initiated by the ISP) or at some point in between, echo reply packets are no longer seen on the network interface. I downloaded the source code of mpd5 (the PPPoE daemon) and compiled it with some modifications for debugging (due to missing gdb). The daemon successfully receives echo request packets and immediately sends out an appropriate response (the sendto() function returns successfully without an error code).

So it looks like the daemon itself works fine. But I have to check whether the packets are sent over the correct network link. Furthermore, by snooping a netgraph node, I can see the echo reply packets sent by the daemon. But during the reconnection loop the echo reply packets are delayed and, oddly, appear as a burst of three packets. So the ISP won't receive the responses in time.

Packets are sent out via b0@mpd32168-lso (see netgraph in the attachment), and I tapped the netgraph at mpd32168-wan_link0-lt.

The next steps will be tapping more nodes in the netgraph and studying the log files of the mpd5 daemon.
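
To illustrate why delayed replies are as bad as missing ones, here is a toy Python model of the ISP side. It assumes the ISP sends an echo request every 10 seconds and gives up after a number of intervals in a row without any reply (the thread mentions both three and five as the limit, so it is a parameter). The function name and the exact bookkeeping are assumptions; the real BRAS logic is not known.

```python
def isp_drop_time(reply_delays, interval=10.0, max_missed=3):
    """Return the time at which the ISP drops the session, or None.

    reply_delays[i] is the delay in seconds between echo request i
    (sent at i * interval) and its reply reaching the ISP, or None
    if the reply never arrives.
    """
    arrivals = [i * interval + delay
                for i, delay in enumerate(reply_delays)
                if delay is not None]
    missed = 0
    for k in range(1, len(reply_delays) + 1):
        window_start = (k - 1) * interval
        window_end = k * interval
        # Any reply inside the interval counts as a sign of life
        if any(window_start <= t < window_end for t in arrivals):
            missed = 0
        else:
            missed += 1
            if missed >= max_missed:
                return window_end
    return None
```

Prompt replies keep the session up forever. If the replies to the first three requests all leave together after 31 seconds, the model drops the session at t = 30, which would match a burst of three replies arriving just too late.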
Title: Re: PPPoE reconnect loop
Post by: nallar on March 06, 2018, 02:24:51 pm
I had a reconnect loop issue a while back where the modem interface would go up and down repeatedly.

I think there's a bug in rc.linkup after this commit:

https://github.com/opnsense/core/commit/fdc754e4261d333878549d1f43c980ae23a5f9ed

A static IPv4 address with V6 not configured will call interface_configure. Previously the empty($ip6addr) check would consider that to be a static address so it would not call  interface_configure.

My modem interface has only a V4 static address. Giving it a static V6 address resolved the problem.
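
The change described above can be written down as a small truth table. This is a toy Python model of the description in this post only, not the actual rc.linkup code (which is PHP); the function name and the 'static' / 'dynamic' / 'none' labels are made up, and cases not mentioned in the post are guesses.

```python
def reconfigures_on_linkup(ipv4, ipv6, after_commit):
    """Return True if a link-up event would call interface_configure.

    ipv4 and ipv6 are each one of 'static', 'dynamic' or 'none'.
    """
    if after_commit:
        # After the commit: only static IPv4 plus static IPv6 is skipped
        is_static = ipv4 == "static" and ipv6 == "static"
    else:
        # Before: an empty IPv6 address also counted as static
        is_static = ipv4 == "static" and ipv6 in ("static", "none")
    return not is_static


# Modem interface with static IPv4 and no IPv6
print(reconfigures_on_linkup("static", "none", after_commit=False))   # False
print(reconfigures_on_linkup("static", "none", after_commit=True))    # True
# Workaround: give it a static IPv6 address as well
print(reconfigures_on_linkup("static", "static", after_commit=True))  # False
```

Reconfiguring the interface on link-up presumably takes the link down and up again, which would explain the repeated up/down cycle.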

Title: Re: PPPoE reconnect loop
Post by: franco on March 06, 2018, 03:14:07 pm
Very nice analysis, can you try https://github.com/opnsense/core/commit/267a086dc ?

# opnsense-patch 267a086dc


Thanks,
Franco
Title: Re: PPPoE reconnect loop
Post by: nallar on March 07, 2018, 03:13:35 pm
Applied the patch and changed IPv6 address back to none.

No reconnect loop yet, but it didn't always happen before so can't confirm that it is fixed.
Title: Re: PPPoE reconnect loop
Post by: franco on March 07, 2018, 04:50:27 pm
No rush, if you can let us know in a week or two. :)


Thanks,
Franco
Title: Re: PPPoE reconnect loop
Post by: schnipp on March 11, 2018, 04:14:00 pm
Within this thread users address different issues with PPPoE dial-up. Some have problems with IPv6 address assignment, others with keep-alive signaling to the ISP.

Unfortunately, the update from 18.1.2 to 18.1.4 did not solve my problem (keep-alive signaling to the ISP via PPP LCP echo request/reply). Instead it got worse: when the PPPoE session is re-established, the mpd5 daemon no longer sends any PPP LCP packets to the ISP. So from now on, PPP configuration requests are not answered either.  :'(

My guess is that there could be some bugs in the mpd5 daemon. The source code is not of high quality (not well documented, a lot of immediate values deeply embedded in the source code, combined with a lot of pointer arithmetic) :( So it is hard to debug on a production system without gdb.

Does anybody know what has changed in the daemon's source code in connection with the OPNsense update mentioned above?

Title: Re: PPPoE reconnect loop
Post by: elektroinside on March 11, 2018, 05:08:10 pm
Franco said there will be several PPPoE improvements in the upcoming service releases. I'm confident most issues will get fixed. I have sent Franco some logs regarding these new changes via email, I guess if it's possible you can also jump in. I'm sure he's doing his best to get these done as soon as possible...
Title: Re: PPPoE reconnect loop
Post by: schnipp on March 11, 2018, 06:22:51 pm
>[...]
>I guess if it's possible you can also jump in. I'm sure he's doing his best to get these done as soon as possible...

That sounds good  :). I am still investigating with my self-compiled debug version of mpd5. I have to check whether the ppp netgraph node (see mpd32168-wan in reply #67) successfully forwards packets from hook bypass to link0.

 
Title: Re: PPPoE reconnect loop
Post by: elektroinside on March 12, 2018, 07:07:07 am
Has this been reported on github?
It is getting bigger (the thread) and I think it's time to open a ticket there... it will be easier for the devs to follow the progress. Also, not all of them are reading the forum.
I think it's a better idea than emailing logs. We can all contribute there...
Title: Re: PPPoE reconnect loop
Post by: mimugmail on March 12, 2018, 12:28:31 pm
>Within this thread users address different issues with PPPoE dial-up. Some have problems with IPv6 address assignment, others with keep-alive signaling to the ISP.
>
>Unfortunately, the update from 18.1.2 to 18.1.4 did not solve my problem (keep-alive signaling to the ISP via PPP LCP echo request/reply). Instead it got worse: when the PPPoE session is re-established, the mpd5 daemon no longer sends any PPP LCP packets to the ISP. So from now on, PPP configuration requests are not answered either.  :'(
>
>My guess is that there could be some bugs in the mpd5 daemon. The source code is not of high quality (not well documented, a lot of immediate values deeply embedded in the source code, combined with a lot of pointer arithmetic) :( So it is hard to debug on a production system without gdb.
>
>Does anybody know what has changed in the daemon's source code in connection with the OPNsense update mentioned above?

You can revert to the old version to check whether it really was the change by franco:

opnsense-revert -r 18.1.2 mpd5
Title: Re: PPPoE reconnect loop
Post by: franco on March 12, 2018, 03:17:23 pm
It's not my change, it was FreeBSD and it was documented in 18.1.3...

https://github.com/opnsense/changelog/blob/master/doc/18.1/18.1.3#L49
Title: Re: PPPoE reconnect loop
Post by: mimugmail on March 12, 2018, 03:32:39 pm
Oh, I thought you said last time you recompiled it. Ok, then it's FreeBSD but the revert should still do the trick :)
Title: Re: PPPoE reconnect loop
Post by: schnipp on March 14, 2018, 07:59:06 pm
>Has this been reported on github?
>It is getting bigger (the thread) and I think it's time to open a ticket there... it will be easier for the devs to follow the progress. Also, not all of them are reading the forum.

No, it hasn't been reported yet. The last ticket I opened some time ago did not receive any attention (see issue #1961). That is why I opened the discussion on this topic in the forum. But we can try to move it to the issue tracker.

 
Title: Re: PPPoE reconnect loop
Post by: elektroinside on March 14, 2018, 08:07:14 pm
Please do. I am almost sure it was taken into consideration. At least the devs know PPPoE has issues. If I remember correctly, Franco was actively talking about these on the FreeBSD bugtracker. He also told me fixes/improvements are scheduled  :-)
Title: Re: PPPoE reconnect loop
Post by: schnipp on March 18, 2018, 02:17:41 pm
I have created a new bug report (#2267): https://github.com/opnsense/core/issues/2267
Title: Re: PPPoE reconnect loop
Post by: elektroinside on March 19, 2018, 09:02:54 am
>I have created a new bug report (#2267): https://github.com/opnsense/core/issues/2267

Thanks. Let's see what progress we'll have.
Title: Re: PPPoE reconnect loop
Post by: mimugmail on March 19, 2018, 10:05:22 am
btw .. this weekend I got my new fiber dsl and I had a similar problem with a Cisco Router as client.
The reason was the Cisco was not able to tag VLAN7 on a SVI port (just unsupported), but when firing up debugging I read about ongoing LCP timeouts and thought of you.

@schnipp: Do you use VLANs for PPPoE? I can't imagine this is a general problem, but perhaps after a reconnect mpd can't push the packets into the VLAN
Title: Re: PPPoE reconnect loop
Post by: mimugmail on March 19, 2018, 11:19:08 am
Another idea ... go to cli and edit:

/usr/local/opnsense/service/templates/OPNsense/IPFW/ipfw.conf

Search for rule 150 and add log option:

add 150 deny log layer2 not mac-type ip,ipv6

Then restart configd:

service configd restart

Via UI go to Traffic Shaper and hit "Reset".

Then back to CLI and type:

sysctl net.inet.ip.fw.verbose_limit=5



Perhaps you will see something blocked in system.log or filter.log. Just a guess, since I stumbled upon this rule, which blocks everything and *could* be related to non-IPv4/IPv6 traffic.
Title: Re: PPPoE reconnect loop
Post by: schnipp on March 20, 2018, 08:50:02 am
@schnipp: Do you use VLANs for PPPoE? I can't imagine this is a general problem, but perhaps after a reconnect mpd can't push its traffic into the VLAN.

Yes, my PPPoE connection needs VLAN encapsulation (ID 7). But in my case this is not a task of opnsense as in my setup VLAN encapsulation is done by the DSL modem (Fritzbox 7412). PPPoE forwarding by the DSL modem works flawlessly, only incoming packets of unknown PPPoE sessions will be filtered.

I guess mpd is not aware of vlan tagging since tagging is presumably done in the kernel and seems to be transparent to netgraph nodes connected to the virtual interface.

Title: Re: PPPoE reconnect loop
Post by: schnipp on March 20, 2018, 08:57:55 am
[...]
Perhaps you will see something blocked in system.log or filter.log. Just a guess, since I stumbled upon this rule, which blocks everything and *could* be related to non-IPv4/IPv6 traffic.

I don't know if this makes sense, because I directly tap the netgraph hooks of the mpd daemon.
Does anybody know where exactly firewall filtering takes place in the whole network stack?
Title: Re: PPPoE reconnect loop
Post by: nallar on March 20, 2018, 01:35:30 pm
No rush, if you can let us know in a week or two. :)


Thanks,
Franco
Still seems to be working :)
Title: Re: PPPoE reconnect loop
Post by: Alphakilo on March 28, 2018, 09:51:38 pm
I did some more investigation in this topic and increased the logging of the mpd daemon to get some more information of the ISP's LCP echo probing. I found out that the daemon processes echo request packets and itself claims to send out corresponding reply packets. Unfortunately, the echo reply packets are not seen on the WAN interface :-(

Boom:

Code:
# tcpdump -i re1_vlan7 pppoes and ppp proto 0xc021
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on re1_vlan7, link-type EN10MB (Ethernet), capture size 262144 bytes
21:08:29.423673 PPPoE  [ses 0x6cd1] LCP, Echo-Request (0x09), id 228, length 10
21:08:29.423880 PPPoE  [ses 0x6cd1] LCP, Echo-Reply (0x0a), id 228, length 10
21:08:39.423718 PPPoE  [ses 0x6cd1] LCP, Echo-Request (0x09), id 229, length 10
# Saved settings for WAN interface, triggering a reconnect
21:08:39.621649 PPPoE  [ses 0x6cd1] LCP, Term-Request (0x05), id 3, length 6
21:08:43.378083 PPPoE  [ses 0x8758] LCP, Conf-Request (0x01), id 1, length 20
21:08:43.380642 PPPoE  [ses 0x8758] LCP, Conf-Request (0x01), id 1, length 18
21:08:43.390085 PPPoE  [ses 0x8758] LCP, Conf-Reject (0x04), id 1, length 8
21:08:43.390547 PPPoE  [ses 0x8758] LCP, Conf-Request (0x01), id 2, length 16
21:08:43.400199 PPPoE  [ses 0x8758] LCP, Conf-Ack (0x02), id 2, length 16
21:08:45.400987 PPPoE  [ses 0x8758] LCP, Conf-Request (0x01), id 3, length 16
21:08:45.410645 PPPoE  [ses 0x8758] LCP, Conf-Ack (0x02), id 3, length 16
21:08:46.369960 PPPoE  [ses 0x8758] LCP, Conf-Request (0x01), id 2, length 20
21:08:46.370666 PPPoE  [ses 0x8758] LCP, Conf-Ack (0x02), id 2, length 20
21:08:57.028608 PPPoE  [ses 0x8758] LCP, Echo-Request (0x09), id 1, length 10
21:09:07.028322 PPPoE  [ses 0x8758] LCP, Echo-Request (0x09), id 2, length 10
21:09:17.027982 PPPoE  [ses 0x8758] LCP, Echo-Request (0x09), id 3, length 10
21:09:27.032313 PPPoE  [ses 0x8758] LCP, Term-Request (0x05), id 3, length 6

Take note of the identifier (id 3) of the last two packets.
If the endpoint follows RFC1661 that means the Term-Request is a direct consequence of the (unanswered) Echo-Request.
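
The same check can be scripted. The following is only a minimal sketch and was not used in this thread: `unanswered_echoes` is a hypothetical helper that reads tcpdump text output in the format shown above and lists every LCP Echo-Request for which no Echo-Reply with the same session and id appears.

```shell
#!/bin/sh
# Hypothetical helper (not part of OPNsense or mpd5): reads tcpdump text
# output on stdin and prints each LCP Echo-Request that has no matching
# Echo-Reply (same PPPoE session and same LCP id).
unanswered_echoes() {
    awk '
        /LCP, Echo-(Request|Reply)/ {
            ses = ""; id = ""
            # Pick the session and id out of "[ses 0x6cd1]" and "id 228,".
            for (i = 1; i < NF; i++) {
                if ($i == "[ses") { ses = $(i + 1); sub(/\]$/, "", ses) }
                if ($i == "id")   { id  = $(i + 1); sub(/,$/, "", id) }
            }
            key = ses SUBSEP id
            if ($0 ~ /Echo-Request/) {
                # Remember the order of requests so the report is chronological.
                if (!(key in pending)) { n++; k[n] = key; s[n] = ses; d[n] = id }
                pending[key] = 1
            } else {
                # A reply clears the matching request.
                delete pending[key]
            }
        }
        END {
            for (j = 1; j <= n; j++)
                if (k[j] in pending) {
                    print "unanswered: session " s[j] " id " d[j]
                    delete pending[k[j]]
                }
        }
    '
}
```

Fed with the capture above (saved to a text file and redirected to stdin), it would report id 229 on session 0x6cd1, the request that arrived just before the manual reconnect, and ids 1 to 3 on session 0x8758.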

I think I owe you a beer, schnipp.

I have a lot going on on my installation (18.1.5). IDS, VPNs, DHCPv4/6, unbound, ...
Title: Re: PPPoE reconnect loop
Post by: mimugmail on March 29, 2018, 07:07:23 am
I wonder what happens when you set the LCP keepalive to be sent every second?
Would this force OPN to send Echo Requests instead of just receiving them?

Just an idea ...
Title: Re: PPPoE reconnect loop
Post by: schnipp on March 29, 2018, 09:52:44 am
I wonder what happens when you set the LCP keepalive to be sent every second?
Would this force OPN to send Echo Requests instead of just receiving them?

Just an idea ...

In my configuration it is set by default (set link keep-alive 10 60). I tried to adjust these values in the past without success. I have not seen any keep-alive packets sent out on the wire by the mpd daemon, presumably due to other incoming traffic (see http://mpd.sourceforge.net/doc5/mpd20.html).
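
To confirm which keep-alive values actually ended up in the generated configuration, a small helper can pull them out of the file. This is a sketch under assumptions: `keepalive_settings` is a hypothetical name, the path to mpd_wan.conf has to be supplied by the caller, and the reading of the two numbers (quiet time before an echo is sent, then the time without a reply after which the link is dropped) is my understanding of the mpd documentation linked above.

```shell
#!/bin/sh
# Hypothetical helper: print the keep-alive values found in an mpd config file.
# $1 is the path to the config file (for example the generated mpd_wan.conf).
keepalive_settings() {
    awk '$1 == "set" && $2 == "link" && $3 == "keep-alive" {
        # $4: seconds of quiet time before an echo request is sent (assumed meaning)
        # $5: seconds without a reply before the link is dropped (assumed meaning)
        printf "interval=%ss max=%ss\n", $4, $5
    }' "$1"
}
```

Called as `keepalive_settings /path/to/mpd_wan.conf`, it prints one line per keep-alive statement, so a changed value can be verified after a reconnect.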
Title: Re: PPPoE reconnect loop
Post by: marjohn56 on March 29, 2018, 10:00:55 am
@schnipp

Is this problem apparent on a 'simple' setup, i.e. no IPS, no VLAN, no VPN etc, just a basic get me to the WAN system? I use pppoe and have zero problems. OK, my system is all statics, but I need pppoe for the initial link up.
Title: Re: PPPoE reconnect loop
Post by: mimugmail on March 29, 2018, 10:09:50 am
I wonder what happens when you set the LCP keepalive to be sent every second?
Would this force OPN to send Echo Requests instead of just receiving them?

Just an idea ...

In my configuration it is set by default (set link keep-alive 10 60). I tried to adjust these values in the past without success. I have not seen any keep-alive packets sent out on the wire by the mpd daemon, presumably due to other incoming traffic (see http://mpd.sourceforge.net/doc5/mpd20.html).

I can add a field for this, but only if it works. Can you try setting it to 1 second? I'm not deep into this, but my guess is that mpd starts sending echo requests itself if it receives nothing within the given time. I'm not sure that happens with 5 seconds, but with 1 it should. Perhaps worth a try, but it's hard to debug remotely.
Title: Re: PPPoE reconnect loop
Post by: schnipp on March 29, 2018, 10:26:26 am
Of course, I can try it.
If somebody can tell me where it is, I can alternatively modify the script/global config file which builds the mpd_wan.conf.
Title: Re: PPPoE reconnect loop
Post by: schnipp on April 05, 2018, 06:21:39 pm
Over a longer period of time I noticed that this issue sometimes also occurs directly after rebooting the machine. So, it is not only related to a pppoe reconnect. Maybe it's a timing issue.
Title: Re: PPPoE reconnect loop
Post by: elektroinside on April 05, 2018, 06:26:32 pm
This never happened to me...
Title: Re: PPPoE reconnect loop
Post by: schnipp on April 15, 2018, 04:58:11 pm
I can add a field for this, but only if it works. Can you try setting it to 1 second? I'm not deep into this, but my guess is that mpd starts sending echo requests itself if it receives nothing within the given time. I'm not sure that happens with 5 seconds, but with 1 it should. Perhaps worth a try, but it's hard to debug remotely.

Can you tell me where OPNsense stores the initial PPP configuration data from which the mpd_wan.conf file is built at system bootup?
Title: Re: PPPoE reconnect loop
Post by: franco on April 16, 2018, 03:37:49 pm
The data is inside /conf/config.xml ... the assembly takes place in /usr/local/etc/inc/interfaces.inc function interface_ppps_configure()


Cheers,
Franco
Title: Re: PPPoE reconnect loop
Post by: mimugmail on May 01, 2018, 06:38:33 pm
Today I implemented my own OPNsense at home with pppoe and got to the same error. DSL was online and ran into LCP echo timeout.

I did some further investigation and saw that under Interfaces - Assignment the WAN was bound to igb0_vlan7 which is perfectly fine when setting up a DSL, but there is also pppoe_vlan7 available, and after changing to this it's stable now.

No idea if this would fit your problems too but worth a look at assignments.
Title: Re: PPPoE reconnect loop
Post by: franco on May 02, 2018, 03:54:33 pm
pppoe0_vlan7, this sounds funky and not completely right... should be dead after reboot?
Title: Re: PPPoE reconnect loop
Post by: mimugmail on May 02, 2018, 04:17:02 pm
It's pppoe(igb9_vlan7)
Title: Re: PPPoE reconnect loop [fault isolated]
Post by: schnipp on May 06, 2018, 01:05:54 pm
I succeeded in isolating the fault  :)

The problem is a timing issue. I have described the details in the GitHub bug tracker (https://github.com/opnsense/core/issues/2267#issuecomment-386871115).

As a dirty workaround I moved the ppp-linkup script to execute in the background to prevent it from blocking the mpd5 daemon process. To do so, rename the script "/usr/local/sbin/ppp-linkup" to "/usr/local/sbin/ppp-linkup2" and create a new ppp-linkup script in the same directory with the following content:

Code:
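#! /bin/sh

# Start the renamed original script in the background so that the mpd5
# daemon is not blocked while it runs. The first ten positional arguments
# are passed through unchanged.
nohup /usr/local/sbin/ppp-linkup2 "${1}" "${2}" "${3}" "${4}" "${5}" "${6}" "${7}" "${8}" "${9}" "${10}" &
exit 0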

Warning: The workaround is not well tested and could have side effects due to unknown dependencies on system states and process-related synchronization issues.
Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: franco on May 06, 2018, 07:01:09 pm
Not sure if we should talk here or in the GitHub issue... but only fair to follow up here as well.

Did your workaround work better than the test patch proposed on GitHub, or are they virtually the same? It looks like there is an underlying issue as well.


Cheers,
Franco
Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: schnipp on May 06, 2018, 07:43:02 pm
As not everybody is reading on github we can continue here as well :-)

Currently, my workaround works better than the proposed patch on GitHub. The patch seems to have problems with setting the default route (and/or some services?) in case of a reconnect (without reboot).

But generally, the patch takes the right approach by moving long-running tasks into the background.


Best regards,
  schnipp

Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: franco on May 06, 2018, 07:50:44 pm
Hi schnipp,

Okay, thanks, this is funky indeed. Let me sleep on it.


Cheers,
Franco
Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: mimugmail on July 12, 2018, 08:43:07 am
Today I rebooted my firewall and was offline .. again a reconnect loop. I got an IPv6 address but not IPv4. Rebooted again, nothing changed. I edited some interfaces to force a reconnect, nothing changed. After some time I saw in Assignments that vlan7 was again assigned to WAN and not pppoe.
I changed it and then it worked again ... strange.

At one time I got the following in my ppps.log

http://dpaste.com/1JJY3JD
Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: schnipp on July 12, 2018, 05:50:18 pm
Today I rebooted my firewall and was offline .. again a reconnect loop. I got an IPv6 address but not IPv4. Rebooted again, nothing changed.
[...]
At one time I got the following in my ppps.log

It is also a reconnect loop, but a different issue.
It looks like an invalid configuration, however it happened.
Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: schnipp on August 16, 2018, 06:59:50 pm
Unfortunately, the planned bugfix moved to milestone 19.1. So, every update overwrites my workaround (modifications to /usr/local/sbin/ppp-linkup script). Is there a neat way, to prevent this?
Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: franco on August 17, 2018, 09:12:03 am
I was under the impression https://github.com/opnsense/core/issues/2267#issuecomment-387167501 said that we do not have a workable solution just yet.

I don't think it's a good idea to push a fix into a release that we haven't fully understood yet. I'll be back in September to look into it. In any case, please keep prodding. Most work we do is prioritised and ordered by the amount of help and discussion from reporters. If your updates are missed, please prod again.


Cheers,
Franco
Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: schnipp on August 20, 2018, 09:54:34 pm
Franco, you're totally right. Please don't misunderstand me.

I was only looking for a hint on how to keep the WAN connection running in case I forget to re-apply the workaround after an update. I'll write a short script which applies the workaround after rebooting the machine.
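
Such a script was not posted in the thread. The following is a minimal sketch of how it might look, with several assumptions: the function name `apply_linkup_workaround` and its optional directory argument are hypothetical (the argument only exists so the function can be tried in a scratch directory), the wrapper forwards its arguments with "$@" instead of the ten explicit parameters used in the original workaround, and how the script gets run at boot is left open.

```shell
#!/bin/sh
# Sketch: re-apply the background wrapper if an update restored the stock
# ppp-linkup script. Safe to run repeatedly.
# $1 (optional, hypothetical): directory holding ppp-linkup.
apply_linkup_workaround() {
    dir="${1:-/usr/local/sbin}"
    script="$dir/ppp-linkup"
    real="$dir/ppp-linkup2"

    [ -f "$script" ] || { echo "missing $script" >&2; return 1; }

    # Only the wrapper mentions ppp-linkup2, so this detects whether the
    # workaround is already in place.
    if grep -q 'ppp-linkup2' "$script"; then
        echo "already applied"
        return 0
    fi

    # Keep the stock script under the new name, then install the wrapper.
    mv "$script" "$real" || return 1
    cat > "$script" <<EOF
#!/bin/sh
# Run the stock script in the background so mpd5 is not blocked.
nohup $real "\$@" &
exit 0
EOF
    chmod 755 "$script"
    echo "applied"
}
```

Because the check is idempotent, the function can run on every boot: it does nothing while the wrapper is present and reinstalls it once an update has replaced the file. The warning about untested side effects given with the original workaround applies here as well.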
Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: schnipp on January 29, 2019, 06:16:25 pm
OPNsense 19.1 will be released soon. Thank you to all who made this happen. I was really happy that the problem with PPPoE reconnect loops was going to be solved in this version. Unfortunately, the bugfix has been postponed again.

Without a bugfix OPNsense stops working after every update, and it is difficult to patch the system by hand because every reconnect attempt triggers the state reset feature to clean up the NAT tables. This was introduced to keep SIP communication working after an ISP-initiated IP address change.

Is there a chance to get the problem solved before release 19.7?

[Edit]
The topic is still relevant for upcoming release Opnsense 19.1, so further discussion will move to this forum (see here (https://forum.opnsense.org/index.php?topic=11373.0)). Please answer there.
Title: Re: [isolated: see #91] PPPoE reconnect loop
Post by: franco on January 30, 2019, 06:19:09 pm
Last ticket update on https://github.com/opnsense/core/issues/2267 on 26 Jun 2018. As stated elsewhere, please complain early *and* often.

My work queue that is based on 100% self-funding after my 40 hour day job entirely away from OPNsense is maxed out either way. I can only prioritise according to user feedback and progress.


Cheers,
Franco