[isolated: see #91] PPPoE reconnect loop

Started by schnipp, February 11, 2018, 02:46:04 PM

Applied the patch and changed IPv6 address back to none.

No reconnect loop yet, but it didn't always happen before, so I can't confirm yet that it is fixed.

No rush. If you can, let us know in a week or two. :)


Thanks,
Franco

In this thread, users are discussing different issues with PPPoE dial-up. Some have problems with IPv6 address assignment, others with keep-alive signaling to the ISP.

Unfortunately, the update from 18.1.2 to 18.1.4 did not solve my problem (keep-alive signaling to the ISP using PPP LCP echo request/reply). Instead, the problem got worse: when the PPPoE session is re-established, the mpd5 daemon no longer sends any PPP LCP packets to the ISP. As a result, PPP configuration requests are no longer answered either.  :'(

My guess is that there are bugs in the mpd5 daemon. The source code is not of high quality (poorly documented, many hard-coded immediate values buried deep in the code, combined with a lot of pointer arithmetic) :( So it is hard to debug on a production system without gdb.

Does anybody know what has changed in the daemon's source code in connection with the OPNsense update mentioned above?

OPNsense 24.7.11_2-amd64

Franco said there will be several PPPoE improvements in the upcoming service releases. I'm confident most issues will get fixed. I have sent Franco some logs regarding these new changes via email; if possible, maybe you can jump in as well. I'm sure he's doing his best to get these done as soon as possible...
OPNsense v18 | HW: Gigabyte Z370N-WIFI, i3-8100, 8GB RAM, 60GB SSD, | Controllers: 82575GB-quad, 82574, I221, I219-V | PPPoE: RDS Romania | Down: 980Mbit/s | Up: 500Mbit/s

Team Rebellion Member

Quote from: elektroinside on March 11, 2018, 05:08:10 PM
[...]
I guess if it's possible you can also jump in. I'm sure he's doing his best to get these done as soon as possible...

That sounds good  :). I am still investigating with my self-compiled debug version of mpd5. I have to check whether the ppp netgraph node (see mpd32168-wan in reply #67) successfully forwards packets from hook bypass to link0.

March 12, 2018, 07:07:07 AM #65 Last Edit: March 12, 2018, 07:16:34 AM by elektroinside
Has this been reported on GitHub?
The thread is getting bigger and I think it's time to open a ticket there... it will be easier for the devs to follow the progress. Also, not all of them read the forum.
I think it's a better idea than emailing logs. We can all contribute there...

Quote from: schnipp on March 11, 2018, 04:14:00 PM
In this thread, users are discussing different issues with PPPoE dial-up. Some have problems with IPv6 address assignment, others with keep-alive signaling to the ISP.

Unfortunately, the update from 18.1.2 to 18.1.4 did not solve my problem (keep-alive signaling to the ISP using PPP LCP echo request/reply). Instead, the problem got worse: when the PPPoE session is re-established, the mpd5 daemon no longer sends any PPP LCP packets to the ISP. As a result, PPP configuration requests are no longer answered either.  :'(

My guess is that there are bugs in the mpd5 daemon. The source code is not of high quality (poorly documented, many hard-coded immediate values buried deep in the code, combined with a lot of pointer arithmetic) :( So it is hard to debug on a production system without gdb.

Does anybody know what has changed in the daemon's source code in connection with the OPNsense update mentioned above?

You can revert to the old version to check whether it really was Franco's change:

opnsense-revert -r 18.1.2 mpd5


Oh, I thought you said last time that you had recompiled it. OK, then it's FreeBSD, but the revert should still do the trick :)

March 14, 2018, 07:59:06 PM #69 Last Edit: March 14, 2018, 08:00:42 PM by schnipp
Quote from: elektroinside on March 12, 2018, 07:07:07 AM
Has this been reported on GitHub?
The thread is getting bigger and I think it's time to open a ticket there... it will be easier for the devs to follow the progress. Also, not all of them read the forum.

No, it hasn't been reported yet. The last ticket I opened some time ago did not get any attention (see issue #1961). That is why I started the discussion on this topic in the forum. But we can try to move it to the issue tracker.

Please do. I am almost sure it was taken into consideration. At least the devs know PPPoE has issues. If I remember correctly, Franco was actively discussing these on the FreeBSD bug tracker. He also told me fixes/improvements are scheduled  :-)

I have created a new bug report (#2267) https://github.com/opnsense/core/issues/2267

Quote from: schnipp on March 18, 2018, 02:17:41 PM
I have created a new bug report (#2267) https://github.com/opnsense/core/issues/2267

Thanks. Let's see what progress we'll have.

By the way, this weekend I got my new fiber DSL line and had a similar problem with a Cisco router as the client.
The cause was that the Cisco was not able to tag VLAN 7 on an SVI port (simply unsupported), but when I turned on debugging I saw ongoing LCP timeouts and thought of you.

@schnipp: Do you use VLANs for PPPoE? I can't imagine this is a general problem, but perhaps after a reconnect mpd can't push traffic into the VLAN.

Another idea: go to the CLI and edit:

/usr/local/opnsense/service/templates/OPNsense/IPFW/ipfw.conf

Search for rule 150 and add the log option:

add 150 deny log layer2 not mac-type ip,ipv6

Then restart configd:

service configd restart

Via the UI, go to Traffic Shaper and hit "Reset".

Then go back to the CLI and type:

sysctl net.inet.ip.fw.verbose_limit=5



Perhaps you will see something blocked in system.log or filter.log. This is just a guess: I stumbled upon this rule, which blocks all layer-2 traffic that is not IPv4/IPv6 and *could* be related.
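
Once logging is enabled, the log output can be filtered for hits on that rule number. This is only a rough sketch: it assumes that logged ipfw hits contain the text "ipfw: <rule number> ". The helper name ipfw_rule_hits is made up for illustration, and the log path in the usage comment is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print only the log lines that report a hit on the
# given ipfw rule number. Reads log text from stdin.
# Assumption: a logged hit contains "ipfw: <rule> " (for example
# "ipfw: 150 Deny ..."). The trailing space keeps rule 150 from also
# matching rule 1500.
ipfw_rule_hits() {
    rule="$1"
    grep -E "ipfw: ${rule} "
}

# Usage (the log path is an assumption, adjust it for your system):
#   ipfw_rule_hits 150 < /var/log/system.log
```

Because the helper reads from stdin, it can be fed by whatever tool prints the log on your system. If it prints nothing while the PPPoE reconnect is failing, rule 150 is probably not the culprit.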