OPNsense Forum

Archive => 16.1 Legacy Series => Topic started by: Alphabet Soup on April 21, 2016, 03:59:55 pm

Title: PPPoE down and cannot reconnect issue
Post by: Alphabet Soup on April 21, 2016, 03:59:55 pm
My OPNsense 16.1 router has exhibited two problems in recent weeks, neither of which appeared during several months of use with 15.7.  Both seem related to the fact that my internet links are via PPPoE.

The connection down problem is pretty severe and my only solution so far is to reboot:

One of my PPPoE connections will go down.  OPNsense will try to reconnect but time out after 9 seconds and retry, over and over, thousands of times, probably forever if I don't intervene.  My first thought was that something upstream had failed, but this has happened several times now and I've had some opportunities to fiddle around during breakage.

If I move the "connection down" cable from the OPNsense to a laptop and configure the same PPPoE connection, it comes right up.  I can also move that cable to the router I used prior to deploying OPNsense, and it will also bring the same PPPoE connection up.  But even after moving the cable back to the OPNsense at this point, it will only retry retry retry.  Rebooting the OPNsense clears its head, however, and it can then successfully bring up the PPPoE connection as if nothing was ever wrong.

The symptoms sound the same as https://forum.opnsense.org/index.php?topic=2337.0 but that thread seems to show that the problem was resolved with 16.1.5.  I started the 16.1 series from 16.1.7.  There's some mention of RFC 4638 support having been implemented, but my PPPoE connection MTU config in OPNsense is the ISP-recommended 1454, and should have nothing whatsoever to do with any RFC 4638 code.  No VLANs are used; the OPNsense is cabled directly to the ISP equipment.

Any help appreciated.  What other info could I provide that would help you help me?
Title: Re: PPPoE down and cannot reconnect issue
Post by: franco on April 21, 2016, 10:43:09 pm
Something is not fully reloading so it remains stuck. Do you have more logs from the System and PPP devices? We'll need this in order to pinpoint the issue.
Title: Re: PPPoE down and cannot reconnect issue
Post by: Alphabet Soup on April 22, 2016, 03:44:48 pm
It happened again, so I copied off everything from /var/log before rebooting.  Looking over the files, I really don't see anything in the time leading up to the disconnect except the ppps.log.  I've only scrubbed out the IP address from the "Delete route" logline.

Apr 22 22:00:10 OPNsense ppp: [opt2_link0] LCP: no reply to 1 echo request(s)
Apr 22 22:00:20 OPNsense ppp: [opt2_link0] LCP: no reply to 2 echo request(s)
Apr 22 22:00:30 OPNsense ppp: [opt2_link0] LCP: no reply to 3 echo request(s)
Apr 22 22:00:40 OPNsense ppp: [opt2_link0] LCP: no reply to 4 echo request(s)
Apr 22 22:00:50 OPNsense ppp: [opt2_link0] LCP: no reply to 5 echo request(s)
Apr 22 22:00:50 OPNsense ppp: [opt2_link0] LCP: peer not responding to echo requests
Apr 22 22:00:50 OPNsense ppp: [opt2_link0] LCP: state change Opened --> Stopping
Apr 22 22:00:50 OPNsense ppp: [opt2_link0] Link: Leave bundle "opt2"
Apr 22 22:00:50 OPNsense ppp: [opt2] Bundle: Status update: up 0 links, total bandwidth 9600 bps
Apr 22 22:00:50 OPNsense ppp: [opt2] IPCP: Close event
Apr 22 22:00:50 OPNsense ppp: [opt2] IPCP: state change Opened --> Closing
Apr 22 22:00:50 OPNsense ppp: [opt2] IPCP: SendTerminateReq #4
Apr 22 22:00:50 OPNsense ppp: [opt2] IPCP: LayerDown
Apr 22 22:00:50 OPNsense ppp: [opt2] IFACE: Delete route 0.0.0.0/0 XXX.XXX.XXX.XXX failed: No such process
Apr 22 22:00:50 OPNsense ppp: [opt2] IFACE: Down event
Apr 22 22:00:50 OPNsense ppp: [opt2] IFACE: Rename interface pppoe1 to pppoe1
Apr 22 22:00:50 OPNsense ppp: [opt2] IPV6CP: Close event
Apr 22 22:00:50 OPNsense ppp: [opt2] IPV6CP: state change Stopped --> Closed
Apr 22 22:00:50 OPNsense ppp: [opt2] IPCP: Down event
Apr 22 22:00:50 OPNsense ppp: [opt2] IPCP: LayerFinish
Apr 22 22:00:50 OPNsense ppp: [opt2] Bundle: No NCPs left. Closing links...
Apr 22 22:00:50 OPNsense ppp: [opt2] IPCP: state change Closing --> Initial
Apr 22 22:00:50 OPNsense ppp: [opt2] IPV6CP: Down event
Apr 22 22:00:50 OPNsense ppp: [opt2] IPV6CP: state change Closed --> Initial
Apr 22 22:00:50 OPNsense ppp: [opt2_link0] LCP: SendTerminateReq #2
Apr 22 22:00:50 OPNsense ppp: [opt2_link0] LCP: LayerDown
Apr 22 22:00:52 OPNsense ppp: [opt2_link0] LCP: SendTerminateReq #3
Apr 22 22:00:54 OPNsense ppp: [opt2_link0] LCP: state change Stopping --> Stopped
Apr 22 22:00:54 OPNsense ppp: [opt2_link0] LCP: LayerFinish
Apr 22 22:00:54 OPNsense ppp: [opt2_link0] PPPoE: connection closed
Apr 22 22:00:54 OPNsense ppp: [opt2_link0] Link: DOWN event
Apr 22 22:00:54 OPNsense ppp: [opt2_link0] LCP: Down event
Apr 22 22:00:54 OPNsense ppp: [opt2_link0] LCP: state change Stopped --> Starting
Apr 22 22:00:54 OPNsense ppp: [opt2_link0] LCP: LayerStart
Apr 22 22:00:54 OPNsense ppp: [opt2_link0] Link: reconnection attempt 1 in 1 seconds
Apr 22 22:00:55 OPNsense ppp: [opt2_link0] Link: reconnection attempt 1
Apr 22 22:00:55 OPNsense ppp: [opt2_link0] PPPoE: Connecting to '1'
Apr 22 22:01:04 OPNsense ppp: [opt2_link0] PPPoE connection timeout after 9 seconds
Apr 22 22:01:04 OPNsense ppp: [opt2_link0] Link: DOWN event
Apr 22 22:01:04 OPNsense ppp: [opt2_link0] LCP: Down event
Apr 22 22:01:04 OPNsense ppp: [opt2_link0] Link: reconnection attempt 2 in 4 seconds
Apr 22 22:01:08 OPNsense ppp: [opt2_link0] Link: reconnection attempt 2
Apr 22 22:01:08 OPNsense ppp: [opt2_link0] PPPoE: Connecting to '1'
Apr 22 22:01:17 OPNsense ppp: [opt2_link0] PPPoE connection timeout after 9 seconds
Apr 22 22:01:17 OPNsense ppp: [opt2_link0] Link: DOWN event
Apr 22 22:01:17 OPNsense ppp: [opt2_link0] LCP: Down event
Apr 22 22:01:17 OPNsense ppp: [opt2_link0] Link: reconnection attempt 3 in 3 seconds


...repeat repeat repeat.  As before, rebooting OPNsense brought the connection up again.
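
For anyone who wants to quantify the retry loop from a copied ppps.log, here is a minimal sketch in Python.  It is not part of OPNsense; the function name summarize and the regular expressions are assumptions based only on the wording of the log lines quoted above, and other mpd versions may word these messages differently.

```python
import re
import sys
from collections import Counter

# Patterns are modelled on the log excerpt in this thread (an assumption,
# not an official format description).
TIMEOUT_RE = re.compile(
    r"\[(?P<link>[^\]]+)\] PPPoE connection timeout after (?P<secs>\d+) seconds"
)
# Matches "reconnection attempt N" only when the attempt actually starts,
# not the earlier "reconnection attempt N in M seconds" announcement.
ATTEMPT_RE = re.compile(
    r"\[(?P<link>[^\]]+)\] Link: reconnection attempt (?P<n>\d+)$"
)


def summarize(lines):
    """Return (timeouts per link, highest started attempt number per link)."""
    timeouts = Counter()
    last_attempt = {}
    for raw in lines:
        line = raw.rstrip()
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group("link")] += 1
            continue
        m = ATTEMPT_RE.search(line)
        if m:
            link = m.group("link")
            last_attempt[link] = max(last_attempt.get(link, 0), int(m.group("n")))
    return timeouts, last_attempt


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ppp_summary.py /path/to/copied/ppps.log
    with open(sys.argv[1], errors="replace") as fh:
        timeouts, last_attempt = summarize(fh)
    for link in sorted(set(timeouts) | set(last_attempt)):
        print(link, "timeouts:", timeouts[link],
              "highest attempt:", last_attempt.get(link, 0))
```

Run against a saved copy of the log, this shows per link how many connection timeouts were logged and how far the attempt counter climbed, which may be easier to attach to a bug report than thousands of repeated lines.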
Title: Re: PPPoE down and cannot reconnect issue
Post by: bartjsmit on April 22, 2016, 03:57:29 pm
I've been able to revive a stuck PPPoE by disabling and enabling the WAN interface. Not a fix - but quicker than a reboot.

Bart...
Title: Re: PPPoE down and cannot reconnect issue
Post by: Alphabet Soup on April 22, 2016, 04:18:26 pm
I can't remember if I've done that or not.  I know Disconnect / Connect the PPPoE connection didn't work.  I'll give your interface Disable / Enable tip a try next time.  Thanks.
Title: Re: PPPoE down and cannot reconnect issue
Post by: Alphabet Soup on May 03, 2016, 08:22:19 am
A follow-up on this issue.  I reverted to another OPNsense config where the WAN links are just static addresses instead of dynamic PPPoE connections.  The PPPoE (and NAT) is then handled by border routers cabled directly to the OPNsense.  This worked well for months with 15.7, so I hoped it would clear up the issue.

Unfortunately one of the OPNsense WAN links went down again.  The border router attached to the link was fine, still PPPoE connected and functioning properly.  But the OPNsense could not ping or connect to the router.  I tried swapping that WAN cable to another border router, still OPNsense couldn't ping/connect to it.

system.log had a few hundred lines over a few seconds of:
May  3 09:12:23 OPNsense kernel: arpresolve: can't allocate llinfo for XXX.XXX.XXX.XXX on em2
The XXX's being the static IP of that particular OPNsense WAN interface.

I tried bartjsmit's suggestion to Disable / Enable the interface.  It did not help, but the logs show OPNsense running through another few hundred 'arpresolve' system.log lines after I Enabled the interface.  I also tried manually running 'ifconfig em2 down' and 'ifconfig em2 up', which didn't help either.
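
To compare the 'arpresolve' bursts before and after re-enabling the interface, the lines can be tallied per second and per interface.  The following is a rough Python sketch, not an OPNsense tool; the function name arp_bursts and the regular expression are assumptions derived solely from the single system.log line quoted above.

```python
import re
from collections import Counter

# Modelled on the one system.log line quoted in this post (an assumption).
ARP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"arpresolve: can't allocate llinfo for (?P<ip>\S+) on (?P<ifname>\S+)$"
)


def arp_bursts(lines):
    """Count arpresolve failures per (timestamp, interface) pair."""
    bursts = Counter()
    for raw in lines:
        m = ARP_RE.match(raw.rstrip())
        if m:
            # Collapse the padded day field ("May  3") to single spaces.
            ts = " ".join(m.group("ts").split())
            bursts[(ts, m.group("ifname"))] += 1
    return bursts
```

Passing an open system.log file object to arp_bursts and printing the sorted items would show when each burst happened and on which interface, so the timing can be lined up against the moments the interface was disabled and enabled.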

So, I don't think this issue rests on PPPoE.  This box has no PPPoE config anywhere in it.  But the general symptom of an interface suddenly going deaf/mute, where no prodding will get it going again until a reboot, that seems similar.

I have two OPNsense boxes and have experienced issues on both now with 16.1 that I never had with 15.7.  I can't be sure it's not a hardware fault, but is there an easy way to downgrade back to 15.7 to see if the problems follow me?  Or, preferably, any better troubleshooting suggestions?
Title: Re: PPPoE down and cannot reconnect issue
Post by: franco on June 21, 2016, 08:38:13 pm
There's a ticket here; it needs help and a little more info from the original reporter... I know this is annoying, but as long as we can't get to the bottom of it we can't fix it... I'm not asking for version info and help just for fun. :)

https://github.com/opnsense/core/issues/850
Title: Re: PPPoE down and cannot reconnect issue
Post by: Alphabet Soup on July 04, 2016, 04:35:36 am
The issue has not recurred with the static address version of my config since my update of May 3, so maybe something else was the problem that day.

I'm now able to put the dynamic PPPoE version of my config back into production for a while to see if the issue still persists for that setup.  The box is running the latest 16.1.18.  If it does happen again, I'll report here.

Thanks for checking back on this!