OPNsense Forum

Archive => 18.1 Legacy Series => Topic started by: Alphakilo on March 28, 2018, 06:44:22 pm

Title: PPPoE pain
Post by: Alphakilo on March 28, 2018, 06:44:22 pm
Hello folks.

I very recently switched from Sophos UTM to OPNsense (18.1.5). So far the journey has been very painful. It makes me question whether I'm incompetent or something went horribly wrong with my deployment.

I have a slew of issues, pretty much everywhere. The thing that bothers me most currently is the stability of my one and only egress to WAN.

Let me describe my setup.
My ISP (1&1 / Versanet) provides me VDSL via PPPoE dial-in. They only accept PPPoE traffic which is tagged as VLAN 7.
So I created an interface for that (re1_vlan7) and used it for the PPP configuration (pppoe0). That resulted in the creation of the WAN interface.

re1 is connected to a Zyxel VDSL modem.

The dial-in works. Most of the time. Now to my issues:

Unable to save WAN interface

Every time I save the WAN interface, connectivity drops until I reboot OPNsense. I figured out two different causes:

a) My PPPoE password got URL-encoded every time I saved either the interface or the PPP config. I solved that by setting a new password at my ISP.
b) NAT sporadically continues to use the old interface address as the source address for new (!) connections. In fact, it continues to use all addresses that were ever assigned to the interface during uptime, in what seems to be a round-robin fashion.

24h disconnects

My ISP forces a re-dial-in every 24h, assigning new addresses (IPv4 & v6). I used a cron job (via WebUI / System / Settings / Cron: 0 5 * * * Periodic interface reset -> pppoe0) to do that at a time when it's least bothersome. That results in a broken PPPoE dial-in until I reboot OPNsense.

Broken in the sense that the dial-in works, but gets terminated by LCP 2-4 seconds after obtaining an IP: "LCP: rec'd Terminate Request #3" (Configure-NAK?).
After the reboot the dial-in works instantly and is not terminated in that fashion.
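
To see how often the peer tears the session down after a reset, it may help to count the terminate requests in the PPP log. This is only a minimal sketch: it assumes the log lines contain the same "LCP: rec'd Terminate Request" text quoted above, the helper name is made up for illustration, and how you print the PPP log on your install is left open.

```shell
# count_lcp_terminates: read PPP log lines on stdin and print how many of
# them report an LCP Terminate Request received from the peer.
# Hypothetical helper; the match string is taken from the log line quoted above.
count_lcp_terminates() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c "LCP: rec'd Terminate Request" || true
}

# Example use (commented out so that sourcing this has no side effects):
#   <command that prints your PPP log> | count_lcp_terminates
```

Comparing the count before and after the scheduled reset would show whether the terminations only start once the interface has been bounced.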

And when I fix this issue, I'm pretty sure the NAT issue from above will come back to haunt me.

Vanishing default gateways

The ISP-assigned default gateway drops out of the routing table from time to time. I haven't been able to figure out the root cause, because I don't even know where to look. The issue gets more frequent when I enable gateway monitoring, though.
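
One place to start looking is the routing table itself. The following is a rough sketch, assuming the usual FreeBSD `netstat -rn` layout in which the default route's line begins with the word `default`; the helper name is hypothetical.

```shell
# has_default_route: read routing-table text (for example the output of
# `netstat -rn -f inet`) on stdin and succeed only if some line's first
# field is the word "default".
has_default_route() {
    awk '$1 == "default" { found = 1 } END { exit !found }'
}

# Example use on the firewall (commented out so that sourcing this has no
# side effects): print a timestamp whenever the IPv4 default route is gone.
#   while sleep 10; do
#       netstat -rn -f inet | has_default_route || echo "$(date): no default route"
#   done
```

Lining those timestamps up with the system and PPP logs may narrow down what removes the route.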

DHCPv6

There are multiple DHCPv6 clients for the pppoe0 interface, resulting in "dhcp6c: XID mismatch".
I can't figure out how or why there are multiple instances for the same interface using the same configuration (and, funnily enough, the same PID file):

Code:
root@{{hostname}}:~ # ps x | grep "dhcp6c"
74604  -  Ss     0:00.05 /usr/local/sbin/dhcp6c -D -c /var/etc/dhcp6c_wan.conf -p /var/run/dhcp6c_pppoe0.pid pppoe0
79106  -  Ss     0:00.04 /usr/local/sbin/dhcp6c -D -c /var/etc/dhcp6c_wan.conf -p /var/run/dhcp6c_pppoe0.pid pppoe0

I'm at a loss as to what to do next.
It just feels so random. Every time I fix something, more problems come up.
Title: Re: PPPoE pain
Post by: elektroinside on March 28, 2018, 07:35:14 pm
Actually, your issues are very similar to this:
https://forum.opnsense.org/index.php?topic=7270.0

You might need a few minutes to read the entire conversation...

But there's a complete rework going on here and there, so this will get fixed soon.
Title: Re: PPPoE pain
Post by: schnipp on March 28, 2018, 07:36:16 pm
I think your observations with PPP reconnects (manual or 24h, triggered by your ISP) could be related to an already discussed issue. See below…


Title: Re: PPPoE pain
Post by: Alphakilo on March 28, 2018, 08:49:21 pm
Thanks for the replies!
I set the MTU for the WAN interface, guilty as charged ;D I'll revert that to default and see if the problem persists.
Title: Re: PPPoE pain
Post by: elektroinside on March 28, 2018, 09:25:49 pm
It worked out for me (deleting the custom value completely, leaving the field empty); I'm really curious if it does for you as well. Please report back, PPPoE has been a headache for some of us :)
Title: Re: PPPoE pain
Post by: Alphakilo on March 28, 2018, 10:33:06 pm
Quote
It worked out for me (deleting the custom value completely, leaving the field empty); I'm really curious if it does for you as well. Please report back

Didn't work for me. The fields are now empty. I think my problem is pretty much the same one that schnipp is experiencing.

Still, it doesn't account for my other troubles.
Title: Re: PPPoE pain
Post by: marjohn56 on March 28, 2018, 11:17:37 pm

Quote
Code:
root@{{hostname}}:~ # ps x | grep "dhcp6c"
74604  -  Ss     0:00.05 /usr/local/sbin/dhcp6c -D -c /var/etc/dhcp6c_wan.conf -p /var/run/dhcp6c_pppoe0.pid pppoe0
79106  -  Ss     0:00.04 /usr/local/sbin/dhcp6c -D -c /var/etc/dhcp6c_wan.conf -p /var/run/dhcp6c_pppoe0.pid pppoe0

I'm at a loss as to what to do next.
It just feels so random. Every time I fix something, more problems come up.

This is caused by bouncing WAN connections or by bouncing the interface itself. I have a rather large PR on GitHub that fixes it, but that is being broken into smaller chunks to be tested bit by bit before it makes it into a release. Sorry about that, but them's the rules, as they say. The PID file will have the same name, as that's the name created in the start command; the value of the PID is what counts, and that's in the file itself. When a new instance starts, it overwrites the PID file. I'll try to get the dhcp6c handling part of the PR pushed through over the next few days, and that should see an end of this issue.

If you see two (or more) instances of dhcp6c, use kill -9 pid_value on all of the instances and restart the WAN interface.
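
The cleanup described above could be sketched roughly as follows. It assumes `ps x` output in the column layout shown earlier in the thread (PID first, command path in the fifth column, interface name as the last argument); the helper name is hypothetical, and the kill itself is left commented out.

```shell
# dhcp6c_pids IFACE: read `ps x`-style lines on stdin and print the PID of
# every dhcp6c process whose last argument is IFACE.
# Hypothetical helper; assumes the ps column layout shown in this thread.
dhcp6c_pids() {
    awk -v ifc="$1" '$5 ~ /\/dhcp6c$/ && $NF == ifc { print $1 }'
}

# Example use on the firewall (commented out so that sourcing this has no
# side effects):
#   ps x | dhcp6c_pids pppoe0                    # list the instances
#   ps x | dhcp6c_pids pppoe0 | xargs kill -9    # kill all of them
# Afterwards restart the WAN interface as you normally would.
```

Matching on the command path in the fifth column keeps the `grep dhcp6c` process itself out of the list.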
Title: Re: PPPoE pain
Post by: Alphakilo on March 29, 2018, 01:44:42 pm
Quote
sorry about that but them's the rules as they say.

As a German, you had me at rules  ;D

Quote
The pid will be the same name, as that's the name created in the start command, the value of the pid is what counts and that's in the file itself.

It wouldn't make sense to have a PID file otherwise. I was trying to point out that two daemon processes that are supposed to be singletons are running with the same PID file, which is not supposed to happen. Unless...
Code:
# file /var/run/dhcp6c_pppoe0.pid
/var/run/dhcp6c_pppoe0.pid: cannot open `/var/run/dhcp6c_pppoe0.pid' (No such file or directory)

Well there's your problem!
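
A check like the one above can be wrapped in a small helper that also tells a stale PID file apart from a live one. This is only a sketch: the function name is hypothetical, and `kill -0` merely tests whether a signal could be delivered, so run it as root to avoid false "stale" results caused by permissions.

```shell
# pidfile_status FILE: print one of
#   missing  - FILE does not exist
#   invalid  - FILE does not contain a plain number
#   running  - the PID in FILE belongs to a live process
#   stale    - the PID in FILE cannot be signalled (no such process)
# Hypothetical helper for illustration.
pidfile_status() {
    f=$1
    [ -f "$f" ] || { echo missing; return; }
    pid=$(cat "$f")
    case $pid in
        ''|*[!0-9]*) echo invalid; return ;;
    esac
    if kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        echo stale
    fi
}

# Example use on the firewall (commented out so that sourcing this has no
# side effects):
#   pidfile_status /var/run/dhcp6c_pppoe0.pid
```

With two dhcp6c processes running, "missing" means neither is being tracked, while "running" means at most one of them is.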

Quote
When a new instance starts, it overwrites the PID file. I'll try and get the dhcp6c handling part of the PR pushed through over the next few days and that should see an end of this issue.

So in what case is no pid file written at all?

Quote
If you see two (or more) instances of dhcp6c use a kill -9 pid_value on all of the instances and restart the WAN interface.

I tried that. That spawns two processes.
Title: Re: PPPoE pain
Post by: marjohn56 on March 29, 2018, 03:21:54 pm
There should only ever be ONE dhcp6c process.

However, Franco pulled a couple of PRs this morning that should appear in 18.1_6; hopefully that will be the end of the problem.
Title: Re: PPPoE pain
Post by: Alphakilo on March 29, 2018, 04:07:46 pm
Quote
However, Franco pulled a couple of PRs this morning that should appear in 18.1_6, hopefully that should be the end of the problem.

I'm willing to test the patch, if that's possible.
Title: Re: PPPoE pain
Post by: marjohn56 on March 29, 2018, 04:24:21 pm
There are a lot of changes around that area in 18.1_6 plus an update to dhcp6c.

I can send you a modified 18.1_5 interfaces.inc file and a dhcp6c executable, but you'd need to manually drop them into place at the moment. If you want to do that, then PM me and I'll sort something out for you.
Title: Re: PPPoE pain
Post by: pylox on March 29, 2018, 05:27:30 pm
Hi,

Quote
My ISP (1&1 / Versanet) provides me VDSL via PPPoE dial-in. They only accept PPPoE traffic which is tagged as VLAN 7.
So I created an interface for that (re1_vlan7) and used it for the PPP configuration (pppoe0). That resulted in the creation of the WAN interface.

- I think this is one possible problem. My provider is also 1&1 and I have a Zyxel (router configured as modem), and everything works fine. Don't configure VLAN 7 for WAN on the firewall side; instead, set up a bridge with VLAN 7 on the Zyxel. Can you try this?

regards
pylox