PPPoE pain

Started by Alphakilo, March 28, 2018, 06:44:22 PM

Hello folks.

I very recently switched from Sophos UTM to OPNsense (18.1.5). So far the journey has been very painful. It makes me question whether I'm incompetent or whether something went horribly wrong with my deployment.

I have a slew of issues, pretty much everywhere. The thing that bothers me most currently is the stability of my one and only egress to WAN.

Let me describe my setup.
My ISP (1&1 / Versanet) provides me VDSL via PPPoE dial-in. They only accept PPPoE traffic which is tagged as VLAN 7.
So I created an interface for that (re1_vlan7) and used it for the PPP configuration (pppoe0). That resulted in the creation of the WAN interface.

re1 is connected to a Zyxel VDSL modem.

The dial-in works. Most of the time. Now to my issues:

Unable to save WAN interface

Every time I save the WAN interface, connectivity drops until I reboot OPNsense. I found two different causes:

a) My PPPoE password got URL-encoded every time I saved either the interface or the PPP config. I solved that by setting a new password at my ISP.
b) NAT sporadically continues to use the old interface address as the source address for new (!) connections. In fact, it continues to use all addresses that were ever assigned to the interface during uptime, in what seems to be a round-robin fashion.

24h disconnects

My ISP forces a re-dial-in every 24h, assigning new addresses (IPv4 & v6). I used a cron job (via WebUI / System / Settings / Cron: 0 5 * * * Periodic interface reset -> pppoe0) to do that at a time when it's least bothersome. That results in a broken PPPoE dial-in until I reboot OPNsense.

Broken in the sense that the dial-in works, but gets terminated by LCP 2-4 seconds after obtaining an IP: "LCP: rec'd Terminate Request #3" (Configure-NAK?).
After the reboot the dial-in works instantly and is not terminated in that fashion.

And when I fix this issue, I'm pretty sure the NAT issue from above will come back to haunt me.

Vanishing default gateways

The ISP-assigned default gateway drops out of the routing table from time to time. I haven't been able to figure out the root cause, because I don't even know where to look. The issue gets more frequent when I enable gateway monitoring, though.

DHCPv6

There are multiple DHCPv6 clients for the pppoe0 interface, resulting in "dhcp6c: XID mismatch".
I can't figure out how or why there are multiple instances for the same interface using the same configuration (and, funnily enough, the same PID file):

root@{{hostname}}:~ # ps x | grep "dhcp6c"
74604  -  Ss     0:00.05 /usr/local/sbin/dhcp6c -D -c /var/etc/dhcp6c_wan.conf -p /var/run/dhcp6c_pppoe0.pid pppoe0
79106  -  Ss     0:00.04 /usr/local/sbin/dhcp6c -D -c /var/etc/dhcp6c_wan.conf -p /var/run/dhcp6c_pppoe0.pid pppoe0
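
To spot this condition quickly, the listing above can be summarised per interface. This is only a minimal sketch: the function name count_dhcp6c is made up for illustration, and it assumes the interface name is the last field of the dhcp6c command line, as in the output shown here.

```shell
#!/bin/sh
# Count dhcp6c processes per interface from `ps` output read on stdin.
# Assumption: the interface name is the last field of the command line,
# as in the listing above. count_dhcp6c is a hypothetical helper name.
count_dhcp6c() {
    awk '/\/dhcp6c / && !/grep/ { n[$NF]++ }
         END { for (i in n) print i, n[i] }'
}

# Usage on the firewall:
#   ps xww | count_dhcp6c
```

Any interface reported with a count above 1 has duplicate clients, which would match the "XID mismatch" messages.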


I'm at a loss as to what to do next.
It just feels so random. Every time I fix something, more problems come up.

March 28, 2018, 07:35:14 PM #1 Last Edit: March 28, 2018, 07:36:52 PM by elektroinside
Actually, your issues are very similar to this:
https://forum.opnsense.org/index.php?topic=7270.0

You might need a few minutes to read the entire conversation...

But, there's a complete rework going on here and there, so soon this will get fixed.
OPNsense v18 | HW: Gigabyte Z370N-WIFI, i3-8100, 8GB RAM, 60GB SSD, | Controllers: 82575GB-quad, 82574, I221, I219-V | PPPoE: RDS Romania | Down: 980Mbit/s | Up: 500Mbit/s

Team Rebellion Member

I think your observations with PPP reconnects (manual or 24h, triggered by your ISP) could be related to an issue that has already been discussed. See below...


OPNsense 24.7.11_2-amd64

Thanks for the replies!
I set the MTU for the WAN interface, guilty as charged ;D I'll revert that to default and see if the problem persists.

It worked out for me (deleting the custom value completely, leaving the field empty). I'm really curious whether it does for you as well. Please report back, PPPoE has been a headache for some of us :)

Quote from: elektroinside on March 28, 2018, 09:25:49 PM
It worked out for me (deleting the custom value completely, leaving the field empty), i'm really curious if it does for you as well. Please report back

Didn't work for me. The fields are now empty. I think my problem is pretty much the same as the one schnipp is experiencing.

Still, it doesn't account for my other troubles.

Quote from: Alphakilo on March 28, 2018, 06:44:22 PM

root@{{hostname}}:~ # ps x | grep "dhcp6c"
74604  -  Ss     0:00.05 /usr/local/sbin/dhcp6c -D -c /var/etc/dhcp6c_wan.conf -p /var/run/dhcp6c_pppoe0.pid pppoe0
79106  -  Ss     0:00.04 /usr/local/sbin/dhcp6c -D -c /var/etc/dhcp6c_wan.conf -p /var/run/dhcp6c_pppoe0.pid pppoe0


I'm at a loss as to what to do next.
It just feels so random. Every time I fix something, more problems come up.

This is caused by bouncing WAN connections or by bouncing the interface itself. I have a rather large PR on GitHub that fixes it, but it is being broken into smaller chunks to be tested bit by bit before it makes it into a release. Sorry about that, but them's the rules, as they say. The PID file will have the same name, as that's the name given in the start command; what counts is the PID value, and that's in the file itself. When a new instance starts, it overwrites the PID file. I'll try to get the dhcp6c handling part of the PR pushed through over the next few days, and that should see an end to this issue.

If you see two (or more) instances of dhcp6c, use kill -9 pid_value on all of the instances and restart the WAN interface.
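
As a rough sketch of that workaround, the PIDs can be pulled out of the ps listing and killed in one go. The helper names dhcp6c_pids and kill_dhcp6c are made up for illustration, and the parsing assumes the PID is the first field and the interface name the last, as in the ps output quoted earlier in the thread.

```shell
#!/bin/sh
# Print the PID of every dhcp6c process bound to the given interface,
# reading `ps` output from stdin.
# Assumptions: PID is the first field, interface name is the last field.
# Both function names are hypothetical helpers, not OPNsense tools.
dhcp6c_pids() {
    awk -v ifname="$1" '/\/dhcp6c / && $NF == ifname { print $1 }'
}

# Send SIGKILL to every dhcp6c instance found for the interface.
kill_dhcp6c() {
    for pid in $(ps xww | dhcp6c_pids "$1"); do
        kill -9 "$pid"
    done
}

# Usage on the firewall:
#   kill_dhcp6c pppoe0
```

After that, restart the WAN interface as described above; the sketch deliberately leaves that step out.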
OPNsense 24.7 - Qotom Q355G4 - ISP - Squirrel 1Gbps.

Team Rebellion Member

Quote from: marjohn56 on March 28, 2018, 11:17:37 PM
sorry about that but them's the rules as they say.
As a German, you had me at rules  ;D

Quote from: marjohn56 on March 28, 2018, 11:17:37 PM
The pid will be the same name, as that's the name created in the start command, the value of the pid is what counts and that's in the file itself.

It wouldn't make sense to have a PID file otherwise. I tried to point out that two daemon processes that are supposed to be atomic are running with the same PID file, which is not supposed to happen. Unless...
# file /var/run/dhcp6c_pppoe0.pid
/var/run/dhcp6c_pppoe0.pid: cannot open `/var/run/dhcp6c_pppoe0.pid' (No such file or directory)


Well there's your problem!
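
For anyone wanting to repeat this check, here is a small sketch that classifies a PID file as missing, stale or ok. The function name pidfile_status is made up for illustration, and it assumes the file holds nothing but the PID.

```shell
#!/bin/sh
# Classify a PID file as "missing", "stale" or "ok".
# Assumption: the file contains only the PID of the daemon.
# pidfile_status is a hypothetical helper name.
pidfile_status() {
    if [ ! -f "$1" ]; then
        echo missing
    elif kill -0 "$(cat "$1")" 2>/dev/null; then
        # kill -0 sends no signal, it only tests whether the PID exists
        echo ok
    else
        echo stale
    fi
}

# Usage on the firewall:
#   pidfile_status /var/run/dhcp6c_pppoe0.pid
```

Note that kill -0 also fails when the process exists but belongs to another user, so the check is best run as root.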

Quote from: marjohn56 on March 28, 2018, 11:17:37 PM
When a new instance starts, it overwrites the PID file. I'll try and get the dhcp6c handling part of the PR pushed through over the next few days and that should see an end of this issue.

So in what case is no pid file written at all?

Quote from: marjohn56 on March 28, 2018, 11:17:37 PM
If you see two ( or more ) instances of dhcp6c use a kill -9 pid_value on all of the instances and restart the WAN interface.

I tried that. That spawns two processes.

There should only ever be ONE dhcp6c process.

However, Franco pulled a couple of PRs this morning that should appear in 18.1_6; hopefully that will be the end of the problem.

Quote from: marjohn56 on March 29, 2018, 03:21:54 PM
However, Franco pulled a couple of PR's this morning that should appear in 18.1_6, hopefully that should be the end of the problem.

I'm willing to test the patch, if that's possible.

There are a lot of changes around that area in 18.1_6 plus an update to dhcp6c.

I can send you a modified 18.1_5 interfaces.inc file and a dhcp6c executable, but you'd need to drop them into place manually at the moment. If you want to do that, PM me and I'll sort something out for you.

March 29, 2018, 05:27:30 PM #11 Last Edit: March 29, 2018, 05:30:24 PM by pylox
Hi,

Quote
My ISP (1&1 / Versanet) provides me VDSL via PPPoE dial-in. They only accept PPPoE traffic which is tagged as VLAN 7.
So I created an interface for that (re1_vlan7) and used it for the PPP configuration (pppoe0). That resulted in the creation of the WAN interface.

- I think this is one possible problem. My provider is also 1&1 and I have a Zyxel (a router configured as a modem), and everything works fine. You must not configure VLAN7/WAN on the firewall side. Instead, set up a bridge with VLAN7 on the Zyxel. Can you try this?

regards
pylox