Sporadic PPPoE disconnects

Started by 0rangeCookie, August 29, 2025, 11:50:03 AM

August 29, 2025, 11:50:03 AM Last Edit: August 29, 2025, 11:51:46 AM by 0rangeCookie
Hello everyone,

I am experiencing recurring PPPoE WAN disconnects on my OPNsense firewall and would appreciate some advice.

Setup:

  • ISP: NetCologne (Germany)
  • Connection: VDSL2, requires VLAN 7 for PPPoE
  • Booked option: static public IP address
  • Speed: 100/40 Mbps DSL profile
  • Modem: DrayTek Vigor 167 in full bridge mode (reachable in a separate subnet, shows stable DSL sync with upstream/downstream even when the issue occurs)
  • Firewall: OPNsense running on an Intel i5 appliance (igc0 = WAN, igc1 = LAN)
  • WAN configured with PPPoE
  • MTU: 1492
  • MRU: (currently not set to 1492, probably defaulting to 1500)
  • MSS clamping: not sure if enabled
  • Idle timeout: 0 (disabled)
  • LCP Echo enabled (10s interval, 5 failures)
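
For reference, the numbers behind the MTU, MRU, and MSS settings listed above can be worked out as follows. This is only a minimal sketch assuming standard header sizes (8 bytes of PPPoE plus PPP overhead, IPv4 and TCP headers without options); the function names are made up for illustration and nothing here is read from OPNsense or the modem.

```python
# Standard header sizes in bytes (assumed values, not read from any device).
ETHERNET_PAYLOAD = 1500  # classic Ethernet MTU
PPPOE_HEADER = 6         # PPPoE session header
PPP_PROTOCOL_ID = 2      # PPP protocol field
IPV4_HEADER = 20         # without options
IPV6_HEADER = 40
TCP_HEADER = 20          # without options


def pppoe_mtu(ethernet_payload: int = ETHERNET_PAYLOAD) -> int:
    """Largest IP packet that fits into one PPPoE frame."""
    return ethernet_payload - PPPOE_HEADER - PPP_PROTOCOL_ID


def tcp_mss(mtu: int, ip_header: int = IPV4_HEADER) -> int:
    """Largest TCP payload that fits into one packet of the given MTU."""
    return mtu - ip_header - TCP_HEADER


if __name__ == "__main__":
    mtu = pppoe_mtu()
    print("PPPoE MTU/MRU:", mtu)
    print("TCP MSS over IPv4:", tcp_mss(mtu))
    print("TCP MSS over IPv6:", tcp_mss(mtu, IPV6_HEADER))
```

With a 1500-byte Ethernet payload this gives 1492 for MTU/MRU, which is why an MRU left at 1500 does not match the MTU configured above.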

Problem:
From time to time my PPPoE WAN session suddenly drops. This does not happen on a fixed schedule: sometimes it runs stable for weeks, sometimes it drops once every 2–3 days, and sometimes multiple times a day.

When the issue occurs:

The modem shows stable DSL sync (line up, up/downstream active).
OPNsense loses the PPPoE session.
Reconnect attempts fail with "PPPoE connection timeout after 9 seconds".
It looks like OPNsense sends PADI packets but never receives a PADO back.
The modem is still reachable in its management subnet.
Restarting OPNsense always fixes the issue immediately.
I have not yet tried restarting only the modem when this happens.
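
To tell whether a PADO ever comes back, the discovery frames in a capture can be classified by their PPPoE code. The following is a minimal sketch, not an OPNsense tool: it takes a raw Ethernet frame as bytes, skips a single 802.1Q tag if present (VLAN 7 in this setup), and names the discovery code according to RFC 2516. The function name is hypothetical.

```python
import struct
from typing import Optional, Tuple

ETHERTYPE_VLAN = 0x8100             # 802.1Q tag
ETHERTYPE_PPPOE_DISCOVERY = 0x8863  # PPPoE discovery stage

# PPPoE discovery codes as defined in RFC 2516.
DISCOVERY_CODES = {
    0x09: "PADI",
    0x07: "PADO",
    0x19: "PADR",
    0x65: "PADS",
    0xA7: "PADT",
}


def classify_discovery_frame(frame: bytes) -> Optional[Tuple[Optional[int], str]]:
    """Return (vlan_id or None, code name) for a PPPoE discovery frame, else None."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14
    vlan_id = None
    if ethertype == ETHERTYPE_VLAN:
        if len(frame) < 18:
            return None
        # The tag holds the TCI (low 12 bits = VLAN ID) and the real EtherType.
        tci, ethertype = struct.unpack_from("!HH", frame, 14)
        vlan_id = tci & 0x0FFF
        offset = 18
    if ethertype != ETHERTYPE_PPPOE_DISCOVERY or len(frame) < offset + 6:
        return None
    # PPPoE header: version/type, code, session ID, payload length.
    _ver_type, code, _session_id, _length = struct.unpack_from("!BBHH", frame, offset)
    return vlan_id, DISCOVERY_CODES.get(code, f"unknown(0x{code:02x})")
```

If only PADI frames show up and no PADO, the access concentrator is either not answering or its answers are not reaching the interface.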

Important background:

I have been struggling with this problem for a long time already.
On pfSense 2.7.2 the exact same behavior occurred regularly.
Because of this, I migrated to OPNsense, and surprisingly the connection then ran for 60-80 days without a single dropout.
I thought the migration had solved it, but unfortunately the problem has reappeared over the last 4-6 weeks.

Relevant log excerpt:

Aug 29 10:56:45 ppp: [wan_link0] LCP: no reply to 1 echo request
Aug 29 10:57:25 ppp: [wan_link0] LCP: no reply to 5 echo requests
Aug 29 10:57:25 ppp: [wan_link0] LCP: peer not responding to echo requests
Aug 29 10:57:25 ppp: [wan_link0] Link: Down event
Aug 29 10:57:25 ppp: [wan_link0] PPPoE connection lost
Aug 29 10:57:25 ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Aug 29 10:57:34 ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Aug 29 10:57:43 ppp: [wan_link0] PPPoE connection timeout after 9 seconds
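
The timing in an excerpt like this can be summarized with a few lines of Python. This is a rough sketch that assumes the line format shown above (English month abbreviations, no year in the timestamp, so the year is passed in); the function name and the returned keys are invented for illustration.

```python
import re
from datetime import datetime

# Matches lines such as "Aug 29 10:56:45 ppp: [wan_link0] LCP: ..."
LOG_LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) ppp: \[(\w+)\] (.*)$")

EXCERPT = """\
Aug 29 10:56:45 ppp: [wan_link0] LCP: no reply to 1 echo request
Aug 29 10:57:25 ppp: [wan_link0] LCP: no reply to 5 echo requests
Aug 29 10:57:25 ppp: [wan_link0] LCP: peer not responding to echo requests
Aug 29 10:57:25 ppp: [wan_link0] Link: Down event
Aug 29 10:57:25 ppp: [wan_link0] PPPoE connection lost
Aug 29 10:57:25 ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Aug 29 10:57:34 ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Aug 29 10:57:43 ppp: [wan_link0] PPPoE connection timeout after 9 seconds
"""


def summarize(lines, year=2025):
    """Report the delay from first missed echo to link down, and count timeouts."""
    first_miss = None
    link_down = None
    timeouts = 0
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        message = match.group(3)
        if message.startswith("LCP: no reply to") and first_miss is None:
            first_miss = stamp
        elif message == "Link: Down event":
            link_down = stamp
        elif message.startswith("PPPoE connection timeout"):
            timeouts += 1
    seconds = None
    if first_miss is not None and link_down is not None:
        seconds = int((link_down - first_miss).total_seconds())
    return {"seconds_first_miss_to_down": seconds, "discovery_timeouts": timeouts}


if __name__ == "__main__":
    print(summarize(EXCERPT.splitlines()))
```

For the excerpt above this reports 40 seconds between the first missed echo and the link going down, which fits a 10 s echo interval with 5 failures, followed by 3 failed discovery attempts.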


My suspicion:

Possibly a "ghost session" or stuck PPPoE state on the ISP side, especially since I have a static IP.
OPNsense (and previously pfSense) seems unable to properly clear the old session, while a full system reboot always works.
Wrong MRU (not set to 1492) and/or missing MSS clamping might also play a role.

Questions:

Has anyone else seen similar sporadic PPPoE behavior (no PADO after disconnect) with OPNsense or pfSense?
Is it important to explicitly set MRU = 1492 on the WAN interface in addition to MTU?
Should MSS clamping always be enabled on PPPoE WAN connections like this?
Is there a recommended watchdog/Monit/Cron setup to automatically reload PPPoE if no PADO is received?
Could this be related to how OPNsense handles PPPoE re-connects when a static IP is assigned?
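
To make the watchdog question more concrete, this is roughly the logic I have in mind, written as a minimal Python sketch. Both callables are placeholders: `is_up` stands for whatever check is used (for example a ping to a known upstream address) and `reconnect` for whatever command actually reloads PPPoE on OPNsense, which is exactly the part I do not know. The class name and the threshold are made up for illustration.

```python
from typing import Callable


class PppoeWatchdog:
    """Trigger a reconnect after several consecutive failed checks."""

    def __init__(
        self,
        is_up: Callable[[], bool],       # hypothetical health check
        reconnect: Callable[[], None],   # hypothetical reconnect action
        max_failures: int = 3,
    ) -> None:
        self.is_up = is_up
        self.reconnect = reconnect
        self.max_failures = max_failures
        self.failures = 0

    def tick(self) -> bool:
        """Run one check; return True if a reconnect was triggered."""
        if self.is_up():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.reconnect()
            self.failures = 0  # start counting again after the attempt
            return True
        return False
```

Called once per minute from cron or Monit, `tick()` would only act after three failed checks in a row, so a single lost probe does not restart the session.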

Any input would be greatly appreciated. Thanks a lot in advance!

Best regards

I would start by looking at WAN L1 and L2 issues. I run PPPoE on several OPNsense deployments and it's been rock solid. The only difference in my deployments is that I use untagged Ethernet on the WAN side.


If you're able to capture a tcpdump on the igc0 interface when the issue occurs, that would help confirm whether the PPPoE PADI discovery packets are actually hitting the wire.

Is there an L2 domain under your control connecting to igc0? Is it clean and rock solid?

When the issue occurs, have you tried restarting the DrayTek modem first instead?
OPNsense 25.7.1_1-amd64 running on ESXi 6.7 U2 VM, 4Gbytes RAM, 2 x vCPU
frr OSPF + eBGP, IDS, AdGuard Home, sftp-backup plugins. limited kea DHCP server deployment.

Thanks for the hints!
So far the issue luckily hasn't reappeared, but I expect it will come back sooner or later. Probably sooner. Once it does, I'll first try restarting the DrayTek modem to see if OPNsense is then able to bring the PPPoE session up again.

From my side I can currently rule out a pure L1 problem, since I always have an uplink to the modem, even when PPPoE drops. Do you generally run without VLAN tagging on the WAN, or do you offload the VLAN to the modem? In my setup, there's nothing between the modem and the OPNsense: WAN port goes directly to the modem, LAN to my Unifi 48PoE switch.

Once the issue occurs again, I'll capture a tcpdump on igc0 to check if the PPPoE PADI packets are actually hitting the wire.
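
For the capture itself, something along these lines should do. It is a small Python sketch that only builds the tcpdump command line; the output file name is arbitrary and the helper name is made up. The filter uses the `pppoed` primitive (PPPoE discovery, EtherType 0x8863) and also looks inside the VLAN 7 tag, under the assumption that the tag is still visible on igc0 and not stripped by hardware offloading.

```python
import shlex


def tcpdump_pppoe_discovery_argv(interface="igc0", outfile="pppoe-discovery.pcap", vlan_id=7):
    """Build a tcpdump argument list that records PPPoE discovery frames."""
    # Untagged match first: the "vlan" primitive shifts the offsets used by
    # everything that follows it in the filter expression.
    capture_filter = f"pppoed or (vlan {vlan_id} and pppoed)"
    return ["tcpdump", "-i", interface, "-w", outfile, capture_filter]


if __name__ == "__main__":
    # Print a copy-and-paste command line instead of running it directly.
    print(shlex.join(tcpdump_pppoe_discovery_argv()))
```

The resulting file can be read back later with `tcpdump -e -n -r pppoe-discovery.pcap` to see the link-level headers, including the VLAN tag.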

September 02, 2025, 05:19:18 AM #3 Last Edit: September 02, 2025, 05:21:21 AM by hharry
Quote from: 0rangeCookie on September 01, 2025, 05:26:50 PM
From my side I can currently rule out a pure L1 problem, since I always have an uplink to the modem, even when PPPoE drops. Do you generally run without VLAN tagging on the WAN, or do you offload the VLAN to the modem? In my setup, there's nothing between the modem and the OPNsense: WAN port goes directly to the modem, LAN to my Unifi 48PoE switch.

That modem is performing an L2 bridge/switch function.

In my deployments, my upstream WAN provider presents Ethernet L2 as untagged, and it is in their L2 domain that my traffic is then encapsulated and switched in its own unique S-TAG'ed VLAN.

Other folks running VLAN interfaces on OPNsense have reported some connectivity issues in the past. I suggest you use the forum search function to find those threads.

Can your on-premises modem present your WAN Ethernet as untagged?