IPv6 connectivity getting dropped after a few hours

Started by superczar, May 10, 2023, 09:11:11 AM

Running OPNsense 23.1.7_3-amd64 on a multi-WAN failover setup.

The primary connection is gigabit fiber over PPPoE (ISP: Airtel).
With the ISP-provided equipment in routing mode, both IPv4 and IPv6 are normally 100% stable, with no reboots required for months.

With the ISP equipment set to bridging mode, OPNsense obtains IPv4 and IPv6 as expected on reboot.
All clients get an IPv6 address as expected.

However, after anywhere from one to several hours, the IPv6 connectivity on WAN drops and requires a PPPoE restart.

Looking at the logs, the one place I could find something relevant was under System > Logs > General.
dhclient seems to be generating a new resolv.conf every hour.
A few hours after the last restart, I see the following in the log:

2023-05-10T10:41:11  Notice  dhclient  Creating resolv.conf
2023-05-10T09:41:11  Notice  dhclient  Creating resolv.conf
2023-05-10T08:41:11  Notice  dhclient  Creating resolv.conf
2023-05-10T07:41:11  Notice  dhclient  Creating resolv.conf
2023-05-10T06:41:11  Notice  dhclient  Creating resolv.conf
2023-05-10T05:41:11  Notice  dhclient  Creating resolv.conf
2023-05-10T05:33:47  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet6 default gateway 'fe80::c242:d0ff:fe94:814e%pppoe0'
2023-05-10T05:33:47  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet6 default gateway on opt2
2023-05-10T05:33:47  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet default gateway '122.169.63.255'
2023-05-10T05:33:47  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet default gateway on opt2
2023-05-10T05:33:47  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: entering configure using defaults
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet6 default gateway 'fe80::c242:d0ff:fe94:814e%pppoe0'
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet6 default gateway on opt2
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet default gateway '122.169.63.255'
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet default gateway on opt2
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: entering configure using defaults
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet6 default gateway 'fe80::c242:d0ff:fe94:814e%pppoe0'
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet6 default gateway on opt2
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet default gateway '122.169.63.255'
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet default gateway on opt2
2023-05-10T05:31:41  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: entering configure using defaults
2023-05-10T05:31:37  Notice  opnsense  /usr/local/etc/rc.newwanipv6: Failed to detect IP for Airtel[opt2]
2023-05-10T05:31:37  Notice  dhcp6c    dhcp6c RELEASE on pppoe0 - running newipv6
2023-05-10T05:31:36  Notice  opnsense  /usr/local/etc/rc.newwanipv6: Failed to detect IP for Airtel[opt2]
2023-05-10T05:31:36  Notice  dhcp6c    dhcp6c REQUEST on pppoe0 - running newipv6
2023-05-10T05:31:34  Notice  opnsense  /usr/local/etc/rc.newwanipv6: Failed to detect IP for Airtel[opt2]
2023-05-10T05:31:34  Notice  dhcp6c    dhcp6c RELEASE on pppoe0 - running newipv6
2023-05-10T05:31:34  Notice  dhcp6c    RTSOLD script - Sending SIGHUP to dhcp6c
2023-05-10T04:41:11  Notice  dhclient  Creating resolv.conf
2023-05-10T04:37:56  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet6 default gateway 'fe80::c242:d0ff:fe94:814e%pppoe0'
2023-05-10T04:37:56  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet6 default gateway on opt2
2023-05-10T04:37:56  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet default gateway '122.169.x.255'
2023-05-10T04:37:56  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet default gateway on opt2
2023-05-10T04:37:56  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: entering configure using defaults
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet6 default gateway 'fe80::c242:d0ff:fe94:814e%pppoe0'
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet6 default gateway on opt2
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet default gateway '122.169.x.255'
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet default gateway on opt2
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: entering configure using defaults
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet6 default gateway 'fe80::c242:d0ff:fe94:814e%pppoe0'
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet6 default gateway on opt2
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: keeping current inet default gateway '122.169.x.255'
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: configuring inet default gateway on opt2
2023-05-10T04:36:49  Notice  opnsense  /usr/local/etc/rc.routing_configure: ROUTING: entering configure using defaults
2023-05-10T03:41:11  Notice  dhclient  Creating resolv.conf
2023-05-10T03:25:27  Notice  opnsense  /system_gateways.php: Chose to bind AIRTEL_PPPOE on 122.169.x.x since we could not find a proper match.
2023-05-10T03:25:27  Notice  opnsense  /system_gateways.php: plugins_configure monitor (execute task : dpinger_configure_do())
2023-05-10T03:25:27  Notice  opnsense  /system_gateways.php: plugins_configure monitor ()
2023-05-10T03:25:27  Notice  opnsense  /system_gateways.php: ROUTING: keeping current inet6 default gateway 'fe80::c242:d0ff:fe94:814e%pppoe0'



In particular, this part is where the failure seems to occur:

2023-05-10T05:31:37  Notice  opnsense  /usr/local/etc/rc.newwanipv6: Failed to detect IP for Airtel[opt2]
2023-05-10T05:31:37  Notice  dhcp6c    dhcp6c RELEASE on pppoe0 - running newipv6
2023-05-10T05:31:36  Notice  opnsense  /usr/local/etc/rc.newwanipv6: Failed to detect IP for Airtel[opt2]
2023-05-10T05:31:36  Notice  dhcp6c    dhcp6c REQUEST on pppoe0 - running newipv6
2023-05-10T05:31:34  Notice  opnsense  /usr/local/etc/rc.newwanipv6: Failed to detect IP for Airtel[opt2]
2023-05-10T05:31:34  Notice  dhcp6c    dhcp6c RELEASE on pppoe0 - running newipv6
2023-05-10T05:31:34  Notice  dhcp6c    RTSOLD script - Sending SIGHUP to dhcp6c
2023-05-10T04:41:11  Notice  dhclient  Creating resolv.conf
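
For anyone wanting to check their own log for the same pattern, here is a rough Python sketch (an illustration only, not an OPNsense tool). It takes the entries of the failure snippet above, hand-copied as tuples, and lists the dhcp6c events oldest first together with the number of rc.newwanipv6 failures. The tuple layout and the function names are made up for this example.

```python
import re

# Hand-copied from the failure snippet, newest first: (timestamp, process, message).
ENTRIES = [
    ("2023-05-10T05:31:37", "opnsense", "/usr/local/etc/rc.newwanipv6: Failed to detect IP for Airtel[opt2]"),
    ("2023-05-10T05:31:37", "dhcp6c", "dhcp6c RELEASE on pppoe0 - running newipv6"),
    ("2023-05-10T05:31:36", "opnsense", "/usr/local/etc/rc.newwanipv6: Failed to detect IP for Airtel[opt2]"),
    ("2023-05-10T05:31:36", "dhcp6c", "dhcp6c REQUEST on pppoe0 - running newipv6"),
    ("2023-05-10T05:31:34", "opnsense", "/usr/local/etc/rc.newwanipv6: Failed to detect IP for Airtel[opt2]"),
    ("2023-05-10T05:31:34", "dhcp6c", "dhcp6c RELEASE on pppoe0 - running newipv6"),
    ("2023-05-10T05:31:34", "dhcp6c", "RTSOLD script - Sending SIGHUP to dhcp6c"),
    ("2023-05-10T04:41:11", "dhclient", "Creating resolv.conf"),
]

# Matches lines such as "dhcp6c RELEASE on pppoe0 - running newipv6".
EVENT = re.compile(r"dhcp6c (\w+) on \w+")


def dhcp6c_timeline(entries):
    """Return the dhcp6c events oldest first as (timestamp, event) tuples."""
    timeline = []
    for stamp, process, message in reversed(entries):
        if process != "dhcp6c":
            continue
        match = EVENT.search(message)
        if match:
            event = match.group(1)  # e.g. RELEASE or REQUEST
        elif "SIGHUP" in message:
            event = "SIGHUP"
        else:
            event = "OTHER"
        timeline.append((stamp, event))
    return timeline


def count_failures(entries):
    """Count the 'Failed to detect IP' messages."""
    return sum("Failed to detect IP" in message for _, _, message in entries)


for stamp, event in dhcp6c_timeline(ENTRIES):
    print(stamp, event)
print("failed IP detections:", count_failures(ENTRIES))
```

Read oldest first, the snippet shows a SIGHUP followed within three seconds by RELEASE, REQUEST and RELEASE, each followed by a failed IP detection.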


Any pointers on which settings I should be looking at?
I would normally have attributed this to the ISP, but when I use the ISP router, IPv6 remains rock solid.

I'd try to set "Prevent release" on Interfaces: Settings page to see if that at least keeps connectivity.

But it looks like a daily disconnect?

"dhcp6c RELEASE on pppoe0" only happens once as far as I can see.

I also assume you have "Use IPv4 connectivity" set for IPv6 on the WAN? And IPv6 mode is set to DHCPv6?


Cheers,
Franco

As an additional thought, I wonder if the recreation of the pppoe0 device causes dhcp6c to fail on SIGHUP, in which case switching to a full restart would make it work. But then the problem would lie more with dhcp6c giving up when it should do the job anyway.


Cheers,
Franco

Quote from: franco on May 10, 2023, 09:42:16 AM
I'd try to set "Prevent release" on Interfaces: Settings page to see if that at least keeps connectivity.

But it looks like a daily disconnect?

"dhcp6c RELEASE on pppoe0" only happens once as far as I can see.

I also assume you have "Use IPv4 connectivity" set for IPv6 on the WAN? And IPv6 mode is set to DHCPv6?


Cheers,
Franco

Thanks, let me try enabling "Prevent release" and see if that helps.
It's not exactly a daily disconnect. I have had instances where it dropped within an hour or two, several times on the same day.
And sometimes it sticks for several hours.
What I pasted was a single snippet for ease of reading.

And yes, IPv6 on the WAN uses IPv4 connectivity and the IPv6 mode is set to DHCPv6.

May 10, 2023, 12:23:26 PM #4 Last Edit: May 10, 2023, 12:30:17 PM by franco
There might be changes coming to dhcp6c soon. We can try one more thing if you are able to reproduce this at will. Would appreciate the help here!

This might fix it too.. https://github.com/opnsense/dhcp6c/commit/db9f45927


Cheers,
Franco

Quote from: franco on May 10, 2023, 12:23:26 PM
There might be changes coming to dhcp6c soon. We can try one more thing if you are able to reproduce this at will. Would appreciate the help here!

This might fix it too.. https://github.com/opnsense/dhcp6c/commit/db9f45927


Cheers,
Franco

Thanks for looking into this @franco
I don't think I can reproduce this at will; so far the drops and the subsequent failure to recover have been seemingly random.

I have enabled the "Prevent release" flag and it's been just over an hour with no drops.
The PPP log also seems to be blissfully static.
Will monitor the connection overnight and report back if there are any drops or issues!
Happy to help whichever way I can.

May 10, 2023, 10:10:07 PM #6 Last Edit: May 10, 2023, 10:12:15 PM by superczar
Quote from: franco on May 10, 2023, 12:23:26 PM
There might be changes coming to dhcp6c soon. We can try one more thing if you are able to reproduce this at will. Would appreciate the help here!

This might fix it too.. https://github.com/opnsense/dhcp6c/commit/db9f45927


Cheers,
Franco

So there is more to this issue, it seems :(

After changing the release flag, connectivity on the WAN side remained stable for another hour or so, after which clients started failing to route via IPv6.

Clients had a valid IPv6 address, but ping6/traceroute6 could not find a valid route.
So I checked further and (not sure if this is relevant, I am not an expert on IPv6) the WAN IP now had a /128 prefix.

The delegated prefix was /64 (see attached screenshot)
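
For readers unsure what the two prefix lengths mean, a small Python sketch using the standard `ipaddress` module may help. The addresses come from the 2001:db8::/32 documentation range and merely stand in for the real ones; the helper name `on_link_summary` is made up for this example. A /128 covers exactly one address, while a /64 covers a whole subnet. An address handed out via DHCPv6 carries no prefix length of its own, so seeing /128 on the WAN is not necessarily a fault in itself.

```python
import ipaddress

# Placeholder addresses from the documentation prefix, not the real WAN address.
wan_as_128 = ipaddress.ip_interface("2001:db8:0:1::1234/128")
wan_as_64 = ipaddress.ip_interface("2001:db8:0:1::1234/64")


def on_link_summary(iface):
    """Describe the network an interface address belongs to."""
    net = iface.network
    return f"{iface.ip} belongs to {net}, which holds {net.num_addresses} address(es)"


print(on_link_summary(wan_as_128))
print(on_link_summary(wan_as_64))
```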

Again, I would normally have blamed the ISP for this, but I decided to test further and plugged in an old TP-Link ER605 router bridged to the ISP equipment.
With the ER605, I was instantly able to obtain a xxxx/64 address for the router as well as for the LAN (SLAAC + stateless DHCP mode selected), and everything is working as expected.

IPv6 connectivity was also immediate, as opposed to a reasonably long wait on OPNsense after PPPoE completion and IPv4 coming up.

Not quite sure what could be causing this :|
I can't really use the ER605 as a replacement for OPNsense for many reasons, not to mention that it does not have any firewall for IPv6.

I recently set up an OPNsense router on my side. I have read somewhere that some ISPs require the option "Only request a prefix". If I got it right, it's necessary because some ISPs provide the prefix via DHCPv6 and the WAN address via SLAAC. Maybe you could try this option?

I'll try the new version 23.1.7 with the dhcp6c patch as soon as I have some time. Currently my IPv6 seems stable as there is no nightly ASSIA forcing a reconnect.

Quote from: Cyberturtle on May 10, 2023, 10:23:04 PM
I recently set up an OPNsense router on my side. I have read somewhere that some ISPs require the option "Only request a prefix". If I got it right, it's necessary because some ISPs provide the prefix via DHCPv6 and the WAN address via SLAAC. Maybe you could try this option?

I'll try the new version 23.1.7 with the dhcp6c patch as soon as I have some time. Currently my IPv6 seems stable as there is no nightly ASSIA forcing a reconnect.

Thanks, I set the WAN to only request a prefix and that seems to have fixed the /128 issue.
The WAN side is now getting a /64 address and prefix.
The LAN interface also has a /64 prefix.

Here is where it gets weird:

LAN clients are also getting a /64 prefix but cannot ping6 even the router's LAN address, let alone a WAN address.

E.g. client 1 has obtained 2401:4900:1c02:1045:x:x:x:4c50/64
The LAN interface on the router is 2401:4900:1c02:1045:x:x:x:963e/64

But ping6 or traceroute6 from client 1 to the LAN interface results in the following, which does not proceed any further:

abhinav@Abhinavs-MacBook-Air ~ % ping6 2401:4900:1c02:1045:x:x:x:963e
PING6(56=40+8+8 bytes) 2401:4900:1c02:1045:x:x:x:4c50 --> 2401:4900:1c02:1045:x:x:x:963e


traceroute6 to 2401:4900:1c02:1045:x:x:x:963e (2401:4900:1c02:1045:x:x:x:963e) from 2401:4900:1c02:1045:x:x:x:4c50, 64 hops max, 12 byte packets
1  2401:4900:1c02:1045:x:x:x:4c50  3033.698 ms !A  3033.286 ms !A  3033.940 ms !A


Thanks @Cyberturtle and @Franco

Based on your hints, I think I may have figured it out.
I had done way too much finagling on the OPNsense router trying to fix this, and I figured a fresh start might be the right way to go.


After a fresh install with the "Prevent release" and "Only request a prefix" flags enabled, things seem to be fine now.

You should definitely keep "request prefix only" in your settings.

I'd still hope you could install the dhcp6c test version

# opnsense-revert -z dhcp6c

and disable prevent release to see if that makes a difference after the next boot and subsequent disconnect.


Cheers,
Franco

Hey superczar,

nice that it's working now. Maybe your clients were not able to reset their routes. According to RFC 4862, section 5.5.3, a few parameters are taken into account when updating the prefix, to prevent attackers from causing downtime via RAs.
That's why I have modified the radvd parameters a bit (especially PreferredLifetime and ValidLifetime).
You can have a look at my attachment to see how I have set it up. A new prefix is only considered if its new lifetime is greater than two hours; otherwise the old one stays in use as long as it has a valid lifetime. Additionally, some clients only look at the preferred lifetime when choosing between multiple prefixes before they change to the new valid one. That's why I have chosen twice MaxInterval as the lifetime. So updating the prefixes on clients that don't handle the valid lifetime correctly takes at most ten minutes.
On my first try I had smaller values for all variables, but that caused some trouble on my wireless clients and my access points, as the network was flooded by too many broadcasts.
I don't know if my settings are still needed with the dhcp6c patch @franco has mentioned. Maybe he can give us some insights?
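
The two-hour rule mentioned above can be sketched in a few lines of Python. This is only an illustration of the three cases in RFC 4862, section 5.5.3 (e), for the valid lifetime of an address that a host has already configured via SLAAC; the function and parameter names are made up, and all values are in seconds.

```python
TWO_HOURS = 2 * 60 * 60  # seconds


def updated_valid_lifetime(remaining, advertised, authenticated=False):
    """Valid lifetime a host keeps for an existing SLAAC address after an RA.

    remaining:     valid lifetime still left on the address
    advertised:    Valid Lifetime from the received Prefix Information option
    authenticated: True if the RA was authenticated (e.g. via SEND)
    """
    # Case 1: long or lifetime-extending advertisements are always accepted.
    if advertised > TWO_HOURS or advertised > remaining:
        return advertised
    # Case 2: with two hours or less remaining, a shorter advertised
    # lifetime is ignored unless the RA was authenticated.
    if remaining <= TWO_HOURS:
        return advertised if authenticated else remaining
    # Case 3: otherwise the lifetime is cut down to two hours, not lower.
    return TWO_HOURS


# A router trying to retire a prefix at once by advertising lifetime 0:
print(updated_valid_lifetime(remaining=86400, advertised=0))
# The same advertisement when only one hour is left on the address:
print(updated_valid_lifetime(remaining=3600, advertised=0))
```

So an unauthenticated RA cannot shorten the valid lifetime of an old address to less than two hours, which is why clients may hold on to a stale prefix for a while after the delegated prefix changes. The preferred lifetime is not protected this way and simply follows the advertisement.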

Greetings,
Cyberturtle

There is some more discussion and progress over at https://github.com/opnsense/core/issues/6522. It looks like our RENEW and REBIND handling is a bit too lax WRT shifting prefix.


Cheers,
Franco

Quote from: franco on May 11, 2023, 08:08:29 AM
You should definitely keep "request prefix only" in your settings.

I'd still hope you could install the dhcp6c test version

# opnsense-revert -z dhcp6c

and disable prevent release to see if that makes a difference after the next boot and subsequent disconnect.


Cheers,
Franco

Sure, can try that out this weekend.
Currently it's been 2+ days and connectivity has been rock stable, but it may be worth pointing out a couple of things:

a) I have a virtualized OPNsense install on Proxmox and the adapters use VirtIO. Not sure if that could be one of the underlying reasons.

b) One of the things I had done on my previous install was installing Sensei.
Looking through the forums, it seems other users have also had issues with the combination of Sensei and IPv6.
Sensei was disabled at the time of testing, but the fact that the LAN client could not even ping the valid IPv6 address of the LAN interface makes me suspect that the mere presence of Sensei could be one of the underlying reasons (or it could be a complete red herring).
Either way, the fresh install does not have Sensei, and I have now added every other plugin I had previously.

This weekend, I will back up the OPNsense VM and re-add Sensei to see if it breaks connectivity.

On an unrelated note, now that IPv6 seems to be running stable, I added a pass ICMP rule on the primary WAN interface to the LAN network alias, as I had read somewhere that IPv6 relies a lot on certain kinds of ICMP messages.
Now it could be a placebo, but since the change the general (and subjective) internet responsiveness seems to have improved.
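
As background on which ICMPv6 messages matter for traffic passing through a firewall, here is a small Python sketch. The table reflects my reading of the "must not be dropped" list for transit traffic in RFC 4890 and should be checked against the RFC before relying on it; the names `MUST_NOT_DROP` and `must_not_drop` are made up for this example, and nothing here describes what rules OPNsense creates by default.

```python
# ICMPv6 type -> (name, codes that should pass; None means every code).
# Assumption: taken from the transit-traffic recommendations in RFC 4890.
MUST_NOT_DROP = {
    1: ("Destination Unreachable", None),
    2: ("Packet Too Big", None),
    3: ("Time Exceeded", {0}),
    4: ("Parameter Problem", {1, 2}),
    128: ("Echo Request", None),
    129: ("Echo Reply", None),
}


def must_not_drop(icmp_type, code=0):
    """Return True if this ICMPv6 type/code is on the must-pass list."""
    entry = MUST_NOT_DROP.get(icmp_type)
    if entry is None:
        return False
    _name, codes = entry
    return codes is None or code in codes


print(must_not_drop(2))     # Packet Too Big
print(must_not_drop(135))   # Neighbor Solicitation is link-local, not transit
```

Packet Too Big (type 2) is the important one for perceived responsiveness: IPv6 routers do not fragment packets in transit, so path MTU discovery depends on these messages reaching the sender.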