Dual WAN Dual IP Stack: gateway down, dpinger cannot bind to detached IPv6 addr

Started by sbellon, February 10, 2024, 07:57:28 PM

Hi all,

After some outages of my VDSL provider, I decided to go for a 4G/LTE backup.

I chose the Netgear LM1200 in bridge mode with an O2 prepaid SIM; its interface in OPNsense is called WAN2. My main connection is via PPPoE and a Vigor 167; its interface is called WAN.

My main WAN runs a dual IP stack, so there is one gateway for IPv4 and one for IPv6. As I want to set up WAN2 as failover for WAN, I have added the monitor IP 8.8.8.8 to my WAN IPv4 gateway and the monitor IP 2001:4860:4860::8888 to my WAN IPv6 gateway. I have also unticked the "gateway is always up" checkboxes and ticked the "allow dynamic gateway switching" checkbox.

I have also created two gateway groups, one for IPv4 and one for IPv6, containing WAN IPv4 / WAN2 IPv4 and WAN IPv6 / WAN2 IPv6 respectively (priorities are set to 254 and 255, and the gateways are set as Tier 1 and Tier 2 in the groups).

I can then see two dpinger services running, one for the WAN IPv4 monitor IP and one for the WAN IPv6 monitor IP.

And now my issue:

This only works as long as I do not enable IPv6 on the 4G/LTE modem. If I set the PDP mode to IPv4v6 (instead of just IPv4), the WAN2 interface also gets an IPv6 address and an IPv6 gateway (which is otherwise empty). As soon as this happens, the dpinger for the WAN IPv6 monitor IP goes red and marks the WAN IPv6 gateway as down.

I can ping the WAN IPv6 gateway from clients in the LAN as well as from OPNsense itself, so I wonder why the dpinger for the WAN IPv6 monitor IP goes down as soon as WAN2 also gets IPv6.

What may or may not be of interest is how the IP addresses are assigned:

WAN IPv4: public Deutsche Telekom IPv4
WAN IPv6: fe80::%pppoe0 link-local address; the gateway is also an fe80:: link-local address

WAN2 IPv4: private 10.0.0.0/8 IPv4 by O2
WAN2 IPv6: public 2a02::/128 IPv6

I'd be grateful if anyone can see what is going on and what I'm missing to get a dual-WAN, dual-stack setup with all four gateways (IPv4 and IPv6 on both WAN and WAN2) green during normal operation. I'm happy to provide more information or answer questions if necessary.

Greetings,
Stefan

February 11, 2024, 08:23:07 AM #1 Last Edit: February 12, 2024, 09:50:39 AM by sbellon
I debugged a bit further. Using "ps auwx | grep dpinger", I recorded the two processes that monitor the gateways while the LTE backup is IPv4-only and the main VDSL is working properly in dual stack. Two dpinger processes are running, as expected:


/usr/local/bin/dpinger -f -S -r 0 -i WAN_PPPOE -B 91.x.x.x -p /var/run/dpinger_WAN_PPPOE.pid -u /var/run/dpinger_WAN_PPPOE.sock -s 1s -l 4s -t 60s -d 1 8.8.8.8
/usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2003:de:x:x:x:x:x:x -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -s 1s -l 4s -t 60s -d 1 2001:4860:4860::8888


When I configure the LTE backup for IPv4/IPv6, the IPv6 dpinger dies and only the IPv4 one keeps running. If I try to start it manually from the console, I get the following:


root@opnsense:~ # /usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2003:de:x:x:x:x:x:x -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -s 1s -l 4s -t 60s -d 1 2001:4860:4860::8888
bind: Can't assign requested address
cannot bind send socket
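
"Can't assign requested address" is the FreeBSD text for the EADDRNOTAVAIL error from bind(2). A quick way to probe whether an IPv6 address is currently usable as a source might look like the Python sketch below. It is only an approximation: it binds an unprivileged UDP socket instead of the raw ICMPv6 socket dpinger opens, on the assumption that the kernel rejects the same addresses in both cases, and the function name can_bind_ipv6 is hypothetical.

```python
import socket


def can_bind_ipv6(addr: str) -> bool:
    """Return True if `addr` can currently be bound as an IPv6 source address.

    Hypothetical helper: uses a UDP socket (no root needed) rather than the
    raw ICMPv6 socket dpinger uses, assuming the kernel's checks are the same.
    """
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    except OSError:
        return False  # IPv6 is not available on this host at all
    with sock:
        try:
            sock.bind((addr, 0))  # port 0: let the kernel pick any free port
        except OSError:
            # e.g. EADDRNOTAVAIL, the "Can't assign requested address"
            # error that dpinger printed above
            return False
    return True
```

Called with the address passed to -B, it would be expected to return False while that address is unusable and True once it is usable again.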


Further investigation has shown that in "Interfaces -> Overview" the 2003:de:x:x:x:x:x:x address is missing from the WAN Routes when LTE has IPv6.

EDIT: Tidied up misleading info as I found out later...

February 11, 2024, 02:52:09 PM #2 Last Edit: February 12, 2024, 09:50:57 AM by sbellon
If I manually start


/usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6  -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -s 1s -l 4s -t 60s -d 1 2001:4860:4860::8888


instead of


/usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2003:de:x:x:x:x:x:x -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -s 1s -l 4s -t 60s -d 1 2001:4860:4860::8888


(which is what OPNsense runs itself), I can heal it: removing the -B bind source address helps.
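
The workaround boils down to leaving -B out of the argument list, so the kernel chooses the source address itself. As a hedged illustration, the Python sketch below assembles a dpinger command line with the same flags as the commands above; build_dpinger_argv is a hypothetical helper, not OPNsense's actual code, and it handles only the flags seen in this thread.

```python
from typing import List, Optional


def build_dpinger_argv(name: str, target: str,
                       source: Optional[str] = None) -> List[str]:
    """Assemble a dpinger command line like the ones shown in this thread.

    Hypothetical helper: when `source` is None, -B is omitted and the kernel
    picks the source address (the manual workaround described above).
    """
    argv = ["/usr/local/bin/dpinger", "-f", "-S", "-r", "0", "-i", name]
    if source is not None:
        argv += ["-B", source]  # bind the probes to an explicit source address
    argv += [
        "-p", f"/var/run/dpinger_{name}.pid",
        "-u", f"/var/run/dpinger_{name}.sock",
        "-s", "1s", "-l", "4s", "-t", "60s", "-d", "1",
        target,  # the monitor IP being pinged
    ]
    return argv


print(" ".join(build_dpinger_argv("WAN_DHCP6", "2001:4860:4860::8888")))
```

Without a source argument, the printed command matches the manual workaround (apart from whitespace).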

February 11, 2024, 09:08:34 PM #3 Last Edit: February 12, 2024, 09:53:18 AM by sbellon
I know it's bad etiquette to follow up on one's own posts, but I think I'm onto something now...

I wanted to find out why I cannot run


/usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2003:de:x:x:x:x:x:x -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -s 1s -l 4s -t 60s -d 1 2001:4860:4860::8888


even though ifconfig shows:


pppoe0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
        description: WAN (wan)
        inet 91.x.x.x --> ...
        inet6 fe80::x:x:x:x%pppoe0 ...
        inet6 2003:de:x:x:x:x:x:x prefixlen 64 detached autoconf


And then I noticed the "detached"! As soon as the LTE WAN2 is configured for IPv6, the pppoe0 inet6 address on WAN becomes detached. (This is most likely also consistent with what I noticed in my first follow-up post: the 2003:de:x:x:x:x:x:x address is missing from WAN in "Interfaces -> Overview".)
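
To spot this condition without reading ifconfig output by eye, the flags after each inet6 address can be scanned for "detached". A minimal Python sketch, assuming the output format shown above (address first, then flags such as "prefixlen 64 detached autoconf"); the function name is hypothetical, and the sample uses documentation addresses (2001:db8::/32) instead of the masked real ones.

```python
import re
from typing import List

# Matches an "inet6 <address> <rest>" line; the address ends at the first
# whitespace, so scoped addresses keep their %iface suffix.
INET6_RE = re.compile(r"^\s*inet6\s+(\S+)\s+(.*)$")


def detached_inet6(ifconfig_output: str) -> List[str]:
    """Return the inet6 addresses that ifconfig marks as "detached"."""
    found = []
    for line in ifconfig_output.splitlines():
        m = INET6_RE.match(line)
        if m and "detached" in m.group(2).split():
            found.append(m.group(1))
    return found


sample = """\
pppoe0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
        inet6 fe80::1%pppoe0 prefixlen 64 scopeid 0x9
        inet6 2001:db8::1 prefixlen 64 detached autoconf
"""
print(detached_inet6(sample))  # ['2001:db8::1']
```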

I googled and found:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263986

Lo and behold, if I manually run "ifconfig pppoe0 inet6 2003:de:x:x:x:x:x:x/64" so that the "detached autoconf" flags are gone, I can start dpinger and the WAN IPv6 gateway comes back online.

Is there any way via configuration to avoid the inet6 of WAN getting detached when having a WAN2 with IPv6?

Nothing wrong with following up on your own post with additional information.

It's unlikely that this is actually the issue you linked to. They're specifically discussing multiple SLAAC addresses on a single interface and RAs with a router lifetime of 0, neither of which applies here.

I tried two SLAAC interfaces as well as one SLAAC and one DHCPv6 interface and don't see a detached SLAAC address in either case (didn't try a PPPoE parent though). Can you share details about your WAN / WAN2 configurations (SLAAC / DHCPv6)? And what does the Netgear advertise, DHCPv6 or SLAAC or both?

If nothing else works, you could leave the monitor IP empty. dpinger will then ping the link-local gateway address, which doesn't require binding to a global source address.

And by the way,
- you don't need gateway groups when using default gateway switching and
- be aware that for IPv6 failover to actually work, you'll need to configure IPv6 outbound NAT for WAN2.

Cheers
Maurice
OPNsense virtual machine images
OPNsense aarch64 firmware repository

Commercial support & engineering available. PM for details (en / de).

Thanks a lot for your reply.

Yes, the issue may not be the same, but there aren't many other useful hits when googling for "inet6 detached" either. On a more general note: what actually is the meaning of the "detached" keyword next to an inet6 address in the ifconfig output?

In my case, both WAN and WAN2 are using DHCPv6.

Yes, NOT specifying a monitor IP for the WAN IPv6 gateway may be an option to get it working in the normal operating state. But as the WAN IPv6 gateway is an fe80:: address, I think this would not guarantee a switchover to WAN2 if the IPv6 connectivity "behind" that fe80:: gateway is down, would it?

Regarding your other two comments:
1) I don't need gateway groups? I thought this is what is documented here: https://docs.opnsense.org/manual/how-tos/multiwan.html, or did I misunderstand?
2) will have to look at the outbound NAT for WAN2 if the rest works. ;-)

Thanks for your help, very much appreciated!

Greetings,
Stefan

There's indeed not a lot of documentation about the detached state. From my understanding, it means that the router which advertised the prefix is unreachable, which makes the SLAAC address unusable. I have no idea, though, why the system considers the Telekom router to be unreachable once WAN2 connects to the Netgear.

Deutsche Telekom uses DHCPv6 for Prefix Delegation only, the WAN address is autoconfigured using SLAAC. Did you configure the DHCPv6 client accordingly ("Request only an IPv6 prefix" enabled)? And did you enable "Use IPv4 connectivity", too (required for PPPoE)?
Does WAN2 only have a DHCPv6 address (/128) or a SLAAC address (/64), too?

Correct, when monitoring the gateway address itself, outages further upstream won't be detected.

You only need gateway groups when features like failover for specific LANs or load balancing are required. For a basic setup (no load balancing, all LANs use failover), simply enabling default gateway switching is sufficient. Default gateway switching (which changes the system's default gateway) and gateway groups (which only work with policy routing rules) are not strictly related.
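
As a rough mental model of default gateway switching (a simplification, not OPNsense's actual implementation), the Python sketch below picks, per address family, the gateway that is currently up and has the lowest priority value; OPNsense treats lower priority values as preferred. The Gateway class, the WAN2 gateway names, and the assignment of priority 254 to WAN and 255 to WAN2 are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Gateway:
    name: str
    family: str    # "inet" (IPv4) or "inet6" (IPv6)
    priority: int  # lower value = preferred
    up: bool       # monitoring result, e.g. from dpinger


def pick_default(gateways: Iterable[Gateway], family: str) -> Optional[Gateway]:
    """Simplified model of default gateway switching for one address family."""
    candidates = [gw for gw in gateways if gw.family == family and gw.up]
    # min() keeps the first gateway in list order when priorities are equal.
    return min(candidates, key=lambda gw: gw.priority, default=None)


gws = [
    Gateway("WAN_PPPOE", "inet", 254, up=True),
    Gateway("WAN2_DHCP", "inet", 255, up=True),    # hypothetical name
    Gateway("WAN_DHCP6", "inet6", 254, up=False),  # monitor reports it down
    Gateway("WAN2_DHCP6", "inet6", 255, up=True),  # hypothetical name
]
print(pick_default(gws, "inet").name)   # WAN_PPPOE
print(pick_default(gws, "inet6").name)  # WAN2_DHCP6
```

No policy routing rules are involved here, which is why a basic failover setup does not need gateway groups.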

Yes, I have already configured WAN interface with "Request only an IPv6 prefix" and "Use IPv4 connectivity".

But I had *not* configured those two on WAN2.

Now I have configured "Request only an IPv6 prefix" and "Use IPv4 connectivity" for WAN2 as well and it seems to work (incl. the gateway monitoring on WAN IPv6)!

I'll now try to get rid of the gateway groups as you suggested and try a failover situation.

Thanks for your help so far, it helped a ton!

"Use IPv4 connectivity" is not required on WAN2 and does nothing since this isn't a PPP interface.

Does WAN2 still get an IPv6 address when "Request only an IPv6 prefix" is enabled there? This would suggest that the Netgear advertises SLAAC (which is expected). Try setting the WAN2 IPv6 configuration type to SLAAC and check whether it keeps working.

I only get it working without pppoe0 getting detached if WAN2 uses DHCPv6 with "Request only an IPv6 prefix" enabled. If I untick "Request only an IPv6 prefix" or configure SLAAC, the pppoe0 inet6 address immediately gets detached again (with all its consequences).