Gateway monitoring issue with multiple IPv6 WAN addresses

Started by markus.tobatz, April 15, 2024, 09:37:18 AM

April 15, 2024, 09:37:18 AM Last Edit: April 15, 2024, 04:18:26 PM by markus.tobatz
I have a similar problem to https://forum.opnsense.org/index.php?topic=38757.0 and https://forum.opnsense.org/index.php?topic=39734.0

I have been running OPNsense 24.1.5 on a fiber connection from Telekom. IPv4 and IPv6 work perfectly with it. Because of occasional outages, I added a 4G modem and now also use LTE from Telekom. LTE was not easy to set up, but a connection is established for now. Currently I receive an IPv4 and an IPv6 address on both fiber and LTE. For fiber, I also request a /56 prefix.

I now want to set a monitor IP for each gateway. I use Google's IPv4 and IPv6 DNS servers for this. The overview shows that the ping works, except that the FIBER_DHCP6 gateway is causing problems. If I switch off the LTE interface, the check on FIBER_DHCP6 works immediately. Does anyone have any ideas? I have attached the routing table and interface configuration. What stands out is that the IPv6 address on the FIBER_DHCP6 interface has the status "detached". No matter which IPv6 settings I change for fiber and LTE in OPNsense, nothing changes.

BTW: When I run a traceroute from a connected client to any external IPv6 host, I can see that the FIBER_DHCP6 interface is used. So it is up and running.

pppoe0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
        description: FIBER (opt1)
        inet6 fe80::2d0:b4ff:fe01:a913%pppoe0 prefixlen 64 scopeid 0x11
        inet6 fe80::2d0:b4ff:fe01:a914%pppoe0 prefixlen 64 scopeid 0x11
        inet6 2003:a:b:c:2d0:b4ff:fe01:a913 prefixlen 64 detached autoconf
        inet 80.a.b.c --> 80.146.129.38 netmask 0xffffffff
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
ppp1: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
        description: LTE (opt9)
        inet6 fe80::2d0:b4ff:fe01:a913%ppp1 prefixlen 64 scopeid 0x12
        inet6 fe80::48a9:23e0:5266:9e38%ppp1 prefixlen 64 scopeid 0x12
        inet6 2a01:x:y:z:2d0:b4ff:fe01:a913 prefixlen 64 autoconf
        inet 10.x.y.z --> 10.64.64.1 netmask 0xffffffff
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>


# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            80.146.129.38      UGS      pppoe0
8.8.4.4            10.64.64.1         UGHS       ppp1
8.8.8.8            80.146.129.38      UGHS     pppoe0
10.10.10.0         link#16            UHS         lo0
10.10.10.0/24      link#16            U           wg1
10.10.10.1         link#16            UHS         wg1
10.10.10.2         link#16            UHS         wg1
10.64.64.1         link#18            UH         ppp1
10.x.y.z      link#18            UHS         lo0
80.a.b.c       link#17            UHS         lo0
80.146.129.38      link#17            UH       pppoe0
127.0.0.1          link#5             UH          lo0
192.168.1.0/24     link#2             U          igc1
192.168.1.1        link#2             UHS         lo0
192.168.10.0/24    link#11            U      vlan02.1
192.168.10.1       link#11            UHS         lo0
192.168.20.0/24    link#12            U      vlan02.2
192.168.20.1       link#12            UHS         lo0
192.168.30.0/24    link#13            U      vlan02.3
192.168.30.1       link#13            UHS         lo0
192.168.40.0/24    link#14            U      vlan02.4
192.168.40.1       link#14            UHS         lo0
192.168.50.0/24    link#15            U      vlan02.5
192.168.50.1       link#15            UHS         lo0

Internet6:
Destination                       Gateway                       Flags     Netif Expire
default                           fe80::fe96:43ff:fee9:3fb0%pppoe0 UGS   pppoe0
::1                               link#5                        UHS         lo0
2001:4860:4860::8844              fe80::85f5:6104:e943:f277%ppp1 UGHS      ppp1
2001:4860:4860::8888              fe80::fe96:43ff:fee9:3fb0%pppoe0 UGHS   pppoe0
2003:d:e:f00::/56            ::1                           USB         lo0
2003:d:e:f01::/64            link#2                        U          igc1
2003:d:e:f01:2d0:b4ff:fe01:a914 link#2                     UHS         lo0
2003:d:e:f0a::/64            link#11                       U      vlan02.1
2003:d:e:f0a:2d0:b4ff:fe01:a915 link#11                    UHS         lo0
2003:d:e:f14::/64            link#12                       U      vlan02.2
2003:d:e:f14:2d0:b4ff:fe01:a915 link#12                    UHS         lo0
2003:d:e:f1e::/64            link#13                       U      vlan02.3
2003:d:e:f1e:2d0:b4ff:fe01:a915 link#13                    UHS         lo0
2003:a:b:c:2d0:b4ff:fe01:a913 link#17                     UHS         lo0
2003:180:2::53                    fe80::fe96:43ff:fee9:3fb0%pppoe0 UGHS   pppoe0
2003:180:2:6000::53               fe80::fe96:43ff:fee9:3fb0%pppoe0 UGHS   pppoe0
2a01:598:7ff:0:10:74:210:210      fe80::85f5:6104:e943:f277%ppp1 UGHS      ppp1
2a01:598:7ff:0:10:74:210:211      fe80::85f5:6104:e943:f277%ppp1 UGHS      ppp1
2a01:x:y:z::/64            link#18                       U          ppp1
2a01:x:y:z:2d0:b4ff:fe01:a913 link#18                    UHS         lo0
fe80::%igc1/64                    link#2                        U          igc1
fe80::2d0:b4ff:fe01:a914%igc1     link#2                        UHS         lo0
fe80::%lo0/64                     link#5                        U           lo0
fe80::1%lo0                       link#5                        UHS         lo0
fe80::%vlan02.10/64               link#11                       U      vlan02.1
fe80::2d0:b4ff:fe01:a915%vlan02.10 link#11                      UHS         lo0
fe80::%vlan02.20/64               link#12                       U      vlan02.2
fe80::2d0:b4ff:fe01:a915%vlan02.20 link#12                      UHS         lo0
fe80::%vlan02.30/64               link#13                       U      vlan02.3
fe80::2d0:b4ff:fe01:a915%vlan02.30 link#13                      UHS         lo0
fe80::%pppoe0/64                  link#17                       U        pppoe0
fe80::2d0:b4ff:fe01:a913%pppoe0   link#17                       UHS         lo0
fe80::2d0:b4ff:fe01:a914%pppoe0   link#17                       UHS         lo0
fe80::%ppp1/64                    link#18                       U          ppp1
fe80::2d0:b4ff:fe01:a913%ppp1     link#18                       UHS         lo0
fe80::48a9:23e0:5266:9e38%ppp1    link#18                       UHS         lo0

Check your DNS server mapping first. Servers are not allowed to overlap between WAN gateways as they set host routes which otherwise skew your health test.


Cheers,
Franco

Are you talking about the list of DNS servers and the related gateway under System > Settings > General? It makes no difference if this list is set up or empty.

Yes, those to check first. Your ISP may be supplying overlapping servers, too.

In multi-WAN it's best to pin your servers explicitly (according to monitoring targets) and avoid overriding from ISP.

https://docs.opnsense.org/manual/how-tos/multiwan.html#step-3-configure-dns-for-each-gateway


Cheers,
Franco
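
For anyone following along, here is a minimal sketch for checking which interface each monitoring or DNS host route is pinned to. It only filters "netstat -rn" output, as posted above, for routes that carry both the G (gateway) and H (host) flags. The function name pinned_routes is made up for illustration and is not part of OPNsense.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -rn` output on stdin and prints
# "<destination> <interface>" for every host route that goes via a gateway
# (flags contain both G and H), e.g. monitor IPs and per-gateway DNS servers.
pinned_routes() {
    awk '$3 ~ /G/ && $3 ~ /H/ { print $1, $4 }'
}

# Usage on the firewall shell:
#   netstat -rn | pinned_routes
```

If a monitor IP shows up on the wrong interface, or a DNS server set for one gateway is also the monitor IP of another, that would be the overlap described above.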

April 15, 2024, 11:06:02 AM #4 Last Edit: April 15, 2024, 04:18:46 PM by markus.tobatz
OK. As I said, it makes no difference. I tried it with the settings shown in the screenshot, and I also tried leaving the mapping completely empty and allowing the ISP's DNS servers to be used.

I've already rebooted OPNsense multiple times to get rid of any cached settings, but that doesn't help either.

BTW: I don't understand why this is related to the problem. As the routing table shows, the relevant Google IP addresses are routed through the correct gateways. When I trace the relevant IPv6 address from the OPNsense shell, I get:

# traceroute6 2001:4860:4860::8888
traceroute6 to 2001:4860:4860::8888 (2001:4860:4860::8888) from 2003:d:e:f0a:2d0:b4ff:fe01:a915, 64 hops max, 28 byte packets
1  2003:0:8707:3800::1  1.298 ms  1.337 ms  1.175 ms
2  * * *
3  2003:0:1304:8002::2  8.496 ms  8.205 ms  8.358 ms
4  2001:4860:0:1::88af  9.109 ms
    2001:4860:0:1::86a9  7.882 ms
    2001:4860:0:1::88d5  8.104 ms
5  2001:4860:0:1::c3  8.353 ms
    2001:4860:0:1::4177  9.165 ms
    2001:4860:0:1::157f  8.519 ms
6  dns.google  8.033 ms  8.131 ms  7.984 ms




I can't find "2003:a:b:c::1" or "2003:0:1304:8010::2" in your posts so I'm unsure what we're checking against what now.

And doing a traceroute without a source address (-s) will likely just use an available path if the route pinning for monitoring didn't work.


Cheers,
Franco

April 15, 2024, 04:04:24 PM #6 Last Edit: April 15, 2024, 04:19:06 PM by markus.tobatz
You're right, I masked too many IPs (fixed in the post above).
Why would you expect to find those two 2003:: addresses in my posts? They seem to belong to my ISP (Telekom). They are the first hops after my router (sorry, that's my naive understanding). The traceroute is executed directly on the OPNsense box.

Ah, I tried running traceroute6 on OPNsense with the -s parameter set to the address of my FIBER_DHCP6 interface, and that fails. With the address of LTE_DHCP6, the traceroute runs:

# traceroute6 -s 2003:a:b:c:2d0:b4ff:fe01:a913 2001:4860:4860::8888
bind: Can't assign requested address
# traceroute6 -s 2a01:x:y:z:2d0:b4ff:fe01:a913 2001:4860:4860::8844
traceroute6 to 2001:4860:4860::8844 (2001:4860:4860::8844) from 2a01:x:y:z:2d0:b4ff:fe01:a913, 64 hops max, 28 byte packets
[...]


This leads me back to my question: Why is the IPv6 address on the pppoe0 interface (FIBER_DHCP6) detached?
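
A small sketch for spotting this state quickly, assuming the ifconfig output format shown in the first post. It prints the interface and address for every inet6 line that carries the "detached" flag. The function name detached_addrs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `ifconfig` output on stdin and prints
# "<interface> <address>" for each inet6 address flagged as detached.
detached_addrs() {
    awk '
        # Interface header lines start in column one, e.g. "pppoe0: flags=..."
        /^[[:alnum:]]/ { sub(/:$/, "", $1); ifname = $1 }
        # Address lines are indented; look for the literal flag "detached"
        $1 == "inet6" {
            for (i = 3; i <= NF; i++)
                if ($i == "detached") print ifname, $2
        }
    '
}

# Usage on the firewall shell:
#   ifconfig | detached_addrs
```

An address listed here is the one that traceroute6 -s refuses to bind to in the output above.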

I had to check the "ifconfig" man page to find that "detached" is not even documented.

I crawled through the code and found:

https://github.com/opnsense/src/blob/22365c93a148fec6fbb999d1b8a9ef8c40a18101/sys/netinet6/in6_var.h#L496

Which is not overly helpful, but it's a bit too late today to do more digging.


Cheers,
Franco

Just wanted to confirm that in my problem here: https://forum.opnsense.org/index.php?topic=39734.0
the inet6 address in question also has the "detached" state in the error case.

In NetBSD the ifconfig(8) man page at least says: "The detached flag is set when the interface does not have a carrier.", but what that means in practice for an IPv6 address is not known to me.


Cheers,
Franco

Just for testing, I disabled the request for the /56 prefix on the FIBER interface (so it only gets the /64 from the ISP for itself) and removed all IPv6 tracking on the VLANs, but nothing changed.

April 16, 2024, 06:11:57 PM #11 Last Edit: April 17, 2024, 05:41:38 AM by markus.tobatz
I did some more tests. Simply adding the IPv6 address to the interface manually one more time on the CLI did the trick:

ifconfig pppoe0 inet6 2003:a:b:d:2d0:b4ff:fe01:a913/64

After that, the address is no longer in the detached state and traceroute works too:

# traceroute6 -s 2003:a:b:d:2d0:b4ff:fe01:a913 2001:4860:4860::8888
bind: Can't assign requested address
# ifconfig pppoe0 inet6 2003:a:b:d:2d0:b4ff:fe01:a913/64
# traceroute6 -s 2003:a:b:d:2d0:b4ff:fe01:a913 2001:4860:4860::8888
traceroute6 to 2001:4860:4860::8888 (2001:4860:4860::8888) from 2003:a:b:d:2d0:b4ff:fe01:a913, 64 hops max, 28 byte packets
1  2003:0:8707:3800::1  1.209 ms  1.253 ms  4.544 ms
[...]


After a few seconds, the monitoring status in the OPNsense GUI changed to green. After multiple tests, this doesn't seem to be very stable, though. The status sometimes flaps.
Nevertheless, now we have to find out why this happens and how to prevent it (i.e. find a workaround). The problem is that this is a dynamic IPv6 address provided by the ISP.
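
Since the address is dynamic, the manual step could be sketched as a dry run that derives the command from the current ifconfig output instead of hard-coding the address. This is only an illustration of the workaround above, not a tested fix. The function name readd_commands is made up, and it only prints the commands so they can be reviewed before running anything.

```shell
#!/bin/sh
# Hypothetical helper: reads `ifconfig` output on stdin and prints one
# "ifconfig <if> inet6 <addr>/<prefixlen>" command per detached inet6 address.
# It does not change any interface by itself.
readd_commands() {
    awk '
        /^[[:alnum:]]/ { sub(/:$/, "", $1); ifname = $1 }
        $1 == "inet6" && / detached/ {
            plen = ""
            # The prefix length follows the keyword "prefixlen"
            for (i = 3; i < NF; i++)
                if ($i == "prefixlen") plen = $(i + 1)
            if (plen != "")
                printf "ifconfig %s inet6 %s/%s\n", ifname, $2, plen
        }
    '
}

# Usage on the firewall shell (review the output before executing it):
#   ifconfig | readd_commands
```

Given the flapping seen in the tests above, re-adding the address this way should be treated as a diagnostic aid at best.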

For the time being I think ignoring the detached state is a good way to ensure consistency:

https://github.com/opnsense/core/commit/5db3c349

I'll dig into the kernel sources later to see what causes an address to be "detached", but it's also clear that this cannot be unset from ifconfig except by removing and re-adding the address after the fact. I feel like some piece of software other than ours should be monitoring the state and resetting it, but if this happens due to a specific workflow it might always end up like this despite the service's best effort (e.g. mpd5).


Cheers,
Franco