IPv6 stops working a couple minutes after reboot

Started by gary201, December 19, 2020, 08:17:38 PM

I'm looking for help troubleshooting an IPv6 issue. Everything below is done FROM an SSH/console login to OPNsense (i.e. the shell; forget about what happens on the LAN side).

When OPNsense is rebooted, IPv6 works for a couple of minutes, then it just stops working. By "working" I mean that ping6 or traceroute6 from OPNsense over the WAN interface succeeds.

For example, reboot OPNsense, then log in to the console (or SSH). From the shell, I can successfully ping6 google.com. Wait a couple of minutes, retry the command, and it fails. This has been happening for a couple of months, ever since my ISP did some maintenance. I haven't been able to get them to reveal what those changes were, or to say much of anything. So I'm looking for a way to troubleshoot why it works for only a couple of minutes after a reboot. This is repeatable. Are there any commands I can use to diagnose what's happening?

I did read that someone suggested using 'netstat -rn6' to check the routing table.

root@OPNsense:~ # netstat -rn6
Routing tables

Internet6:
Destination                       Gateway                       Flags     Netif Expire
default                           fe80::224:45ff:fe8e:cbd3%vmx0 UGS        vmx0
::1                               link#4                        UH          lo0
2605:9480:10b:4000::/50           link#1                        U          vmx0
2605:9480:258:ff10::/64           link#2                        U          vmx1
2605:9480:258:ff10:250:56ff:fea0:fe73 link#2                    UHS         lo0
fe80::%vmx0/64                    link#1                        U          vmx0
fe80::250:56ff:fea0:c131%vmx0     link#1                        UHS         lo0
fe80::%vmx1/64                    link#2                        U          vmx1
fe80::250:56ff:fea0:fe73%vmx1     link#2                        UHS         lo0
fe80::%lo0/64                     link#4                        U           lo0
fe80::1%lo0                       link#4                        UHS         lo0
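Since the first thing to look at in that table is the `default` row, a small helper can pull it out of captured `netstat -rn6` (or `netstat -rn`) output so it can be logged alongside each test. This is a minimal sketch that assumes the column layout shown above (Destination, Gateway, Flags, Netif); `default_routes_v6` is a hypothetical name, not an OPNsense tool.

```python
from typing import List, Tuple


def default_routes_v6(netstat_output: str) -> List[Tuple[str, str, str]]:
    """Return (gateway, flags, netif) for every IPv6 'default' row.

    Works on `netstat -rn6` output and on the IPv6 part of `netstat -rn`;
    rows in the IPv4 ("Internet:") section are ignored.
    """
    routes = []
    in_inet6 = False
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Internet6:":
            in_inet6 = True
        elif fields[0] == "Internet:":
            in_inet6 = False
        elif in_inet6 and fields[0] == "default" and len(fields) >= 4:
            # Columns: Destination, Gateway, Flags, Netif[, Expire]
            routes.append((fields[1], fields[2], fields[3]))
    return routes
```

On the firewall itself the input could come from `subprocess.run(["netstat", "-rn6"], capture_output=True, text=True).stdout`; logging the result right after boot and again once ping6 starts failing shows whether the default route itself changes.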

Sample of the issue (the following is a single copy/paste where the commands were run just a couple minutes apart after a reboot):

root@OPNsense:~ # ping6 google.com
PING6(56=40+8+8 bytes) 2605:9480:258:ff10:250:56ff:fea0:fe73 --> 2607:f8b0:4006:802::200e
16 bytes from 2607:f8b0:4006:802::200e, icmp_seq=0 hlim=117 time=9.095 ms
16 bytes from 2607:f8b0:4006:802::200e, icmp_seq=1 hlim=117 time=9.187 ms
16 bytes from 2607:f8b0:4006:802::200e, icmp_seq=2 hlim=117 time=9.324 ms
^C
--- google.com ping6 statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 9.095/9.202/9.324/0.094 ms
root@OPNsense:~ # ping6 google.com
PING6(56=40+8+8 bytes) 2605:9480:258:ff10:250:56ff:fea0:fe73 --> 2607:f8b0:4006:81a::200e
^C
--- google.com ping6 statistics ---
25 packets transmitted, 0 packets received, 100.0% packet loss
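To put a number on "a couple of minutes", one approach is to probe every few seconds and record how long after the first success the first failure appears; that moment can then be compared with the NDP and routing state at the same time. The sketch below assumes a Python 3 interpreter is available on the box and relies only on `ping6 -c 1 <target>` and its exit status (0 when a reply was received). `ping6_once` and `time_until_failure` are hypothetical helpers; the probe, sleep and clock are injectable so the logic can be checked without a network.

```python
import subprocess
import time
from typing import Callable, Optional


def ping6_once(target: str, timeout_s: float = 5.0) -> bool:
    """Send one echo request with `ping6 -c 1`; True if a reply came back."""
    try:
        result = subprocess.run(
            ["ping6", "-c", "1", target],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # No reply before our own deadline, or ping6 is not installed here.
        return False
    return result.returncode == 0


def time_until_failure(
    probe: Callable[[], bool],
    interval_s: float = 10.0,
    max_probes: int = 120,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Seconds between the first successful probe and the first failure after it.

    Returns None if no success-then-failure transition is seen in max_probes tries.
    """
    first_ok: Optional[float] = None
    for _ in range(max_probes):
        now = clock()
        ok = probe()
        if ok and first_ok is None:
            first_ok = now
        elif not ok and first_ok is not None:
            return now - first_ok
        sleep(interval_s)
    return None
```

For example, `time_until_failure(lambda: ping6_once("google.com"), interval_s=10)` run right after a reboot returns the elapsed seconds, or `None` if nothing broke within `max_probes` attempts.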


I second that (running 20.7.7_1), although I can't narrow it down to a window where it works after startup; from my point of view, it never works.

If I log in to the console (IPv6 enabled, gateway set up via DHCPv6, firewall rules allowing all IPv6 traffic) and ping the all-routers multicast address to find my next-hop IPv6 router, the router answers:

root@OPNsense:~ # ping6 -c 2 ff02::2%bce0
PING6(56=40+8+8 bytes) fe80::221:5eff:fec8:be88%bce0 --> ff02::2%bce0
16 bytes from fe80::c225:6ff:feff:820d%bce0, icmp_seq=0 hlim=64 time=0.531 ms
16 bytes from fe80::c225:6ff:feff:820d%bce0, icmp_seq=1 hlim=64 time=0.465 ms

--- ff02::2%bce0 ping6 statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.465/0.498/0.531/0.033 ms


A few seconds later, I try to ping the answering IPv6 router directly, and I cannot reach it:

root@OPNsense:~ # ping6 -c 1 fe80::c225:6ff:feff:820d%bce0
PING6(56=40+8+8 bytes) fe80::221:5eff:fec8:be88%bce0 --> fe80::c225:6ff:feff:820d%bce0

--- fe80::c225:6ff:feff:820d%bce0 ping6 statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
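The two pings above exercise different paths. A ping to ff02::2 is sent straight to an Ethernet multicast MAC derived from the group address (33:33 plus the low 32 bits, per RFC 2464), so no neighbor lookup is needed. A unicast ping to the router's link-local address first needs the router's MAC, which NDP obtains by sending a Neighbor Solicitation to the router's solicited-node multicast group (ff02::1:ff00:0/104 plus the low 24 bits of the target). Comparing the two helps separate "the router is on the wire and answers multicast" from "unicast to it fails", whether the cause is neighbor resolution or the router not answering unicast. A small sketch of both mappings, using only Python's standard `ipaddress` module (the function names are illustrative):

```python
import ipaddress

# Solicited-node prefix ff02::1:ff00:0/104 (RFC 4291, section 2.7.1).
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def multicast_mac(group: str) -> str:
    """Ethernet destination MAC for an IPv6 multicast group: 33:33 + low 32 bits (RFC 2464)."""
    addr = ipaddress.IPv6Address(group.split("%", 1)[0])
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return "33:33:" + ":".join(f"{b:02x}" for b in addr.packed[-4:])


def solicited_node(unicast: str) -> ipaddress.IPv6Address:
    """Group that NDP Neighbor Solicitations for `unicast` are sent to."""
    addr = ipaddress.IPv6Address(unicast.split("%", 1)[0])  # drop any %zone suffix
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | (int(addr) & 0xFFFFFF))


if __name__ == "__main__":
    router = "fe80::c225:6ff:feff:820d%bce0"  # router address from the output above
    print("ff02::2 ->", multicast_mac("ff02::2"))
    group = solicited_node(router)
    print(router, "is resolved via", group, "->", multicast_mac(str(group)))
```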


So IPv6 definitely does not work, and according to my records this has been the case for quite some time, going back into the 20.7 series. I have had several unsolved issues here already, and I would be more than happy to help get this solved.
Running OPNsense on 4 core Intel Xeon E5506, 20GB RAM, 2x Broadcom NetXtreme II BCM5709, 4x Intel 82580
Ubench Single CPU: 307897 (0.39s)

Not sure if you're aware of the patch that replaces radvd with rtadvd (I can't say it's the exact same problem you're experiencing, but it did start with 20.7):

https://github.com/opnsense/core/issues/4338

opnsense-patch 9a4a908
HP T730/AMD  RX-427BB/8GB/500GB SSD
HP NC365T 4-PORT

Thanks for the hint. I read that thread as well, but that issue is already one step further than mine; I'm one stage before it. I follow nearly the same approach as described at the beginning of the issue, and my OPNsense does get an address from the providing (upstream) IPv6 router via a DHCPv6 request, but I'm not (yet) at the point of advertising addresses further into my networks.

My example still shows it stuck: I cannot contact the upstream gateway via its link-local address again, even though I saw it answer the multicast ping perfectly just before.
I would guess gary201 has the same problem, as he also has the gateway as a link-local address in the routing table, exactly as I do. So nothing is wrong there, but why can't it be pinged directly from the WAN interface?

I had something similar with 20.7, where it seemed (and this is admittedly sketchy) that I wasn't getting an allocation from my ISP, but only on a cold boot (it happened after two different power outages). The solution I found was to open the WAN interface and save it (no changes), then do the same on the LAN interface. That seemed to kick things off. Honestly, I haven't gone back to test whether the patch I mentioned also fixed that particular issue; the patch clearly addresses at least LAN IPv6 advertisements, so again, I'm not sure. Might be worth trying; perhaps it will provide a clue as to what's going on. I'm using native DHCPv6 with my ISP (Spectrum) with a /56 prefix size.

Let me also add that this issue (the one at the start of this thread, where connectivity fails when the upstream router's NDP entry goes stale) is NOT unique to OPNsense (FreeBSD). I can reproduce it from Windows Server 2016 and Windows 10 systems connected directly to my ISP's ONT.

My ISP (Greenlight Networks in Western NY) still insists that everything is fine, even though I've told them that the issue started with the maintenance they did a couple of months ago (IPv6 hasn't worked since) and that it is repeatable across different OSes connected to their ONT. The fact that I can reproduce it on different OSes clearly indicates the issue isn't isolated to OPNsense (FreeBSD).

At this point, I just don't know what to do to further troubleshoot this.

Once the problem starts on OPNsense (i.e. the NDP entry goes stale), pinging the upstream router's link-local address makes ping6 return an error:

PING6(56=40+8+8 bytes) fe80::250:56ff:fea0:c131%vmx0 --> fe80::c2d6:82ff:fe64:54%vmx0
ping6: sendmsg: No route to host
ping6: wrote fe80::c2d6:82ff:fe64:54%vmx0 16 chars, ret=-1
ping6: sendmsg: No route to host
ping6: wrote fe80::c2d6:82ff:fe64:54%vmx0 16 chars, ret=-1
ping6: sendmsg: No route to host
ping6: wrote fe80::c2d6:82ff:fe64:54%vmx0 16 chars, ret=-1
^C
--- fe80::c2d6:82ff:fe64:54%vmx0 ping6 statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss


I never actually thought to try pinging the all-routers multicast address, but doing so yields the same result.

root@OPNsense:~ # ping6 -c 4 ff02::2%vmx0
PING6(56=40+8+8 bytes) fe80::250:56ff:fea0:c131%vmx0 --> ff02::2%vmx0
ping6: sendmsg: No route to host
ping6: wrote ff02::2%vmx0 16 chars, ret=-1
ping6: sendmsg: No route to host
ping6: wrote ff02::2%vmx0 16 chars, ret=-1
ping6: sendmsg: No route to host
ping6: wrote ff02::2%vmx0 16 chars, ret=-1
^C
--- ff02::2%vmx0 ping6 statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss


I wonder if the solution lies in manually creating a persistent route (though at the moment I have no idea how I would do that for IPv6). I would argue that I should never have to do that, but it might at least be an interesting learning experience.
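For the record, on FreeBSD a static IPv6 default route is added with `route add -inet6 default <gateway>`, and a permanent neighbor entry (so the router's MAC no longer depends on NDP) with `ndp -s <address> <mac>`. The sketch below only builds and prints those commands unless `dry_run=False` is passed. It is an experiment, not a fix: the gateway, interface and MAC are inputs you must supply (for example a router link-local address and the MAC `ndp -a` shows for it), the helper names are hypothetical, entries added this way do not survive a reboot, and OPNsense may replace the route the next time it reconfigures the WAN.

```python
import subprocess
from typing import List


def static_gateway_commands(gateway_ll: str, netif: str, mac: str) -> List[List[str]]:
    """Build FreeBSD commands that pin the IPv6 default router and its MAC.

    gateway_ll: router link-local address without a zone, e.g. "fe80::1"
    netif:      WAN interface, e.g. "vmx0" (added as the %zone)
    mac:        router MAC as reported by `ndp -a`
    """
    scoped = f"{gateway_ll}%{netif}"
    return [
        # Permanent neighbor entry: the MAC no longer has to be re-learned via NDP.
        ["ndp", "-s", scoped, mac],
        # Swap the learned default route for a static one via the same router.
        ["route", "delete", "-inet6", "default"],
        ["route", "add", "-inet6", "default", scoped],
    ]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False (needs root)."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

With the default `dry_run=True`, `apply(static_gateway_commands(...))` is safe to run anywhere; it just prints what would be executed, which makes it easy to review before trying it on the firewall.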


December 22, 2020, 10:45:04 AM #8 Last Edit: December 22, 2020, 11:00:08 AM by marjohn56
There are two issues at play here. The first, which @andreaslink talks about, is that the link-local address of the gateway cannot be pinged. This breaks dpinger gateway monitoring. A workaround is to use another target address, such as one of the Google DNS server addresses. Note that if you use multi-WAN, do not use the same address for both v6 monitors, and if you use Google's DNS for resolution, pick some other WAN v6 targets. That said, I have seen instances where the ISP does not send the correct information in its v6 router advertisements, which prevents a route from being set properly at all.
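The monitor-target advice above can be expressed as a few checks: the target should be a global address rather than the link-local gateway, each IPv6 gateway should get its own target, and it should not be a resolver you also rely on for DNS. A minimal sketch using Python's `ipaddress` module; `check_monitor_targets` and the gateway names are hypothetical, and `2001:4860:4860::8888`/`::8844` (Google Public DNS over IPv6) are used only as example targets.

```python
import ipaddress
from typing import Dict, Iterable, List


def check_monitor_targets(targets: Dict[str, str], dns_servers: Iterable[str] = ()) -> List[str]:
    """List problems with per-gateway IPv6 monitor targets (empty list = looks fine)."""
    resolvers = {ipaddress.IPv6Address(d) for d in dns_servers}
    used_by: Dict[ipaddress.IPv6Address, str] = {}
    problems: List[str] = []
    for gateway, target in targets.items():
        addr = ipaddress.IPv6Address(target.split("%", 1)[0])
        if addr.is_link_local:
            problems.append(f"{gateway}: {target} is link-local; pick a global address")
        elif not addr.is_global:
            problems.append(f"{gateway}: {target} is not a global address")
        # The same target on two gateways cannot tell you which WAN failed.
        if addr in used_by:
            problems.append(f"{gateway}: {target} is also used by {used_by[addr]}")
        else:
            used_by[addr] = gateway
        # A resolver outage would then look like a gateway outage.
        if addr in resolvers:
            problems.append(f"{gateway}: {target} is also one of your DNS resolvers")
    return problems
```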


The second issue is where radvd goes cranky and stops advertising routes to clients on the LAN. You can easily check this by looking at the v6 gateway on the clients; on Windows, for example, ipconfig /all will show that there is no v6 gateway. Of course, you must also have a solid WAN v6 connection that you know is working. This is where the rtadvd patch comes into play, replacing radvd with the aforementioned daemon. The perfect solution would be to fix radvd, but in the meantime the patch should suffice in most cases.
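Besides `ipconfig /all`, a quick cross-platform way to see whether a LAN client has an IPv6 route at all is to `connect()` a UDP socket to a global IPv6 address: connecting a datagram socket only performs a route lookup and sends no packet, and it fails when there is no route. This sketch assumes Google Public DNS (`2001:4860:4860::8888`) as the default destination, and `ipv6_source_for` is a hypothetical name; a result only shows that the client has a route, not that traffic actually gets through.

```python
import socket
from typing import Optional


def ipv6_source_for(destination: str = "2001:4860:4860::8888", port: int = 53) -> Optional[str]:
    """Local IPv6 source address the OS would use for `destination`, or None if unroutable.

    connect() on a UDP socket only selects a route and source address; nothing is sent.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            s.connect((destination, port))
            return s.getsockname()[0]
    except OSError:
        # No IPv6 stack, no route (e.g. no default gateway learned), etc.
        return None
```

On a client that has lost its advertised default gateway, this would be expected to return `None` while IPv4 keeps working.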


EDIT: @andreaslink, can you try disabling Static route filtering on the Firewall->Advanced page? I've seen this fix the issue for some people.
OPNsense 24.7 - Qotom Q355G4 - ISP - Squirrel 1Gbps.

Team Rebellion Member

December 22, 2020, 02:42:27 PM #9 Last Edit: December 22, 2020, 02:44:39 PM by marjohn56
I haven't had any time recently to play around and look at what's going on; today I have a little.

I zeroed my test router, did a clean install of 20.7, and updated to current. No changes, just a simple WAN/LAN setup with DHCPv6 on the WAN and tracking on the LAN. The upstream router is my primary OPNsense router, so this is also a test of whether that one is behaving. My primary is all static, thanks to a proper ISP!

OK, back to the test router. The first thing I noticed is that, as has been pointed out, the gateway is not being monitored with a default dynamic setup, so I checked whether the defaultgwv6 file is being created, and it is. I opened that file and pasted the address into the monitor IP field, and monitoring started working. Then I cleared it... and it still worked. Hmm, very odd. Reboot...


After the reboot, the monitor was not working; I pasted in the link-local address and monitoring worked. It seems that when the monitor IP is left blank, it is not being set to the gateway's link-local address. Also, the gateway itself just shows as blank in System->Gateways->Single.
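The behavior expected here (a blank monitor IP falling back to the dynamically learned gateway) can be sketched as follows. This is not OPNsense's code: `effective_monitor_ip` is a hypothetical function, and it assumes the defaultgwv6 file contains just the gateway address. Link-local addresses need the `%interface` zone appended before they can be probed.

```python
import ipaddress
from typing import Optional


def effective_monitor_ip(configured: str, learned_gateway: Optional[str], netif: str) -> Optional[str]:
    """Address a gateway monitor would probe under the fallback rule described above.

    configured:      the Monitor IP field (may be blank)
    learned_gateway: contents of the defaultgwv6 file, assumed to be just the address
    netif:           interface used as the %zone for link-local gateways
    """
    if configured.strip():
        return configured.strip()          # an explicit monitor IP always wins
    if not learned_gateway or not learned_gateway.strip():
        return None                        # nothing learned yet: nothing to monitor
    gateway = learned_gateway.split()[0]
    addr = ipaddress.IPv6Address(gateway.split("%", 1)[0])
    if addr.is_link_local and "%" not in gateway:
        gateway = f"{gateway}%{netif}"     # link-local targets need a zone to be pinged
    return gateway
```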

IPv4 is fine, both Gateway and Monitor IP are showing correctly.

I don't have an issue with the gateway's IPv6 link-local address showing up in System->Gateways->Single. Also, restarting radvd, as referenced, has no effect on the inability to ping the upstream LL gateway.

The interesting thing I noticed this morning is that the upstream gateway IPv6 address (non LL) is actually showing up in "ndp -a" (where I don't recall that being the case previously).

(Note: I manually removed from the output any entries whose Netif was on the LAN, as they're not relevant to this discussion.)

root@OPNsense:~ # ndp -a
Neighbor                             Linklayer Address  Netif Expire    S Flags
2605:9480:10b:4000::                 98:f4:ab:ca:a9:e7   vmx0 7h8m40s   S
2605:9480:10b:4000::1                c0:d6:82:64:00:54   vmx0 8h50m15s  S R
fe80::3223:3ff:fefa:4d8a%vmx0        30:23:03:fa:4d:8a   vmx0 5h22m59s  S R
fe80::224:45ff:fe8e:cbd3%vmx0        (incomplete)        vmx0 expired   I  2
fe80::7ad2:94ff:fe9a:5cb%vmx0        78:d2:94:9a:05:cb   vmx0 5h22m59s  S R
fe80::46a5:6eff:fe42:10c0%vmx0       44:a5:6e:42:10:c0   vmx0 5h23m0s   S R
fe80::c2d6:82ff:fe64:54%vmx0         c0:d6:82:64:00:54   vmx0 23h47m25s S R
fe80::250:56ff:fea0:c131%vmx0        00:50:56:a0:c1:31   vmx0 permanent R
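One line stands out in this table: the default router from `netstat -rn6` (fe80::224:45ff:fe8e:cbd3) is listed as `(incomplete)` with state `I`, i.e. NDP never resolved its MAC, while the other router entries show `S R` (stale, router). A minimal parser for `ndp -a` output makes that check scriptable. It assumes the column layout shown here (Neighbor, Linklayer Address, Netif, Expire, S, Flags); `parse_ndp` and `unresolved` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NdpEntry:
    neighbor: str
    lladdr: str   # MAC, or "(incomplete)" while resolution has not succeeded
    netif: str
    expire: str
    state: str    # e.g. R = reachable, S = stale, I = incomplete
    flags: str    # remaining columns, e.g. R = router


def parse_ndp(output: str) -> List[NdpEntry]:
    """Parse `ndp -a` text into entries, skipping the header and short lines."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "Neighbor":
            continue
        entries.append(NdpEntry(*fields[:5], flags=" ".join(fields[5:])))
    return entries


def unresolved(entries: List[NdpEntry], netif: Optional[str] = None) -> List[NdpEntry]:
    """Entries whose link-layer address was never resolved."""
    return [
        e for e in entries
        if (netif is None or e.netif == netif)
        and (e.lladdr == "(incomplete)" or e.state == "I")
    ]
```

Running `unresolved(parse_ndp(text), "vmx0")` on the table above returns only the fe80::224:45ff:fe8e:cbd3 entry, which is the address the `default` route points at.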


I can say that pinging the upstream router's global address DOES work (i.e. 2605:9480:10b:4000::1).

netstat -rn reports this:

root@OPNsense:~ # netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            100.64.108.1       UGS        vmx0
100.64.108.0/22    link#1             U          vmx0
100.64.109.236     link#1             UHS         lo0
127.0.0.1          link#4             UH          lo0
192.168.11.0/24    link#2             U          vmx1
192.168.11.1       link#2             UHS         lo0

Internet6:
Destination                       Gateway                       Flags     Netif Expire
default                           fe80::224:45ff:fe8e:cbd3%vmx0 UGS        vmx0
::1                               link#4                        UH          lo0
2605:9480:10b:4000::/50           link#1                        U          vmx0
2605:9480:258:ff10::/64           link#2                        U          vmx1
2605:9480:258:ff10:250:56ff:fea0:fe73 link#2                    UHS         lo0
fe80::%vmx0/64                    link#1                        U          vmx0
fe80::250:56ff:fea0:c131%vmx0     link#1                        UHS         lo0
fe80::%vmx1/64                    link#2                        U          vmx1
fe80::250:56ff:fea0:fe73%vmx1     link#2                        UHS         lo0
fe80::%lo0/64                     link#4                        U           lo0
fe80::1%lo0                       link#4                        UHS         lo0
root@OPNsense:~ #


Pinging the "default" gateway (fe80::224:45ff:fe8e:cbd3%vmx0) does not work:

root@OPNsense:~ # ping6 -c 2  fe80::224:45ff:fe8e:cbd3%vmx0
PING6(56=40+8+8 bytes) fe80::250:56ff:fea0:c131%vmx0 --> fe80::224:45ff:fe8e:cbd3%vmx0
ping6: sendmsg: No route to host
ping6: wrote fe80::224:45ff:fe8e:cbd3%vmx0 16 chars, ret=-1
ping6: sendmsg: No route to host
ping6: wrote fe80::224:45ff:fe8e:cbd3%vmx0 16 chars, ret=-1


Should I actually expect the upstream IPv6 (non-LL) address to be listed in the "netstat -rn" "default" entry?
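On the question above: a link-local gateway in the `default` entry is normal, because RFC 4861 requires routers to send Router Advertisements from their link-local address, and that source address is what hosts install as the default router. What may be worth checking is which link-local. Where a router builds its interface ID with modified EUI-64 (ff:fe in the middle, universal/local bit flipped), the MAC can be read back out of the address and compared with `ndp -a`. From the outputs in this thread, fe80::c2d6:82ff:fe64:54 decodes to c0:d6:82:64:00:54, the same MAC as 2605:9480:10b:4000::1 (which answers pings), while the default route's fe80::224:45ff:fe8e:cbd3 decodes to 00:24:45:8e:cb:d3, the neighbor shown as (incomplete). Treat that as a lead rather than a conclusion; `mac_from_eui64` is a hypothetical helper.

```python
import ipaddress
from typing import Optional


def mac_from_eui64(address: str) -> Optional[str]:
    """Recover the MAC embedded in a modified EUI-64 interface ID (RFC 4291, appendix A).

    Returns None when the low 64 bits don't carry the ff:fe marker, e.g. for
    random/privacy or manually assigned interface IDs such as ::1.
    """
    addr = ipaddress.IPv6Address(address.split("%", 1)[0])  # drop any %zone suffix
    iid = addr.packed[8:]                                   # low 64 bits
    if iid[3:5] != b"\xff\xfe":
        return None
    # Undo the universal/local bit flip and remove the inserted ff:fe.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)
```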

Remember also that I can reproduce this issue on two different Windows operating systems (i.e. the NDP entry goes stale and I can't ping upstream anymore).

Am I totally out in the weeds on this?


In your case I would say f*** ** and just use a good WAN IP with low latency as the target; if that works, then you have a reliable monitor, which at the end of the day is what you need.

Thanks for your input, @marjohn56!

Regarding:
Quote:
This causes an issue with dpinger gateway monitoring, a workaround is to use another target address such as one of the google DNS server addresses, note if you use multwan do not use the same address for both v6 monitors, also if you are using googles DNS for resolution then find some other WAN v6 targets.

I'm not sure this applies, as I don't use a monitoring IP yet on my IPv6 WAN interface (assigned via DHCPv6). I kept the default, which is empty, and I treat this interface as always up so I don't get into any trouble with dpinger. And I do not use multi-WAN (yet).

Regarding your hint, I tested that:
Quote:
EDIT: @andreaslink - can you try disabling Static route filtering on the Firewall->Advanced page, I've seen this fix the issue for some.
I was so motivated, but this did not change anything.

After reading @gary201's posts, I also checked some of the tables. In the NDP table, I do see my link-local IPv6 gateway, so this seems to be a difference from your problem:


root@OPNsense:~ # ndp -a
Neighbor                             Linklayer Address  Netif Expire    S Flags
fe80::92e2:baff:fe68:cd75%igb1       90:e2:ba:68:cd:75   igb1 permanent R
fe80::92e2:baff:fe68:cd74%igb0       90:e2:ba:68:cd:74   igb0 permanent R
fe80::221:5eff:fec8:be8a%bce1        00:21:5e:c8:be:8a   bce1 permanent R
fd00:0:cafe:affe:221:5eff:fec8:be88  00:21:5e:c8:be:88   bce0 permanent R
OPNsense.lan                         00:21:5e:c8:be:88   bce0 permanent R
fe80::221:5eff:fec8:be88%bce0        00:21:5e:c8:be:88   bce0 permanent R
fe80::c225:6ff:feff:820d%bce0        c0:25:06:ff:82:0d   bce0 23h57m29s S R
2a02:xxx:xxxx:7700:307d:9cff:feca:7797 32:7d:9c:ca:77:97 bce0 8h55m29s  S
fe80::307d:9cff:feca:7797%bce0       32:7d:9c:ca:77:97   bce0 8h55m34s  S

root@OPNsense:~ # netstat -rn6
Routing tables

Internet6:
Destination                       Gateway                       Flags     Netif Expire
default                           fe80::c225:6ff:feff:820d%bce0 UG         bce0
::1                               link#8                        UH          lo0
2a02:xxx:xxxx:7700::/64           link#1                        U          bce0
2a02:xxx:xxxx:7700:221:5eff:fec8:be88 link#1                    UHS         lo0
fd00:0:cafe:affe::/64             link#1                        U          bce0
fd00:0:cafe:affe:221:5eff:fec8:be88 link#1                      UHS         lo0
fe80::%bce0/64                    link#1                        U          bce0
fe80::221:5eff:fec8:be88%bce0     link#1                        UHS         lo0
fe80::%bce1/64                    link#2                        U          bce1
fe80::221:5eff:fec8:be8a%bce1     link#2                        UHS         lo0
fe80::%igb0/64                    link#3                        U          igb0
fe80::92e2:baff:fe68:cd74%igb0    link#3                        UHS         lo0
fe80::%igb1/64                    link#4                        U          igb1
fe80::92e2:baff:fe68:cd75%igb1    link#4                        UHS         lo0
fe80::%lo0/64                     link#8                        U           lo0
fe80::1%lo0                       link#8                        UHS         lo0


Everything else still behaves the same way, and I cannot connect to the internet via IPv6:
root@OPNsense:~ # ping6 -c 1 fe80::c225:6ff:feff:820d%bce0
PING6(56=40+8+8 bytes) fe80::221:5eff:fec8:be88%bce0 --> fe80::c225:6ff:feff:820d%bce0

--- fe80::c225:6ff:feff:820d%bce0 ping6 statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss


root@OPNsense:~ # ping6 -c 1 ipv6.google.com
PING6(56=40+8+8 bytes) 2a02:xxx:xxxx:7700:221:5eff:fec8:be88 --> 2607:f8b0:4004:82a::200e

--- ipv6.l.google.com ping6 statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss


root@OPNsense:~ # ping6 -c 2 ff02::2%bce0
PING6(56=40+8+8 bytes) fe80::221:5eff:fec8:be88%bce0 --> ff02::2%bce0
16 bytes from fe80::c225:6ff:feff:820d%bce0, icmp_seq=0 hlim=64 time=2.595 ms
16 bytes from fe80::c225:6ff:feff:820d%bce0, icmp_seq=1 hlim=64 time=1.674 ms

--- ff02::2%bce0 ping6 statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 1.674/2.135/2.595/0.460 ms


BTW, IPv4 works as it should, no probs at all.
I'm slightly puzzled as to what could be preventing the IPv6 connection here. I'm considering a fresh install, as this one has been upgraded in place for quite a few years now.

Could it be that some default settings have changed over time and the updater does not delete or update the old ones? A fresh install would prove that; I read about something similar here https://forum.opnsense.org/index.php?topic=11341.msg66960#msg66960 some time ago, and I'm running out of ideas :-\.

@andreaslink, I can see from your output that there are GUAs on the WAN side and lo0, but no GUAs on the other interfaces. What are the system logs reporting about dhcp6c? I assume you are using DHCPv6 on the WAN?
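For reference, the address types being discussed can be told apart by prefix: link-local is fe80::/10, unique local (ULA, such as the fd00:0:cafe:affe:: addresses above) is fc00::/7, and global unicast (GUA) is currently allocated from 2000::/3. A small sketch that classifies addresses and lists interfaces without a GUA; the per-interface address lists are an assumed input (for example collected from `ifconfig`), and both function names are hypothetical.

```python
import ipaddress
from typing import Dict, List

ULA = ipaddress.IPv6Network("fc00::/7")
GLOBAL_UNICAST = ipaddress.IPv6Network("2000::/3")


def classify_v6(address: str) -> str:
    """Classify an IPv6 address as loopback, multicast, link-local, ULA, GUA or other."""
    addr = ipaddress.IPv6Address(address.split("%", 1)[0])  # drop any %zone suffix
    if addr.is_loopback:
        return "loopback"
    if addr.is_multicast:
        return "multicast"
    if addr.is_link_local:
        return "link-local"
    if addr in ULA:
        return "ULA"
    if addr in GLOBAL_UNICAST:
        return "GUA"
    return "other"


def interfaces_without_gua(addrs_by_if: Dict[str, List[str]]) -> List[str]:
    """Interfaces that have no global unicast address among their IPv6 addresses."""
    return sorted(
        ifname for ifname, addrs in addrs_by_if.items()
        if not any(classify_v6(a) == "GUA" for a in addrs)
    )
```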

Quote from: andreaslink on December 22, 2020, 04:11:11 PM
I'm slightly puzzled, what could prevent the IPv6 connection here. I'm considering setting up a fresh install as this one is already updated since quite some years now.

I did that myself already.  I created a new install and manually ported the configurations over.  No change in the behavior.  What I don't understand is why my ISP seems to insist everything is fine when every OS I plug directly into their ONT fails to work (and has the same problem; connectivity fails when the NDP entries go stale).  I'm about ready to just give up and call IPv6 connectivity a total loss at this point.  :(