Client IPv6 temporary addresses not regenerating after some time

Started by OPNenthu, December 04, 2024, 06:55:59 AM

UPDATE (6 Dec., 2024):

- Tried clearing the NDP tables in OPNsense and on the clients, as I was seeing a lot of 'Stale' entries for active IPs.  Broke the firewall states and forced me to reboot everything, but didn't fix the problem.  Clients still did not generate temp address after reboot.

- Replugged the clients into the switch ports, forcing a router solicitation (confirmed in Wireshark).  This worked and they all generated temporary IPs.

Still unclear what causes the IPs to stop generating and will continue to monitor.

UPDATE (10 Dec., 2024):

Waited for temporary address timers to count down again and confirmed the issue persists.  Captured additional debug.

======

Based on my searches I understand that clients (Windows, Linux, etc.) are responsible for regenerating their own SLAAC temporary IPv6 addresses once the configured 'preferred' or 'valid' lifetime has elapsed.  This is a host-side setting that has to be enabled, and the lifetime is usually set somewhere between 1 and 7 days.

I have 1x Windows 10 and 2x Linux Debian 12 clients all configured to regen temp IPs.  Initially this works and all the clients are showing temporary in addition to global and link-local IPv6 addresses.  They even seem to invalidate and regenerate automatically.  After a few days however, all the clients mysteriously lose their temporary IPs and fail to generate new ones.  Releasing and renewing the DHCP leases doesn't do anything, which is expected I think since I'm using SLAAC and not DHCPv6 (just trying anyway).

It's strange that all three clients are showing this behavior at the same time.  I am thinking either there is some dependency on Router Advertisements / OPNsense, or my understanding is incomplete.

OPNsense configuration:

- WAN DHCPv4 and DHCPv6 with /60 prefix delegation
- 5 VLANs with static DHCPv4, IPv6 'Track Interface' with unique prefix IDs and 'Allow manual adjustment of RAs'
- ISC DHCPv4 service enabled
- ISC DHCPv6 service disabled
- Router Advertisements, all VLANs -  Unmanaged (A flag).


Windows 10 client:
> netsh interface ipv6 show privacy
Querying active state...

Temporary Address Parameters
---------------------------------------------
Use Temporary Addresses             : enabled
Duplicate Address Detection Attempts: 3
Maximum Valid Lifetime              : 1d
Maximum Preferred Lifetime          : 1d
Regenerate Time                     : 5s
Maximum Random Time                 : 10m
Random Time                         : 4m14s

Linux clients:
$ nmcli connection show "Wired connection 1" | grep ipv6
ipv6.method:                            auto
ipv6.dns:                               --
ipv6.dns-search:                        --
ipv6.dns-options:                       --
ipv6.dns-priority:                      0
ipv6.addresses:                         --
ipv6.gateway:                           --
ipv6.routes:                            --
ipv6.route-metric:                      -1
ipv6.route-table:                       0 (unspec)
ipv6.routing-rules:                     --
ipv6.replace-local-rule:                -1 (default)
ipv6.ignore-auto-routes:                no
ipv6.ignore-auto-dns:                   no
ipv6.never-default:                     no
ipv6.may-fail:                          yes
ipv6.required-timeout:                  -1 (default)
ipv6.ip6-privacy:                       2 (enabled, prefer temporary IP)      <------ HERE ------
ipv6.addr-gen-mode:                     stable-privacy      <------  HERE ------
ipv6.ra-timeout:                        0 (default)
ipv6.mtu:                               auto
ipv6.dhcp-duid:                         --
ipv6.dhcp-iaid:                         --
ipv6.dhcp-timeout:                      0 (default)
ipv6.dhcp-send-hostname:                yes
ipv6.dhcp-hostname:                     --
ipv6.dhcp-hostname-flags:               0x0 (none)
ipv6.auto-route-ext-gw:                 -1 (default)
ipv6.token:                             --

I don't currently have screenshots to prove that temporary addresses were previously active on Windows, but I can attest.

The current state is that the temp addresses are either expired (Linux) or disappeared entirely (Windows).

>ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : BLACKBOX
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : home.arpa

Ethernet adapter Ethernet 2:

   Connection-specific DNS Suffix  . : home.arpa
   Description . . . . . . . . . . . : Realtek PCIe 2.5GbE Family Controller
   Physical Address. . . . . . . . . : xx-xx-xx-12-5A-xx
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   IPv6 Address. . . . . . . . . . . : 26xx:xx:xxxx:xxx5:3147:9377:xxx:xxxx(Preferred)
   Link-local IPv6 Address . . . . . : fe80::f93c:a1b3:5a5b:1e03%13(Preferred)
   IPv4 Address. . . . . . . . . . . : 192.168.50.100(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Lease Obtained. . . . . . . . . . : Monday, December 2, 2024 7:07:13 PM
   Lease Expires . . . . . . . . . . : Wednesday, December 4, 2024 2:07:13 AM
   Default Gateway . . . . . . . . . : fe80::xxxx:xxxx:xxxx:c2e%13
                                       192.168.50.1
   DHCP Server . . . . . . . . . . . : 192.168.50.1
   DHCPv6 IAID . . . . . . . . . . . : xxxx164xx
   DHCPv6 Client DUID. . . . . . . . : xx-xx-xx-xx-xx-xx-3C-91-78-2D-xx-xx-xx-xx
   DNS Servers . . . . . . . . . . . : 192.168.50.1
                                       fd83:cc80:4fc3::1
   NetBIOS over Tcpip. . . . . . . . : Enabled
   Connection-specific DNS Suffix Search List :
                                       home.arpa

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether xx:xx:xx:d0:9a:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.40.100/24 brd 192.168.40.255 scope global dynamic noprefixroute eth0
       valid_lft 4054sec preferred_lft 4054sec
    inet6 26xx:xx:xxxx:xxx4:bb4c:2c5e:d39:6125/64 scope global temporary deprecated dynamic
       valid_lft 85993sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:bd85:3fe6:6d2e:7f9b/64 scope global temporary deprecated dynamic
       valid_lft 85993sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:f932:c89:dd5d:6a53/64 scope global temporary deprecated dynamic
       valid_lft 85993sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:4f6e:xxxx:xxxx:xxx/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 85993sec preferred_lft 13993sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether xx:xx:xx:85:cf:xx brd ff:ff:ff:ff:ff:ff

Is there a misconfiguration here, or some bug?  Thanks!


For me, this works. Here is one take from 18:05:


2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP group default qlen 1000
    link/ether d6:35:77:88:44:44 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.3/24 metric 100 brd 192.168.10.255 scope global dynamic eth0
       valid_lft 14565sec preferred_lft 14565sec
    inet6 2001:a61:52b:6010:bf2c:8dde:d566:9ad2/64 scope global temporary dynamic
       valid_lft 72906sec preferred_lft 658sec
    inet6 2001:a61:52b:6010:d435:77ff:fe88:2299/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86079sec preferred_lft 14079sec
    inet6 fe80::d435:77ff:fe88:4444/64 scope link
       valid_lft forever preferred_lft forever


And one from 22:52:


2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP group default qlen 1000
    link/ether d6:35:77:88:44:44 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.3/24 metric 100 brd 192.168.10.255 scope global dynamic eth0
       valid_lft 26160sec preferred_lft 26160sec
    inet6 2001:a61:52b:6010:c603:a3ce:b9d2:c37f/64 scope global temporary dynamic
       valid_lft 84003sec preferred_lft 11755sec
    inet6 2001:a61:52b:6010:33b5:69f2:2232:4b83/64 scope global temporary deprecated dynamic
       valid_lft 69851sec preferred_lft 0sec
    inet6 2001:a61:52b:6010:bf2c:8dde:d566:9ad2/64 scope global temporary deprecated dynamic
       valid_lft 55701sec preferred_lft 0sec
    inet6 2001:a61:52b:6010:d435:77ff:fe88:2299/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86375sec preferred_lft 14375sec
    inet6 fe80::d435:77ff:fe88:4444/64 scope link
       valid_lft forever preferred_lft forever


The valid lifetime is 86400s, the preferred lifetime is shorter. I use SLAAC only with Minimum Interval = 200 and Maximum Interval = 600.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

The issue persists; still no new temporary address has been generated on the clients which have not been rebooted yet.

I traced an RA in Wireshark and am seeing the expected values: 
- it was sent as multicast to the 'all-nodes' address ff02::1
- it has the Unmanaged mode ('M' and 'O' flags both unset)
- IPv6 prefix is correct based on the WAN received prefix from the ISP
- DNS RA options are correct as I have them configured in OPNsense (Unbound is listening on a ULA address via loopback device)

I captured a short neighbor discovery conversation following the RA, in which my router and Windows client exchanged information.  Those IPs and MACs look correct.

So SLAAC itself seems OK, I think.

Is there anything else that the clients would need from OPNsense in order to manage their temporary addresses, or is that process entirely at the client OS level?  Again, curious that all my clients are doing this.


Reading through RFC 8981 currently, which supersedes RFC 4941 for temporary address extensions for SLAAC.  Section 3.4 states:

Quote
3.4. Generating Temporary Addresses

[RFC4862] describes the steps for generating a link-local address when an interface becomes enabled, as well as the steps for generating addresses for other scopes. This document extends [RFC4862] as follows. When processing a Router Advertisement with a Prefix Information option carrying a prefix for the purposes of address autoconfiguration (i.e., the A bit is set), the host MUST perform the following steps:

...


So yes, answering my own question, the temporary address generation is a host responsibility and this RFC specifies the algorithm.

But it also says that the 'A' flag must be present in the RA.  I don't see that in my Wireshark capture.  I see 'M' and 'O' (both are unset).

Is there an explicit 'A' flag that is missing in my RAs, or does the combination of M+O both being unset imply A?

I don't see any mention of 'A' in the RA Message Format spec, either.

Aha!  Found it.  I needed to expand the RA further.  It's embedded in the Options for prefix information.
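
For anyone else hunting for the same bit: a minimal sketch of how the two flag bytes decode, assuming the bit layout Wireshark displays (M and O in the RA header, L/A/R in the Prefix Information option).  The function names are mine, purely for illustration.

```python
# Flags byte of the Router Advertisement header.
RA_MANAGED = 0x80        # M: addresses available via DHCPv6
RA_OTHER = 0x40          # O: other configuration available via DHCPv6

# Flags byte of the Prefix Information option (type 3) inside the RA.
PIO_ON_LINK = 0x80       # L: prefix is on-link
PIO_AUTONOMOUS = 0x40    # A: prefix may be used for SLAAC
PIO_ROUTER_ADDR = 0x20   # R: router address flag


def decode_ra_flags(flags: int) -> dict:
    """Decode the M and O bits of the RA header flags byte."""
    return {"M": bool(flags & RA_MANAGED), "O": bool(flags & RA_OTHER)}


def decode_pio_flags(flags: int) -> dict:
    """Decode the L, A and R bits of a Prefix Information option."""
    return {
        "L": bool(flags & PIO_ON_LINK),
        "A": bool(flags & PIO_AUTONOMOUS),
        "R": bool(flags & PIO_ROUTER_ADDR),
    }


# Header flags 0x00 and prefix flags 0xc0: M and O unset, L and A set.
print(decode_ra_flags(0x00))
print(decode_pio_flags(0xC0))
```

So 'A' is not implied by M and O being unset; it is a separate bit carried per prefix.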

Now that I talked myself through the router responsibility, I'm at a total loss for why both Windows and Linux are misbehaving.

Friends, I've been looking since my initial post and I fail to find the cause.  The issue is frustrating because I feel somewhat exposed continuing to use IPv6 without functioning privacy extensions and would really appreciate help to fix it.  If the issue is with OPNsense I am happy to raise a bug, but I don't know if I have enough evidence.

Mods, please feel free to move this to the 24.7 Series forum if appropriate.

To summarize everything until now:

- OPNsense 24.7.9_1 configured for SLAAC with Unmanaged RAs ('A' flag) and DNS RA option.
- Wireshark capture is showing periodic RAs being multicast on the VLAN subnets.
- On first connection to switch, clients are sending Router Solicitation and generating the initial IPv6 temporary address with the correct /64 prefix from the RA.
- The temporary addresses begin their countdown

(Until here everything appears fine.  Now the issue begins.)

- Once the lifetime reaches 0, the current temporary address is marked as 'deprecated' on each client.
- A new temporary address is never generated.  At this point all my internet browsing silently switches to the stable global address.
- After some time, Windows drops the deprecated address entirely, as if it never existed.  Linux continues to show it as 'deprecated' ad infinitum, but IP address tests confirm that the global address is the one being used.
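
The bad state in the last three bullets can be spotted mechanically on Linux.  Below is a rough sketch that scans the text of `ip a` for `inet6` lines flagged `temporary` and reports whether any of them is still non-deprecated.  The helper names are hypothetical, and it assumes the flag keywords appear exactly as in the dumps in this thread.

```python
def temporary_address_states(ip_output: str) -> list:
    """Return (address, is_deprecated) for every inet6 line flagged 'temporary'."""
    states = []
    for line in ip_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet6" and "temporary" in fields:
            states.append((fields[1], "deprecated" in fields))
    return states


def has_usable_temporary(ip_output: str) -> bool:
    """True if at least one temporary address is not yet deprecated."""
    return any(not deprecated for _, deprecated in temporary_address_states(ip_output))


# Two excerpts in the format shown in this thread (addresses redacted as posted).
GOOD = """\
    inet6 2601:xx:xxxx:xxx4:2af6:3881:8b30:ecb0/64 scope global temporary dynamic
       valid_lft 86266sec preferred_lft 14266sec
"""
BAD = """\
    inet6 2601:xx:xxxx:xxx4:2af6:3881:8b30:ecb0/64 scope global temporary deprecated dynamic
       valid_lft 85889sec preferred_lft 0sec
"""

print(has_usable_temporary(GOOD))  # True
print(has_usable_temporary(BAD))   # False
```

Feeding it the captured output of `ip -6 addr show` from a cron job would give a timestamp for when the last usable temporary address went away.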

This is happening across various clients.  I have a Windows and Linux machine on one VLAN together, both showing the symptom.  I have another Linux machine on a different VLAN also showing the symptom.

To demonstrate the countdown and the new IP not being generated, I took periodic samples from the Windows box over roughly one day, starting a few hours after I had gotten the initial temporary address.  You may ignore the 'Ethernet 3' connection (that is a Host-Only virtual Ethernet adapter installed by VirtualBox).


PS C:\Windows\system32> netsh interface ipv6 show addresses

Interface 1: Loopback Pseudo-Interface 1

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite ::1

Interface 14: Ethernet

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred    23h52m57s   3h52m57s 2601:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57
Temporary  Preferred    20h39m41s   3h52m57s 2601:xx:xxxx:xxx3:40b5:b951:9f28:4364
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14

Interface 16: Ethernet 3

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite fe80::f49a:9538:17f8:8ecb%16

PS C:\Windows\system32> Get-Date

Sunday, December 8, 2024 4:03:12 AM

PS C:\Windows\system32> netsh interface ipv6 show addresses

Interface 1: Loopback Pseudo-Interface 1

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite ::1

Interface 14: Ethernet

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred     23h59m8s    3h59m8s 2601:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57
Temporary  Preferred    13h18m26s    3h59m8s 2601:xx:xxxx:xxx3:40b5:b951:9f28:4364
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14

Interface 16: Ethernet 3

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite fe80::f49a:9538:17f8:8ecb%16

PS C:\Windows\system32> Get-Date

Sunday, December 8, 2024 11:23:59 AM

PS C:\Windows\system32> netsh interface ipv6 show addresses

Interface 1: Loopback Pseudo-Interface 1

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite ::1

Interface 14: Ethernet

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred     23h59m6s    3h59m6s 2601:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57
Temporary  Preferred     5h46m28s    3h59m6s 2601:xx:xxxx:xxx3:40b5:b951:9f28:4364
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14

Interface 16: Ethernet 3

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite fe80::f49a:9538:17f8:8ecb%16

PS C:\Windows\system32> Get-Date

Sunday, December 8, 2024 6:55:46 PM

PS C:\Windows\system32> netsh interface ipv6 show addresses

Interface 1: Loopback Pseudo-Interface 1

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite ::1

Interface 14: Ethernet

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred    23h51m36s   3h51m36s 2601:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57
Temporary  Preferred        53m6s     48m51s 2601:xx:xxxx:xxx3:40b5:b951:9f28:4364
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14

Interface 16: Ethernet 3

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite fe80::f49a:9538:17f8:8ecb%16

PS C:\Windows\system32> Get-Date

Sunday, December 8, 2024 11:49:27 PM

PS C:\Windows\system32> netsh interface ipv6 show addresses

Interface 1: Loopback Pseudo-Interface 1

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite ::1

Interface 14: Ethernet

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred     23h54m9s    3h54m9s 2601:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14

Interface 16: Ethernet 3

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite fe80::f49a:9538:17f8:8ecb%16

PS C:\Windows\system32> Get-Date

Monday, December 9, 2024 1:18:21 AM
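
To make the samples above easier to compare against the `Get-Date` timestamps, here is a small sketch that converts the lifetime strings printed by `netsh` into seconds.  It assumes the only formats are the ones seen in this thread (combinations of `d`, `h`, `m`, `s`, or the word `infinite`); the function name is made up.

```python
import re

# Optional day/hour/minute/second components, in that order, e.g. "20h39m41s".
_DURATION = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def lifetime_seconds(text: str):
    """Convert a netsh lifetime such as '20h39m41s' to seconds; 'infinite' gives None."""
    text = text.strip()
    if text == "infinite":
        return None
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised lifetime: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Valid life of the temporary address in the first and fourth samples.
print(lifetime_seconds("20h39m41s"))  # 74381
print(lifetime_seconds("53m6s"))      # 3186
```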


Here is one of the Linux boxes in the fresh state with a new temporary address, and the eventual bad state with a stale deprecated address:

~ $ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:xx:xx:xx:xx:5a brd ff:ff:ff:ff:ff:ff
    inet 192.168.40.100/24 brd 192.168.40.255 scope global dynamic noprefixroute eth0
       valid_lft 6492sec preferred_lft 6492sec
    inet6 2601:xx:xxxx:xxx4:2af6:3881:8b30:ecb0/64 scope global temporary dynamic
       valid_lft 86266sec preferred_lft 14266sec
    inet6 2601:xx:xxxx:xxx4:xxxx:xxxx:xxxx:2c3/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86266sec preferred_lft 14266sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether b8:xx:xx:xx:xx:0f brd ff:ff:ff:ff:ff:ff

~ $ date
Sun  8 Dec 18:56:30 EST 2024

~ $ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:xx:xx:xx:xx:5a brd ff:ff:ff:ff:ff:ff
    inet 192.168.40.100/24 brd 192.168.40.255 scope global dynamic noprefixroute eth0
       valid_lft 5163sec preferred_lft 5163sec
    inet6 2601:xx:xxxx:xxx4:2af6:3881:8b30:ecb0/64 scope global temporary deprecated dynamic
       valid_lft 85889sec preferred_lft 0sec
    inet6 2601:xx:xxxx:xxx4:xxxx:xxxx:xxxx:2c3/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 85889sec preferred_lft 13889sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether b8:xx:xx:xx:xx:0f brd ff:ff:ff:ff:ff:ff

~ $ date
Mon  9 Dec 01:18:35 EST 2024

This is confirming that the non-temporary global address is being used for external connections after this point:

~ $ curl ipecho.net/plain
2601:xx:xxxx:xxx4:xxxx:xxxx:xxxx:2c3

Here is an example Router Solicitation from Wireshark (a different client):

Frame 427: 62 bytes on wire (496 bits), 62 bytes captured (496 bits) on interface \Device\NPF_{C28F3796-A982-4278-8729-D465F67F2756}, id 0
Ethernet II, Src: PCSSystemtec_5a:f1:58 (08:xx:xx:xx:xx:58), Dst: IPv6mcast_02 (33:33:00:00:00:02)
Internet Protocol Version 6, Src: fe80::7472:732e:303f:36b2, Dst: ff02::2
Internet Control Message Protocol v6
    Type: Router Solicitation (133)
    Code: 0
    Checksum: 0x2ea5 [correct]
    [Checksum Status: Good]
    Reserved: 00000000

And a sample Router Advertisement:

Frame 2457: 166 bytes on wire (1328 bits), 166 bytes captured (1328 bits) on interface \Device\NPF_{C28F3796-A982-4278-8729-D465F67F2756}, id 0
Ethernet II, Src: Protectli_f:0c:2e (64:xx:xx:xx:xx:2e), Dst: IPv6mcast_01 (33:33:00:00:00:01)
Internet Protocol Version 6, Src: fe80::xxxx:xxxx:xxxx:c2e, Dst: ff02::1
Internet Control Message Protocol v6
    Type: Router Advertisement (134)
    Code: 0
    Checksum: 0xe1ed [correct]
    [Checksum Status: Good]
    Cur hop limit: 64
    Flags: 0x00, Prf (Default Router Preference): Medium
        0... .... = Managed address configuration: Not set
        .0.. .... = Other configuration: Not set
        ..0. .... = Home Agent: Not set
        ...0 0... = Prf (Default Router Preference): Medium (0)
        .... .0.. = ND Proxy: Not set
        .... ..00 = Reserved: 0
    Router lifetime (s): 1800
    Reachable time (ms): 0
    Retrans timer (ms): 0
    ICMPv6 Option (Prefix information : 2601:xx:xxxx:xxx3::/64)
        Type: Prefix information (3)
        Length: 4 (32 bytes)
        Prefix Length: 64
        Flag: 0xc0, On-link flag(L), Autonomous address-configuration flag(A)
            1... .... = On-link flag(L): Set
            .1.. .... = Autonomous address-configuration flag(A): Set
            ..0. .... = Router address flag(R): Not set
            ...0 0000 = Reserved: 0
        Valid Lifetime: 86400 (1 day)
        Preferred Lifetime: 14400 (4 hours)
        Reserved
        Prefix: 2601:xx:xxxx:xxx3::
    ICMPv6 Option (Recursive DNS Server fd83:cc80:4fc3::1)
        Type: Recursive DNS Server (25)
        Length: 3 (24 bytes)
        Reserved
        Lifetime: 600 (10 minutes)
        Recursive DNS Servers: fd83:cc80:4fc3::1
    ICMPv6 Option (DNS Search List Option home.arpa)
        Type: DNS Search List Option (31)
        Length: 3 (24 bytes)
        Reserved
        Lifetime: 600 (10 minutes)
        Domain Names: home.arpa
        Padding
    ICMPv6 Option (MTU : 1500)
        Type: MTU (5)
        Length: 1 (8 bytes)
        Reserved
        MTU: 1500
    ICMPv6 Option (Source link-layer address : 64:xx:xx:xx:xx:2e)
        Type: Source link-layer address (1)
        Length: 1 (8 bytes)
        Link-layer address: Protectli_f:0c:2e (64:xx:xx:xx:xx:2e)

I see periodic RAs on the wire, at least every ~30 min.

I don't see periodic RSs from these affected clients.  They send an RS when the connection is established, and then appear to go silent.  I believe this is normal but I'm not sure.

In between there are numerous Neighbor Solicitations and Neighbor Advertisements, between the router and the various clients.

Attaching screenshots of OPNsense configs.  Note that I'm not actually using DHCPv6 locally (it's disabled) but I am telling the RA to retrieve the DNS IPv6 ULA address from there (configuration only).

Kindly let me know what additional info would be helpful.

(split post because too many attachments)

As I said - works for me, so:

If SLAAC / IPv6 privacy does not work, you should ask yourself, what factors can contribute to that and what special circumstances you have that cause the problem. Obviously, it works the first time around, so what stops working and when?

1. Is it the RA not being sent out or not being received? Who is at fault here?
You could reboot your clients when the problem starts to show, to see if that fixes the problem. If so, OpnSense obviously does the "same thing" even after some time has passed (see also #4). If not: does RADVD still run?

2. Do you have any components active that can cause problems (IPS, Suricata, Zenarmor)? Disable them.

3. You have some special things going on, one is a VLAN. Do you mix tagged and untagged traffic on the same interface? Is your switching hardware known to handle VLANs well? Also, I see a DNS server with fd83:: prefix. Do you use ULA? This does not show in your client dumps. There could also be another router in your network, this is why I use "high" priority for OpnSense RAs.

4. Does your ISP change IPv6 prefixes often / is the lifetime too short (i.e. shorter than 86400s)? On Linux, you can change the max and preferred lifetimes via "sysctl net.ipv6.conf.*.temp_valid_lft/temp_prefered_lft" to shorter values to see if the problem persists with shorter intervals. This is surely possible for Windows as well. It is also easier to debug what packets are exchanged.

5. Where do your clients live? Are those physical clients or VMs, which could live behind a virtualization firewall? Are they on LAN or WiFi?

6. What is the content of /var/etc/radvd.conf? Has it changed after the problem occurs? This would be the case if your ISP changes the prefixes for your WAN. Your DHCPv6 client would then have to restart radvd with the new IA_PD prefix.

P.S.: It seems normal that the clients do not issue a new RS when they set a new temporary IPv6. Mine do not do that and they still get new temporary IPs. With shorter lifetimes, I can even see them disappear after the valid_lft has run out.

Lots of information here! Great to see this in one place. RFC8981 3.4.5 says
A temporary address is created only if this calculated preferred lifetime is greater than REGEN_ADVANCE time units. In particular, an implementation MUST NOT create a temporary address with a zero preferred lifetime.
I assume that's the case in your setup, but can you check?
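
As a rough sketch of that check, assuming (as I read Section 3.4) that the calculated preferred lifetime is the lower of the prefix's Preferred Lifetime and TEMP_PREFERRED_LIFETIME minus DESYNC_FACTOR.  The DESYNC_FACTOR and REGEN_ADVANCE numbers below are placeholders, not values taken from any of the hosts in this thread.

```python
def calculated_preferred_lifetime(prefix_preferred: int,
                                  temp_preferred_lifetime: int,
                                  desync_factor: int) -> int:
    """Preferred lifetime a new temporary address would get, in seconds."""
    return min(prefix_preferred, temp_preferred_lifetime - desync_factor)


def may_create_temporary(prefix_preferred: int,
                         temp_preferred_lifetime: int,
                         desync_factor: int,
                         regen_advance: int) -> bool:
    """RFC 8981 3.4 step 5: create only if the result exceeds REGEN_ADVANCE."""
    preferred = calculated_preferred_lifetime(
        prefix_preferred, temp_preferred_lifetime, desync_factor)
    return preferred > regen_advance


# Prefix Preferred Lifetime 14400 s from the RA, host maximum of 1 day;
# desync_factor=600 and regen_advance=5 are assumed for illustration only.
print(may_create_temporary(14400, 86400, 600, 5))  # True
# A prefix advertised with a zero preferred lifetime must not yield a new address.
print(may_create_temporary(0, 86400, 600, 5))      # False
```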

Furthermore, from the first and second "ip a" output you can see that

  • valid_lft 86266sec preferred_lft 14266sec
  • valid_lft 85889sec preferred_lft 0sec
14266 sec of preferred lifetime were counted down within 377 sec of valid lifetime. Are these clocks moving at relativistic speeds?
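
The arithmetic behind that question, worked through as a quick sketch.  It assumes the `date` output printed after each `ip a` dump is the time that sample was taken.

```python
from datetime import datetime

# Lifetimes of the temporary address in the two "ip a" dumps, with the
# timestamps from the "date" command that follows each dump.
first = {"taken": datetime(2024, 12, 8, 18, 56, 30),
         "valid_lft": 86266, "preferred_lft": 14266}
second = {"taken": datetime(2024, 12, 9, 1, 18, 35),
          "valid_lft": 85889, "preferred_lft": 0}

elapsed = int((second["taken"] - first["taken"]).total_seconds())
valid_drop = first["valid_lft"] - second["valid_lft"]
preferred_drop = first["preferred_lft"] - second["preferred_lft"]

print(elapsed)         # 22925 seconds of wall-clock time
print(valid_drop)      # 377
print(preferred_drop)  # 14266
```

On that assumption the samples are a little over six hours apart, not 377 seconds.  One possible reading is that valid_lft was being refreshed in between, while the 14266 s of preferred lifetime simply ran out inside the gap and was never replaced.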

Thank you @meyergru and @mooh for your suggestions.  Let me address what I can.

Quote from: meyergru on December 10, 2024, 11:50:33 AM
1. Is it the RA not being sent out or not being received? Who is at fault here?
You could reboot your clients when the problem starts to show, to see if that fixes the problem. If so, OpnSense obviously does the "same thing" even after some time has passed (see also #4). If not: does RADVD still run?

They are being sent and received.  I replugged the Windows box into the switch and Wireshark shows a solicitation followed by an advertisement (screenshot attached).

Similarly, the 'radvdump' utility in Linux shows the received RAs:


~ $ sudo radvdump
#
# radvd configuration generated by radvdump 2.19
# based on Router Advertisement from fe80::xxxx:xxxx:xxxx:c2e
# received by interface eth0
#

interface eth0
{
        AdvSendAdvert on;
        # Note: {Min,Max}RtrAdvInterval cannot be obtained with radvdump
        AdvManagedFlag off;
        AdvOtherConfigFlag off;
        AdvReachableTime 0;
        AdvRetransTimer 0;
        AdvCurHopLimit 64;
        AdvDefaultLifetime 1800;
        AdvHomeAgentFlag off;
        AdvDefaultPreference medium;
        AdvLinkMTU 1500;
        AdvSourceLLAddress on;

        prefix 2601:xx:xxxx:xxx4::/64
        {
                AdvValidLifetime 86400;
                AdvPreferredLifetime 14400;
                AdvOnLink on;
                AdvAutonomous on;
                AdvRouterAddr off;
        }; # End of prefix definition


        RDNSS fd83:cc80:4fc3::1
        {
                AdvRDNSSLifetime 600;
        }; # End of RDNSS definition


        DNSSL home.arpa
        {
                AdvDNSSLLifetime 600;
        }; # End of DNSSL definition

}; # End of interface definition


Yes, rebooting the clients this time did reset the temporary IP.  Not sure why rebooting OPNsense the other day did not do it.

Quote
2. Do you have any components active that can cause problems (IPS, Suricata, Zenarmor)? Disable them.

Nothing of the sort.  The only filtering is from 'pf' and DNS block lists.

Quote
3. You have some special things going on, one is a VLAN. Do you mix tagged and untagged traffic on the same interface? Is your switching hardware known to handle VLANs well?

Not mixing; all VLANs are tagged on parent interface 'igc0'.  'LAN' is itself a VLAN (id 1) on the same interface.

I'm using a UniFi switch.  Port 16 is configured as tags only, for the OPNsense trunk (screenshot attached).

Note: the 'VPN' connection is not set up yet; only the VLAN interface is defined for now as a placeholder.  So there is nothing there to cause routing problems.

Quote
Also, I see a DNS server with fd83:: prefix. Do you use ULA? This does not show in your client dumps. There could also be another router in your network, this is why I use "high" priority for OpnSense RAs.

Yes, this is the address I am giving to local IPv6 clients for Unbound.

The switch is L3 capable, but I have not configured any routing on it.  I am using it as an L2 managed switch for 'router-on-a-stick'.  I will try raising the OPNsense router priority anyway.

Quote
4. Does your ISP change IPv6 prefixes often / is the lifetime too short (i.e. shorter than 86400s)? On Linux, you can change the max and preferred lifetimes via "sysctl net.ipv6.conf.*.temp_valid_lft/temp_prefered_lft" to shorter values to see if the problem persists with shorter intervals. This is surely possible for Windows as well. It is also easier to debug what packets are exchanged.

No, the ISP prefix has not changed in all this time.  I don't think it is permanent, but it appears to be sticky even surviving across router reboots.  Maybe an ISP gateway reset would cause it to change, or some ISP-defined timer which is not published / known to me.  I will have to test it, but am assuming it's not relevant here since the prefix is not changed during my tests.

Will try lowering the values in Linux.  In Windows the preferred lifetime is already set to 1 day (down from the default 7).
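
Before changing anything on the Linux side, it helps to record what the kernel is currently using.  A minimal sketch that reads the relevant per-interface sysctls from /proc; the helper name and the `root` parameter are hypothetical (the latter only exists so the function can be pointed at a test directory), and `eth0` is just the interface name from the dumps above.

```python
from pathlib import Path

# Per-interface sysctls that control temporary address behaviour on Linux.
TEMPADDR_KEYS = ("use_tempaddr", "temp_valid_lft", "temp_prefered_lft")


def read_tempaddr_settings(interface: str,
                           root: str = "/proc/sys/net/ipv6/conf") -> dict:
    """Return the current value of each key, or None if it cannot be found."""
    settings = {}
    for key in TEMPADDR_KEYS:
        path = Path(root) / interface / key
        settings[key] = int(path.read_text().strip()) if path.exists() else None
    return settings


if __name__ == "__main__":
    # Prints None for every key on systems without this /proc tree.
    print(read_tempaddr_settings("eth0"))
```

Running it before and after lowering the lifetimes shows whether the new values actually took effect on that interface.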

Quote
5. Where do your clients live? Are those physical clients or VMs, which could live behind a virtualization firewall? Are they on LAN or WiFi?

The machines in question are physical.

Windows is a desktop, hard-wired to the switch on VLAN 30 (HOME).
Linux #1 is a Raspberry Pi, hard-wired to the switch on VLAN 40 (IOT).
Linux #2 is an Intel NUC, wireless on VLAN 30 (HOME).

Quote
6. What is the content of /var/etc/radvd.conf?


root@firewall:~ # cat /var/etc/radvd.conf
# Automatically generated, do not edit
# Generated RADVD config for manual assignment on lan
interface vlan0.1 {
        AdvSendAdvert on;
        MinRtrAdvInterval 200;
        MaxRtrAdvInterval 600;
        AdvLinkMTU 1500;
        AdvDefaultPreference medium;
        prefix 2601:xx:xxxx:xxx1::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous on;
        };
        RDNSS fd83:cc80:4fc3::1 {
        };
        DNSSL home.arpa {
        };
};
# Generated RADVD config for manual assignment on opt2
interface vlan0.20 {
        AdvSendAdvert on;
        MinRtrAdvInterval 200;
        MaxRtrAdvInterval 600;
        AdvLinkMTU 1500;
        AdvDefaultPreference medium;
        prefix 2601:xx:xxxx:xxx2::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous on;
        };
        RDNSS fd83:cc80:4fc3::1 {
        };
        DNSSL home.arpa {
        };
};
# Generated RADVD config for manual assignment on opt3
interface vlan0.30 {
        AdvSendAdvert on;
        MinRtrAdvInterval 200;
        MaxRtrAdvInterval 600;
        AdvLinkMTU 1500;
        AdvDefaultPreference medium;
        prefix 2601:xx:xxxx:xxx3::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous on;
        };
        RDNSS fd83:cc80:4fc3::1 {
        };
        DNSSL home.arpa {
        };
};
# Generated RADVD config for manual assignment on opt4
interface vlan0.40 {
        AdvSendAdvert on;
        MinRtrAdvInterval 200;
        MaxRtrAdvInterval 600;
        AdvLinkMTU 1500;
        AdvDefaultPreference medium;
        prefix 2601:xx:xxxx:xxx4::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous on;
        };
        RDNSS fd83:cc80:4fc3::1 {
        };
        DNSSL home.arpa {
        };
};
# Generated RADVD config for manual assignment on opt5
interface vlan0.50 {
        AdvSendAdvert on;
        MinRtrAdvInterval 200;
        MaxRtrAdvInterval 600;
        AdvLinkMTU 1500;
        AdvDefaultPreference medium;
        prefix 2601:xx:xxxx:xxx5::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous on;
        };
        RDNSS fd83:cc80:4fc3::1 {
        };
        DNSSL home.arpa {
        };
};


Quote
Has it changed after the problem occurs? This would be the case if your ISP changes the prefixes for your WAN. You DHCPv6 client would then have to restart radvd with the new IA_PD prefix.

I don't know how to check this, but I don't think so.  The ISP prefix hasn't changed, so that reason is off the table.

The modification time attribute on /var/etc/radvd.conf is:


root@firewall:~ # ls -l /var/etc/radvd.conf
-rw-r--r--  1 root wheel 1752 Dec  9 03:31 /var/etc/radvd.conf


This could be because I had rebooted the router a few times in the last couple days.

Quote
P.S.: It seems normal that the clients do not issue a new RS when they set a new temporary IPv6 [...]

Excellent!  Thank you for corroborating that observation.


Quote from: mooh on December 10, 2024, 12:30:51 PM
Lots of information here! Great to see this in one place. RFC8981 3.4.5 says
A temporary address is created only if this calculated preferred lifetime is greater than REGEN_ADVANCE time units. In particular, an implementation MUST NOT create a temporary address with a zero preferred lifetime.
I assume that's the case in your setup, but can you check?

I don't know how to check REGEN_ADVANCE.  I asked ChatGPT ( :-[) and it said that this is not a value that can be queried in Windows as it's part of a network protocol, not a system config.

The closest thing I see is this in Windows, which has a "Regenerate Time" value (5s):


>netsh interface ipv6 show privacy
Querying active state...

Temporary Address Parameters
---------------------------------------------
Use Temporary Addresses             : enabled
Duplicate Address Detection Attempts: 3
Maximum Valid Lifetime              : 1d
Maximum Preferred Lifetime          : 1d
Regenerate Time                     : 5s
Maximum Random Time                 : 10m
Random Time                         : 6m38s


There is a formula for this in RFC 8981 section 3.8:

Quote
REGEN_ADVANCE
    2 + (TEMP_IDGEN_RETRIES * DupAddrDetectTransmits * RetransTimer / 1000)

'RetransTimer', as best as I can tell, comes from the Router Advertisement.  I can check that in Wireshark, and OPNsense is sending this value as '0':


Internet Control Message Protocol v6
    Type: Router Advertisement (134)
    Code: 0
    Checksum: 0xe1ed [correct]
    [Checksum Status: Good]
    Cur hop limit: 64
    Flags: 0x00, Prf (Default Router Preference): Medium
    Router lifetime (s): 1800
    Reachable time (ms): 0
    Retrans timer (ms): 0


Therefore, the formula can be reduced to just

2 + (0) = 2s

So I guess this is it, in theory?

Quote
Furthermore, from the first and second "ip a" output you can see that

  • valid_lft 86266sec preferred_lft 14266sec
  • valid_lft 85889sec preferred_lft 0sec
14266 sec of preferred lifetime were counted down within 377 sec of valid lifetime. Are these clocks moving at relativistic speeds?

It's strange: in Windows these values count down, but in the Linux box they fluctuate up and down.  These are two consecutive calls:


~ $ ip a
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    [...]
    inet6 2601:xx:xxxx:xxx4:ce2a:ff29:486e:be04/64 scope global temporary dynamic
       valid_lft 86284sec preferred_lft 14284sec
    inet6 2601:xx:xxxx:xxx4:xxxx:xxxx:xxxx:2c3/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86284sec preferred_lft 14284sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever



~ $ ip a
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
   [...]
    inet6 2601:xx:xxxx:xxx4:ce2a:ff29:486e:be04/64 scope global temporary dynamic
       valid_lft 86315sec preferred_lft 14315sec
    inet6 2601:xx:xxxx:xxx4:xxxx:xxxx:xxxx:2c3/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86315sec preferred_lft 14315sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever


Both values had gone up in between calls, but in this example it's by the same amount (31s).  There's definitely something strange with these.
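One possible reading of the numbers above, as a minimal sketch: if the host resets both timers to the advertised lifetimes whenever an RA arrives, then "advertised value minus displayed value" is the time since the last RA. The constants are the 86400s / 14400s lifetimes seen in the RA captures in this thread; the function name is made up for illustration.

```python
# Lifetimes advertised in the RAs captured in this thread (assumed constant).
RA_VALID = 86400
RA_PREFERRED = 14400


def seconds_since_last_ra(valid_lft, preferred_lft):
    """Return seconds since the timers were last refreshed by an RA.

    Returns None when the two timers disagree, i.e. when the displayed
    values cannot be explained by a simple reset to the RA lifetimes.
    """
    from_valid = RA_VALID - valid_lft
    from_preferred = RA_PREFERRED - preferred_lft
    return from_valid if from_valid == from_preferred else None


first_call = seconds_since_last_ra(86284, 14284)
second_call = seconds_since_last_ra(86315, 14315)
print(first_call, second_call)
```

If that assumption holds, the second call is closer to an RA than the first one, which would mean an RA arrived between the two calls and refreshed both timers by the same amount.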

Quote from: OPNenthu on December 11, 2024, 07:16:58 AM
Yes, rebooting the clients this time did reset the temporary IP.  Not sure why rebooting OPNsense the other day did not do it.

That seems to answer the question: OPNsense evidently does nothing different after some time has passed, since the client behaviour changes when you reboot the clients.

The next step would be to lower the lifetimes and see what exactly the problem is.

Quote from: OPNenthu on December 11, 2024, 07:16:58 AM
root@firewall:~ # cat /var/etc/radvd.conf
[...]

Not quite the same as mine, but the difference should not matter (all "Stateless", "Unmanaged" and "Assisted" should work the same):

# Generated RADVD config for manual assignment on opt8
interface igc1_vlan5 {
        AdvSendAdvert on;
        MinRtrAdvInterval 200;
        MaxRtrAdvInterval 600;
        AdvLinkMTU 1500;
        AdvDefaultPreference medium;
        AdvManagedFlag off;
        AdvOtherConfigFlag on;
        prefix 2001:xxxx:xxxx:xx05::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous on;
        };
        RDNSS 2001:xxxx:xxxx:xx05:yyyy:yyyy:yyyy:yyyy {
        };
        DNSSL dmz {
        };
};


Quote from: OPNenthu on December 11, 2024, 07:16:58 AM
Quote
P.S.: It seems normal that the clients do not issue a new RS when they set a new temporary IPv6 [...]

Excellent!  Thank you for corroborating that observation.

The clients normally just do the prolongation after the preferred lifetime has ended. They do not send another RS, but they probably should do NS to avoid a duplicate address (DAD). This is something you should be able to observe when you lower the lifetimes. I would guess that something here is the problem.

Quote from: OPNenthu on December 11, 2024, 07:16:58 AM
>netsh interface ipv6 show privacy
Querying active state...

Temporary Address Parameters
---------------------------------------------
Use Temporary Addresses             : enabled
Duplicate Address Detection Attempts: 3
Maximum Valid Lifetime              : 1d
Maximum Preferred Lifetime          : 1d
Regenerate Time                     : 5s
Maximum Random Time                 : 10m
Random Time                         : 6m38s

There is a formula for this in RFC 8981 section 3.8:


Not the same here: Maximum valid lifetime is 7d instead of 1d (Windows 11 Pro). But still, I would focus on Linux, because it is easy to lower the lifetimes and try there.

Quote from: OPNenthu on December 11, 2024, 07:16:58 AM
Internet Control Message Protocol v6
    Type: Router Advertisement (134)
    Code: 0
    Checksum: 0xe1ed [correct]
    [Checksum Status: Good]
    Cur hop limit: 64
    Flags: 0x00, Prf (Default Router Preference): Medium
    Router lifetime (s): 1800
    Reachable time (ms): 0
    Retrans timer (ms): 0

Therefore, the formula can be reduced to just

2 + (0) = 2s

So I guess this is it, in theory?


I doubt that, because:


#tcpdump -i eth0 -X -vvvv -ttt icmp6 and 'ip6[40] = 134'
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
 00:00:00.000000 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) _gateway > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [managed, other stateful], pref high, router lifetime 1800s, reachable time 0ms, retrans timer 0ms


Those zeroes for "reachable" and "retrans" also only mean "not specified".
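To make that concrete, here is a minimal sketch of the RFC 8981 formula quoted above, assuming a zero Retrans Timer in the RA means the host keeps its own default. The defaults used are the ones I know from the RFCs (TEMP_IDGEN_RETRIES = 3 in RFC 8981, RETRANS_TIMER = 1000 ms in RFC 4861, DupAddrDetectTransmits = 1 in RFC 4862); what a given OS actually uses may differ, and the function name is only illustrative.

```python
TEMP_IDGEN_RETRIES = 3            # default from RFC 8981
DEFAULT_RETRANS_TIMER_MS = 1000   # RETRANS_TIMER default from RFC 4861
DEFAULT_DAD_TRANSMITS = 1         # DupAddrDetectTransmits default from RFC 4862


def regen_advance(ra_retrans_ms, dad_transmits=DEFAULT_DAD_TRANSMITS):
    """REGEN_ADVANCE in seconds, per the formula in RFC 8981 section 3.8.

    A Retrans Timer of 0 in the RA is treated as "unspecified", so the
    host default is used instead of plugging 0 into the formula.
    """
    retrans_ms = ra_retrans_ms if ra_retrans_ms != 0 else DEFAULT_RETRANS_TIMER_MS
    return 2 + (TEMP_IDGEN_RETRIES * dad_transmits * retrans_ms / 1000)


print(regen_advance(0))                   # RA says "unspecified"
print(regen_advance(0, dad_transmits=3))  # with 3 DAD attempts, as Windows reports
```

Under these assumptions the result is a few seconds either way, so it is small compared to lifetimes of hours or days and unlikely to be the reason no new address is created. As far as I know, the 5s "Regenerate Time" shown by Windows also matches the fixed REGEN_ADVANCE default from the older RFC 4941.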


So, no VLANs or VMs interfering with ICMPv6.

I use Unifi equipment here, too, but I have no L3 switch that I am aware of. Theoretically, it is possible that this interferes (via RAs or filtering of ICMPv6 messages). Alas, if you have a router-on-a-stick, you cannot connect a client any other way to rule that out.

Quote from: meyergru on December 11, 2024, 11:41:05 AM
The next step would be to lower the lifetimes and see what exactly the problem is.

[...]

The clients normally just do the prolongation after the preferred lifetime has ended. They do not send another RS, but they probably should do NS to avoid a duplicate address (DAD). This is something you should be able to observe when you lower the lifetimes. I would guess that something here is the problem.

As a test I set these parameters on the Raspberry Pi and rebooted:

net.ipv6.conf.eth0.temp_prefered_lft = 300
net.ipv6.conf.eth0.temp_valid_lft = 600

A new temporary address was generated each time the 'preferred_lft' of the current one came within 2 seconds of expiring:


~ $ ip -6 a
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 26xx:xx:xxxx:xxx4:8cf0:c29:5515:9a90/64 scope global temporary tentative dynamic
       valid_lft 599sec preferred_lft 126sec
    inet6 26xx:xx:xxxx:xxx4:e143:9dbd:610:9f14/64 scope global temporary dynamic
       valid_lft 475sec preferred_lft 2sec
    inet6 26xx:xx:xxxx:xxx4:7ea4:91ba:b8f2:dc1f/64 scope global temporary deprecated dynamic
       valid_lft 352sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:4783:e515:122c:f00d/64 scope global temporary deprecated dynamic
       valid_lft 228sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:a6f0:2ca6:c170:ec53/64 scope global temporary deprecated dynamic
       valid_lft 103sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:xxxx:xxxx:xxxx:2c3/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86371sec preferred_lft 14371sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever


I also saw corresponding NS / NA messages each time these were rotated in 'tcpdump'.

I will try the same on the Windows box next.

UPDATE:

Windows gives an error if I try to set the values below 1d, so that appears to be the minimum allowed.  I've reset them to the OS defaults (MaxPreferred=1d, MaxValid=7d), but that is all I can do.

I am wondering if the Linux devices are failing when the values are not set explicitly in the kernel.  In that case they are using the router defaults from the RAs, which are coming as 14400 (4h) and 86400 (1d) but this doesn't seem reliable. (?)

UPDATE 2:

Apparently the lifetimes are capped by the RA setting.  Windows is forcing it to 4h/1d even though I specified 1d/7d.


>netsh interface ipv6 show addresses

Interface 1: Loopback Pseudo-Interface 1

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Other      Preferred     infinite   infinite ::1

Interface 14: Ethernet

Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred     23h55m5s    3h55m5s 26xx:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57
Temporary  Preferred     23h55m5s    3h55m5s 26xx:xx:xxxx:xxx3:cd62:f9f5:ea82:540b
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14


Looking back at my original post, the same thing is true in Linux. 

So, the times seem dictated by the lower of the two settings between the OS and the router RA.

Question now is, why does it seem to work OK (at least in Linux) only when I explicitly set OS values lower than the RA values?

(Firstly, I was wrong with my observation yesterday.  Although the router 'preferred' and 'valid' lifetimes are being shown in the OS output, those are not the values being used by the privacy extensions.  It seems the RA values get extended as many times as needed until the OS-defined limit is reached.)
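A minimal sketch of that behaviour, based on my reading of RFC 8981 section 3.4: each RA may refresh the temporary address lifetimes, but never beyond the OS-defined maximum counted from when the address was created. This is not necessarily how Windows or Linux implement it; the function name and the zero desync value are assumptions for illustration.

```python
def remaining_preferred(ra_preferred, os_max_preferred, age, desync=0):
    """Preferred lifetime of a temporary address right after an RA.

    ra_preferred:     preferred lifetime advertised in the RA (seconds)
    os_max_preferred: OS limit, e.g. temp_prefered_lft on Linux (seconds)
    age:              seconds since the temporary address was created
    desync:           DESYNC_FACTOR, assumed 0 here for simplicity
    """
    budget = max(0, os_max_preferred - desync - age)
    return min(ra_preferred, budget)


RA_PREFERRED = 14400      # 4h, as advertised by the router
OS_MAX_PREFERRED = 86400  # 1d, as configured on the clients

for age_hours in (0, 12, 23, 25):
    print(age_hours, remaining_preferred(RA_PREFERRED, OS_MAX_PREFERRED, age_hours * 3600))
```

In this model the displayed value keeps getting topped up to the RA value for most of the day and only drops to 0 once the address is older than the OS limit, which is the point where a replacement should already have been generated.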

---

Yesterday I manually configured both the Windows machine and the Raspberry Pi with 'preferred lifetime' as 1 day and 'valid lifetime' as 7 days.

Some time ago both machines rotated the temporary IPs.

[...]
Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred    23h57m22s   3h57m22s 26xx:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57
Temporary  Deprecated   23h57m22s         0s 26xx:xx:xxxx:xxx3:cd62:f9f5:ea82:540b
Temporary  Preferred    23h57m22s   3h57m22s 26xx:xx:xxxx:xxx3:f048:85af:858c:ca01
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14

[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:xx:xx:xx:xx:5a brd ff:ff:ff:ff:ff:ff
    inet 192.168.40.100/24 brd 192.168.40.255 scope global dynamic noprefixroute eth0
       valid_lft 6542sec preferred_lft 6542sec
    inet6 26xx:xx:xxxx:xxx4:4db5:64db:7bde:5a60/64 scope global temporary dynamic
       valid_lft 86382sec preferred_lft 14382sec
    inet6 26xx:xx:xxxx:xxx4:1d0f:9d8f:99c:c5c7/64 scope global temporary deprecated dynamic
       valid_lft 86382sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:xxxx:xxxx:xxxx:2c3/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86382sec preferred_lft 14382sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

So the initial test is passed.  I will monitor this for a few more days.

The second Linux box (the Intel NUC) is using the kernel default values and has not been explicitly set by me.  Incidentally, these match what I am using now on the Raspberry Pi, corresponding to 7d and 1d respectively:

$ cat /proc/sys/net/ipv6/conf/default/temp_valid_lft
604800
$ cat /proc/sys/net/ipv6/conf/default/temp_prefered_lft
86400

Nevertheless, this machine never recovered with a new temporary IP since it failed some days ago.  I've left it running just to see what would happen.

$ uptime
 05:46:05 up 20 days, 17:48,  1 user,  load average: 0.00, 0.00, 0.00

3: wlxa842a105d67b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a8:xx:xx:xx:xx:7b brd ff:ff:ff:ff:ff:ff
    inet 192.168.30.107/24 brd 192.168.30.255 scope global dynamic noprefixroute wlxa842a105d67b
       valid_lft 5590sec preferred_lft 5590sec
    inet6 26xx:xx:xxxx:xxx3:81c5:523e:21b:bc61/64 scope global temporary deprecated dynamic
       valid_lft 86046sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xx3:2f30:c654:c58e:210e/64 scope global temporary deprecated dynamic
       valid_lft 41564sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx3:xxxx:xxxx:xxxx:4f0b/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86046sec preferred_lft 14046sec
    inet6 fe80::c7:5d08:e1c0:cb9b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

So, all machines are using the identical values now. Two of them were set explicitly, one of them is the OS default (as it was).  The one with defaults failed on the second rotation and remained stuck that way.

Let us now see if the others will also get stuck again.

After 48 hrs, the issue is reproduced on both clients under test.

client 1:
>netsh interface ipv6 show addresses
[...]
Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred    23h59m52s   3h59m52s 26xx:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57
Temporary  Deprecated   23h59m52s         0s 26xx:xx:xxxx:xxx3:cd62:f9f5:ea82:540b
Temporary  Deprecated   23h59m52s         0s 26xx:xx:xxxx:xxx3:f048:85af:858c:ca01
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14

client 2:
~ $ ip -6 a
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 26xx:xx:xxxx:xxx4:4db5:64db:7bde:5a60/64 scope global temporary deprecated dynamic
       valid_lft 86306sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:1d0f:9d8f:99c:c5c7/64 scope global temporary deprecated dynamic
       valid_lft 86306sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:xxxx:xxxx:xxxx:2c3/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86306sec preferred_lft 14306sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Additionally, the 3rd client remains in its long-standing deprecated state (just with one of the previously deprecated IPs dropped):

client 3:
$ ip -6 a
[...]
3: wlxa842a105d67b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 26xx:xx:xxxx:xxx3:81c5:523e:21b:bc61/64 scope global temporary deprecated dynamic
       valid_lft 46957sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx3:xxxx:xxxx:xxxx:4f0b/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86304sec preferred_lft 14304sec
    inet6 fe80::c7:5d08:e1c0:cb9b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

So nothing is generating temporary IPs now, and it seems that max. 2 days is enough to get into this state.

I captured tcpdump and wireshark outputs while the temporaries were due to rotate.  The test topology is like this:

  vlan0.1 (igc0), subnet 192.168.1.0/24
    - router @ 192.168.1.1, ip6: fe80::xxxx:xxxx:xxxx:c2e
  vlan0.30 (igc0), subnet 192.168.30.0/24
    - client 1 (Windows)
    - client 3 (Linux, Intel NUC)
  vlan0.40 (igc0), subnet 192.168.40.0/24
    - client 2 (Linux, Raspberry Pi)

The IP addresses were due to rotate at 03:53 for Windows and 03:57 for the Raspberry Pi.

From the Router's Perspective

On vlan0.30, one RA was sent to the all-nodes multicast address (ff02::1) some time before the IP rotation deadline on the connected clients.  Another one was sent specifically to client 3 after the rotation deadline.  No RA was sent at 03:53, when I expected one.

root@firewall:~ # tcpdump -i vlan0.30 -X -vvv -tttt icmp6 and 'ip6[40] = 134'
tcpdump: listening on vlan0.30, link-type EN10MB (Ethernet), snapshot length 262144 bytes
[...]
2024-12-14 03:48:52.268397 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx3::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]
2024-12-14 03:57:51.559567 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > fe80::c7:5d08:e1c0:cb9b: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx3::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s

On vlan0.40, it's a similar story.  I was expecting an RA at 03:57, as that is exactly when client 2's IP got deprecated, but the next multicast RA didn't go out until 03:59.  Way too late.

root@firewall:~ # tcpdump -i vlan0.40 -X -vvv -tttt icmp6 and 'ip6[40] = 134'
tcpdump: listening on vlan0.40, link-type EN10MB (Ethernet), snapshot length 262144 bytes
[...]
2024-12-14 03:55:20.081298 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > fe80::1009:f06b:fa78:524e: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx4::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]
2024-12-14 03:59:09.990328 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx4::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
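As a rough sanity check on those timestamps, the sketch below compares the gaps between the RAs shown above with the MinRtrAdvInterval / MaxRtrAdvInterval values from radvd.conf. It assumes no RA was elided between the two entries on each VLAN, and it mixes unicast and multicast RAs, so treat it only as an indication that the router follows its own timer rather than the clients' address deadlines.

```python
from datetime import datetime

MIN_RTR_ADV_INTERVAL = 200  # seconds, from radvd.conf
MAX_RTR_ADV_INTERVAL = 600  # seconds, from radvd.conf


def gap_seconds(earlier, later):
    """Seconds between two tcpdump -tttt timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    return (datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)).total_seconds()


vlan30_gap = gap_seconds("2024-12-14 03:48:52.268397", "2024-12-14 03:57:51.559567")
vlan40_gap = gap_seconds("2024-12-14 03:55:20.081298", "2024-12-14 03:59:09.990328")

for name, gap in (("vlan0.30", vlan30_gap), ("vlan0.40", vlan40_gap)):
    in_window = MIN_RTR_ADV_INTERVAL <= gap <= MAX_RTR_ADV_INTERVAL
    print(name, round(gap, 1), in_window)
```

Both gaps fall inside the configured window, so an RA landing exactly at 03:53 or 03:57 would only have happened by coincidence.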

Client 1's Perspective

The wireshark capture from Windows shows something interesting. At 03:53 there were some Multicast Listener Report Message(s) sent from the client device, followed shortly after by Neighbor Discovery messages.  There was no Router Solicitation or RA during this time.

Apologies for the heavy redactions in the screenshot, but my privacy extensions aren't working so... ;-)



Client 2's Perspective

The received RAs coincide in time with the RAs sent by the router on VLAN 40.  Again I see that the targeted (unicast) RA to the client came too early, and the multicast RA to all nodes came too late.  Nothing arrived at the expected time of 03:57.

~ $ sudo tcpdump -i eth0 -X -vvv -tttt icmp6 and 'ip6[40] = 134'
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
[...]
2024-12-14 03:55:20.085009 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > fe80::1009:f06b:fa78:524e: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx4::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]
2024-12-14 03:59:09.994197 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx4::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s

Client 3's Perspective

This one is inconsequential as it's not under test, but here it is anyway.

~$ sudo tcpdump -i wlxa842a105d67b -X -vvv -tttt icmp6 and 'ip6[40] = 134'
tcpdump: listening on wlxa842a105d67b, link-type EN10MB (Ethernet), snapshot length 262144 bytes
[...]
2024-12-14 03:48:52.273698 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) _gateway > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx3::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]
2024-12-14 03:57:51.565148 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) _gateway > NUC7PJYH: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx3::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]


So to summarize the observations until now:

1) RAs are happening periodically.
2) On the first day, the issue was not observed and the first set of temporary IPs got deprecated and replaced at the expected time.
3) On the second day, the issue was reproduced and the new batch of temporary addresses was not generated.
  - The Windows client did not send a Router Solicitation at the expected time, as confirmed with Wireshark.
  - Linux clients may or may not have sent RSs.  Unfortunately my tcpdump filters excluded them, so I can't tell.
  - The router did not send RAs at the time of IP renewal.
4) After the renewal time passes, RAs resume being sent periodically.  The clients never recover unless their interfaces are reset or they are rebooted.
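Since the filter used in the captures above only matched ICMPv6 type 134 (RA), a wider one is needed for the next round. The sketch below just builds a tcpdump filter string covering RS, RA, NS and NA (types 133 to 136); the helper name is made up. Note that the ip6[40] offset only works when ICMPv6 directly follows the IPv6 header, so I would not expect it to catch MLD reports, which I believe carry a hop-by-hop options header.

```python
# ICMPv6 message types used by Neighbor Discovery.
ND_TYPES = {
    "router solicitation": 133,
    "router advertisement": 134,
    "neighbor solicitation": 135,
    "neighbor advertisement": 136,
}


def capture_filter(type_codes):
    """Build a tcpdump filter matching the given ICMPv6 type codes.

    ip6[40] is the first byte after the fixed 40-byte IPv6 header, which
    is the ICMPv6 type only when no extension header is present.
    """
    clauses = " or ".join(f"ip6[40] = {code}" for code in sorted(type_codes))
    return f"icmp6 and ({clauses})"


print(capture_filter(ND_TYPES.values()))
```

The printed expression can be passed to tcpdump in quotes in place of the type-134-only filter used so far, for example after "tcpdump -i eth0 -vvv -tttt".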


I think I need to concentrate on the Router Solicitations a bit and see what is happening there.  Are there other message types aside from RS/RA/ND pertinent to this process that I also need to check?