Messages - OPNenthu

#1
It does seem related to that.  Thank you for the link.

I understand now that by pointing the monitor IP at another host I am merely masking the issue from dpinger / the Health RRD, so that is not a viable work-around.  Unfortunately my WAN IPs are dynamic, so I cannot rely on setting the src/dest mask in the shaper.  Hopefully the suggestion on GitHub to add an option excluding ICMP from the shaper rules will come to pass.

Side question: could this have any bearing on my earlier issue with unreliable IPv6 temporary address generation (it is still happening, btw)?  I know the shaping applies to WAN traffic only and should not impact the LAN side, but I am not sure how the shaper works under the hood.
#2
24.7.11_2-amd64

The IPv6 gateway goes into "offline" status with high packet loss reported during sustained uploads to the Internet (observed while running online speed tests).  Once upload activity ceases, the gateway gradually returns to online status.  Health graphs reflect the packet loss on WAN_DHCP6.  The IPv4 gateway is not impacted.





Despite what OPNsense says, the packet loss is not real.  The gateway remains online and speed tests indicate 0% actual loss.  It appears to be a reporting issue with no real consequence as far as I can tell.
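As a sanity check, real loss can be measured from a LAN client while an upload runs (a sketch; the target host and count are arbitrary examples - 2606:4700:4700::1111 is Cloudflare's public resolver):

~ $ ping -6 -c 100 2606:4700:4700::1111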

I found two necessary conditions for this:

- Traffic shaping must be in use; in my case I am following the guide on fixing bufferbloat with FQ_CoDel exactly.  I have one download pipe fixed at 760 Mbit/s and one upload pipe at 21 Mbit/s.

- The 'Monitor IP' in the gateway configuration must be default (to ping the gateway itself).

If either condition is removed, e.g. by disabling the shaping or setting a public DNS server as the monitor IP, the issue is not observed.
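As an aside, since the OPNsense shaper is built on FreeBSD's ipfw/dummynet, the active pipes and schedulers can be inspected from a shell to confirm the shaper is actually engaged (output omitted):

root@firewall:~ # ipfw pipe show
root@firewall:~ # ipfw sched show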

Only uploads cause the symptom.  I confirmed this with "speedtest-cli --no-download" from a wireless client; the inverse test with "--no-upload" has zero impact.  I see exactly the same from my wired clients when using e.g. speedtest.net or Cloudflare's speed test.

It doesn't matter whether the upload runs over IPv4 or IPv6; either will cause the v6 gateway (only) to go virtually offline as the packet "loss" accumulates.

For now I've set the Cloudflare DNS as the gateway monitor IP to work around the issue.

#3
24.7, 24.10 Production Series / Re: Kea Lease issue
January 16, 2025, 08:51:34 AM
Quote from: routenull0 on January 01, 2025, 03:20:57 PM
Quote from: OPNenthu on December 10, 2024, 04:47:10 AM
Eventually Kea will run a cleanup process and should remove stale entries, after which you can disable it.

Which cleanup process?

Is it possible to manually run this process? I have leases that have expired but have not been removed.
Quote from: xpking on December 11, 2024, 03:59:07 PM
Thank you all for helping with this issue.
I found the way to clear the leases.

Just modify the file /var/db/kea/kea-leases4.csv, or simply remove the file to clear the leases.

I think 'kea-lfc' is the periodic process that vacuums stale leases from the lease database; note that it operates specifically on the memfile (.csv) backend.

https://kea.readthedocs.io/en/kea-1.6.1/man/kea-lfc.8.html
https://kea.readthedocs.io/en/kea-1.6.1/arm/admin.html#supported-backends
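Per the man page it can also be invoked by hand.  A sketch below; the config path and the output/finish/previous file names are assumptions based on the memfile defaults, not something I have verified on OPNsense:

root@firewall:~ # kea-lfc -4 -c /usr/local/etc/kea/kea-dhcp4.conf \
    -i /var/db/kea/kea-leases4.csv \
    -o /var/db/kea/kea-leases4.csv.output \
    -f /var/db/kea/kea-leases4.csv.completed \
    -p /var/run/kea/kea-lfc4.pid \
    -x /var/db/kea/kea-leases4.csv.1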
#4
The wireless client is also generating temporaries reliably now...

$ ip -6 a
[...]
3: wlxa842a105d67b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 26xx:xx:xxxx:xxx3:cde6:b967:ddde:2c42/64 scope global temporary dynamic
       valid_lft 86097sec preferred_lft 14097sec
    inet6 26xx:xx:xxxx:xxx3:685:919b:e58b:48fe/64 scope global temporary deprecated dynamic
       valid_lft 86097sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx3:471e:a816:3f83:e635/64 scope global temporary deprecated dynamic
       valid_lft 86097sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx3:8e68:c66c:b515:720c/64 scope global temporary deprecated dynamic
       valid_lft 86097sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx3:xxxx:xxxx:xxxx:xx0b/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86097sec preferred_lft 14097sec
    inet6 fe80::c7:5d08:e1c0:cb9b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

The multicast configurations remain a bit of an enigma to me, but I am marking the topic as solved for now.  Thank you!
#5
It seems that Multicast Listener Discovery (MLD) for IPv6 is tied together with IGMP Snooping for IPv4 in UniFi switches.  Per Ubiquiti's automated support tool:

Quote from: UniFi GPT
Steps to Control Multicast Listener Discovery (MLD) on a UniFi Switch

To control Multicast Listener Discovery (MLD) on your UniFi switch, you can manage it through the IGMP Snooping settings, as MLD Snooping is typically integrated with IGMP Snooping in network switches. Here's how you can enable and configure these settings:

    [...]

    Select the Network/VLAN:
        Choose the network or VLAN for which you want to configure MLD/IGMP Snooping.

    Enable IGMP Snooping:
        Scroll down to the Multicast Management section.
        Enable IGMP Snooping. This setting will also enable MLD Snooping for IPv6 multicast traffic.

    Configure Querier:
        If necessary, configure a specific switch as the Querier. This ensures that there is a designated device to manage multicast group memberships.
[...]

By following these steps, you should be able to control MLD on your UniFi switch. If you need further assistance, please click here to contact support.

I found a couple of other reports, such as this one, of Ubiquiti users not having functional IPv6 until these settings were enabled.

It's not clear to me why these functions are needed on a small home network with a single switch; however, enabling them made a difference.  It may be a quirk of current UniFi firmware that interferes with IPv6 multicast messaging when left in the default (off) state.

What I need from this community now are recommendations for setting up the OPNsense rules properly, now that IGMP functions are enabled on the switch.  As I mentioned earlier, once I enabled these I started seeing IGMP packets on all my VLAN interfaces with src 0.0.0.1 and dest 224.0.0.1.  These were getting dropped by the "Default deny / state violation" rule.  I added a very narrow 'pass' rule for these on my VLAN interface group ("HomeSubnets"), but I suspect this is not best practice.
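For anyone reproducing this, the packets are easy to spot with a capture on any VLAN interface (the interface name here is from my own setup):

root@firewall:~ # tcpdump -vv -i vlan0.30 igmp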


Some clarifying questions:

1) I'm assuming that no specific IPv6 MLD rules are needed from me, since OPNsense already has default rules for ICMPv6 and everything needed for NDP and Privacy Extensions to work is already included?

2) For IPv4 IGMP, do I need to widen my pass rule?  Is it advisable to allow all IGMP traffic, or all IGMP with dest 224.0.0.0/24 instead of just 224.0.0.1/32? These are the only packets related to IGMP that I am seeing, so that's why I started with the narrow rule.

3) How do I allow devices connected to the Guest VLAN to participate in multicast traffic originating from that subnet only?  I have an employer-issued laptop on the Guest network that might break if I disallow multicast (not sure how their IT department has things set up, but I don't want to risk it).  At the same time I don't want hosts on the Guest network to see multicast from my other (private) networks.  Is this the default behavior of multicast traffic without an IGMP proxy or mDNS repeater?  Or do I need explicit rules to do this filtering?


As for the WiFi AP, I have disabled its IGMP functions entirely and rebooted it.  My WiFi client has now gotten a fresh temporary IPv6 address.  I will monitor whether it continues to refresh daily.
#6
Progress!

Enabling the aforementioned settings on the UniFi switch and creating the IGMP 'pass' rule in OPNsense seem to have resolved the issue on the clients that are directly connected to the switch.

Note that the Windows client was rebooted a couple of days ago, so it has fewer deprecated temporaries than the Linux client, but so far both are generating reliably.

~ $ ip -6 a
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 26xx:xx:xxxx:xx83:6869:3745:906b:c26d/64 scope global temporary dynamic
       valid_lft 86388sec preferred_lft 14388sec
    inet6 26xx:xx:xxxx:xx83:8613:51e3:6d0a:e60a/64 scope global temporary deprecated dynamic
       valid_lft 86388sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xx83:ab4e:d1cb:2cf4:7862/64 scope global temporary deprecated dynamic
       valid_lft 86388sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xx83:8158:8786:aea:cde7/64 scope global temporary deprecated dynamic
       valid_lft 86388sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xx83:b779:f3de:d005:9f6c/64 scope global temporary deprecated dynamic
       valid_lft 86388sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xx83:f551:xxxx:xxxx:xx2f/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86388sec preferred_lft 14388sec
    inet6 fe80::ec8a:1bb1:304d:b712/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

>netsh interface ipv6 show addresses
[...]
Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred    23h59m57s   3h59m57s 26xx:xx:xxxx:xx83:2bbd:xxxx:xxxx:xxx57
Temporary  Deprecated   23h59m57s         0s 26xx:xx:xxxx:xx83:9d4b:705:a168:abd2
Temporary  Deprecated   23h59m57s         0s 26xx:xx:xxxx:xx83:c0fc:4705:e096:a9ef
Temporary  Preferred    23h59m57s   3h59m57s 26xx:xx:xxxx:xx83:c14d:7f8f:4b45:20ff
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14

The wireless client is still getting stuck, so I think this is a clue that multicast is still not configured correctly on the AP:

~$ ip -6 a
[...]
3: wlxa842a105d67b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 26xx:xx:xxxx:xx83:d62c:d808:1157:6926/64 scope global temporary deprecated dynamic
       valid_lft 86362sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xx83:952a:xxxx:xxxx:xx0b/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86362sec preferred_lft 14362sec
    inet6 fe80::c7:5d08:e1c0:cb9b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

I posted a screenshot in my previous post with the relevant AP settings in FreshTomato.  IGMP Snooping is currently enabled there, having previously been disabled.  Neither setting has worked reliably so far for the WiFi-connected Linux client.  What is the technically correct way to configure multicast on an access point?
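One thing I still plan to try: capturing MLD on the wireless NIC to see whether queries and reports actually make it through the AP.  Note that MLD messages sit behind a Hop-by-Hop extension header, so the usual 'ip6[40] = X' offset filters won't match them; 'protochain' chases the header chain (a sketch):

~$ sudo tcpdump -l -i wlxa842a105d67b -vvv 'ip6 protochain 58' | grep -i listener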
#7
One more strange thing observed after changing these settings is that my Chromecast devices on the IoT network started making DNS queries to "192.168.0.1".  I'm used to them constantly trying to reach 8.8.8.8, but this IP is new.  There is no 192.168.0.x network anywhere, so I am wondering if this is something the Chromecasts are doing internally (setting up their own network somehow?).



Anyway, they are firewalled to the IoT VLAN, and I am catching and redirecting all DNS that isn't destined for the firewall, so the 'rdr' rules are just sending this traffic to Unbound.
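For reference, the redirect is conceptually like this pf sketch (not the literal rule OPNsense generates; the interface name is a placeholder for my IoT VLAN):

rdr on igc0_vlan40 proto { tcp, udp } from any to ! self port 53 -> 127.0.0.1 port 53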
#8
The clock skew was not the issue.  I'm glad for that, because I realized after my last post that it would have been a protocol vulnerability if it were the case.

I prompted ChatGPT to act as a Ubiquiti support engineer, and together we walked through my configuration, checking everything from routing and NDP tables to host firewalls to multicast settings on the switch.  It also wanted me to check for ACLs on the switch, but there are none that I can see (no interface option for ACLs in the UniFi Network controller).

I'll need a couple of days to see whether this succeeds, but I'll summarize the changes here.  Are these settings generally correct / appropriate, or have I opened up a security or performance hole?

In UniFi controller:
- Enabled "IoT Auto-Discovery / mDNS" for the IoT network only.
- Enabled "Multicast Filtering" and "IGMP Snooping" on all networks / VLANs.
- Left "Forward Unknown Multicast Traffic" disabled.
- Enabled the "Fast Leave" option (all networks).
- Set the multicast Querier as the UniFi switch itself (all networks).
- Configured the trunk port carrying all tagged VLANs to OPNsense as a "Multicast Router Port" in the port profiles.
- Disabled "DHCP Guarding" / "DHCP Snooping" functions temporarily.

In FreshTomato (the firmware I used to convert my old Asus WiFi router into an access point):
  - Enabled "IGMP Snooping"
  - Enabled IGMPv2

In OPNsense:
  - Added a rule for IGMP packets on all VLAN interfaces, as these started appearing and were being dropped by the default deny rule:
      Action: pass
      Interface: HomeSubnets (all VLANs group)
      Direction: In
      Protocol: IPv4 IGMP
      Src: 0.0.0.1/32
      Dest: 224.0.0.1/32


The idea behind these changes is to make sure that both the switch and the AP (with its 4 internal bridge interfaces) are forwarding multicast traffic properly, so that NDP is not affected.  Even though much of this relates to IPv4 traffic, ChatGPT was convinced it could negatively impact IPv6 as well.

I left everything pertaining to IGMP Proxy off, as I want OPNsense to manage the inter-VLAN routing of multicast.  For now I have not added any specific rules in OPNsense for these, so I'm expecting that the default ICMPv6 rules are enough for NDP functions.

I also enabled the built-in Windows 10 firewall rules for ICMPv6 Echo, as they were disabled and not passing.  Though NDP shouldn't depend on this, it was causing the Linux boxes to fail to discover the Windows machine as a neighbor.
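(For reference, the equivalent from an elevated prompt; the rule's display name may vary by Windows edition and locale, so treat it as an example:)

>netsh advfirewall firewall set rule name="File and Printer Sharing (Echo Request - ICMPv6-In)" new enable=yes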


#9
Ah, had missed that.

Fixed now.  Thanks!
#10
Hi @franco, I updated from 24.7.9_1 to 24.7.11 a few hours ago and now the Health graphs are coming up empty.  Do I need to reset the RRD data?

Thanks!
#11
If this was all working before then there's very likely a new issue here, but just in case:

I have seen multiple posts from senior members warning against mixing tagged and untagged traffic in FreeBSD.  Is this one of the symptoms of that?

This is one of the very first threads I came across when first setting up my home network: https://forum.opnsense.org/index.php?topic=36530.0

There's a warning in the OPNsense docs against mixing: https://docs.opnsense.org/manual/how-tos/vlan_and_lagg.html

To better understand the reasons and potential issues, I asked ChatGPT and am providing the conversation below for reference.

If this is indeed a common issue, I'm surprised OPNsense doesn't have a built-in warning in the application; most people probably only consult the docs when they get stuck and are likely to miss it.  Nothing prevents people from setting things up this way without realizing that there could be routing problems.

------

Prompt:
Quote
We are discussing networking in FreeBSD, particularly firewall systems based on pf such as OPNsense and pfSense.

There are some opinions that tagged and untagged traffic should not be mixed on the same interface.  Why?  What is the technical explanation and what problems does it cause?

Answer by ChatGPT 4o mini, 17 December 2024:
Quote
In FreeBSD, particularly with firewall systems like pfSense and OPNsense, the issue of mixing tagged and untagged traffic on the same interface arises when dealing with VLANs (Virtual Local Area Networks) and standard Ethernet traffic.

Explanation of Tagged and Untagged Traffic

- Tagged traffic refers to Ethernet frames that include a VLAN tag (also known as a 802.1Q tag). This tag carries information about which VLAN the frame belongs to, allowing a network switch or other network devices to properly segregate traffic into different logical networks (VLANs) over the same physical interface.
- Untagged traffic, on the other hand, does not include any VLAN tag. It is treated as being part of the default VLAN (usually VLAN 1 on most switches), which is typically used for normal, unsegmented network traffic.

Why Should Tagged and Untagged Traffic Not Be Mixed?

The technical issue with mixing tagged and untagged traffic on the same interface involves how network devices process Ethernet frames. Here are some of the key reasons:

1. Ambiguity in Frame Processing
   - If both tagged and untagged traffic is sent through the same interface, it can cause ambiguity in how frames are handled. When an Ethernet frame arrives at the interface, the device needs to know if the frame should be processed as part of a specific VLAN (tagged) or as part of the default network (untagged). This confusion can lead to misrouting or mishandling of traffic.
   - For example, if the interface receives a frame without a VLAN tag (untagged) but is also configured to handle tagged traffic for several VLANs, it might not be clear which VLAN the untagged frame should belong to, potentially causing traffic to be misdirected.

2. Security Risks and Traffic Leakage
   - Mixing tagged and untagged traffic on the same interface can lead to security vulnerabilities. If traffic that is not meant to be part of a VLAN is accidentally tagged or handled as part of another VLAN, sensitive traffic could be exposed to devices that should not have access to it.
   - Additionally, malicious users or misconfigured devices might be able to inject tagged traffic into an untagged VLAN, leading to VLAN hopping, where traffic is inadvertently allowed to access networks it shouldn't.

3. Operational Complexity
   - Having both tagged and untagged traffic on the same interface complicates network administration. It can be more difficult to troubleshoot network issues because traffic might behave unpredictably. Misconfigurations are harder to detect when the network doesn't clearly separate VLAN-tagged and untagged traffic.
   - Network configurations such as DHCP (Dynamic Host Configuration Protocol) can also become more complicated. If an interface serves both tagged and untagged traffic, it may be difficult for DHCP servers to assign IP addresses properly to hosts in different VLANs, leading to IP address conflicts or other networking issues.

4. Performance Considerations
   - Mixing tagged and untagged traffic can also lead to inefficiencies in processing, especially in larger networks. The network stack must continuously check if traffic is tagged, and this overhead can become problematic, particularly in high-traffic environments. Furthermore, switches and routers may handle this situation less efficiently when performing VLAN tagging or untagging operations.

Specific Problems that Can Occur
   - Misrouting of Traffic: A frame may be incorrectly assigned to the wrong VLAN, leading to communication issues and potentially leaking traffic to unintended segments of the network.
   - Broadcast Traffic: Broadcast traffic, typically sent untagged, could be inadvertently forwarded to the wrong VLAN if the interface is improperly configured.
   - DHCP Problems: If a device requests an IP address on an untagged VLAN but the interface is processing multiple VLANs, the DHCP process may fail to assign the correct IP, as the VLAN tag information may not be properly applied or recognized.
   - VLAN Hopping: In some cases, a misconfigured interface could allow tagged traffic from one VLAN to enter another VLAN, bypassing segmentation and causing unauthorized access.

Best Practices
To avoid these issues, it is generally recommended to separate tagged and untagged traffic on different interfaces. This can be done by:
   - Using different physical interfaces for tagged and untagged traffic.
   - Using VLAN-aware interfaces where tagged traffic is explicitly separated from untagged traffic through proper VLAN configuration.
   - Ensuring that the network device's configuration clearly distinguishes between traffic that should be tagged and that which should remain untagged.

In summary, mixing tagged and untagged traffic on the same interface can lead to ambiguity, security risks, and operational complications, making it a practice that should generally be avoided in network design. Separating tagged and untagged traffic allows for clearer traffic handling, better security, and simpler network management.

#12
(Edit: Deleted post. Sorry, didn't realize this was an old thread.)
#13
General Discussion / Re: Worth enabling DNS over TLS
December 15, 2024, 03:35:20 AM
@chemlud I'd be interested to hear more regarding Quad9, as I'm rethinking my DNS strategy.
#14
Quote from: meyergru on December 14, 2024, 01:25:50 PM
[...] so it must be something that keeps the clients from assuming their new IPs, like DAD.

As I was researching DAD to learn how to trace it, I got the idea to cross-check my Wireshark capture against the 'pf' logs in OPNsense:

2024-12-14 03:53:34.651149    fe80::xxxx:xxxx:xxxx:c2e    ff02::1:ff01:bb57    ICMPv6    86    Neighbor Solicitation for 26xx:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57 from 64:xx:xx:xx:xx:2e

2024-12-14T03:53:32-05:00    Informational    filterlog     36,,,acdbb900b50d8fb4ae21ddfdc609ecf8,vlan0.30,match,pass,out,6,0x00,0x00000,255,ipv6-icmp,58,32,fe80::xxxx:xxxx:xxxx:c2e,ff02::1:ff01:bb57,datalength=32

So apparently I have a >2s time skew between my Windows box and OPNsense.  Confirmed by comparing the system dates.

I don't know if this would affect the Linux clients, as they don't seem to have the same skew, but I will be kicking myself very hard if this was the cause all along.  I'll know in a couple of days, I guess.

Now the stumper question: who has the correct time?

- OPNsense is using its own NTP service with [0-3].opnsense.pool.ntp.org.
- Windows is set to 'time.nist.gov'
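To compare the two against their references (a sketch; I'm assuming ntpq is available alongside OPNsense's ntpd, and w32tm is built into Windows):

root@firewall:~ # ntpq -p

>w32tm /query /status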

It's time for a break...



UPDATE:

It's DNS - I was getting SERVFAIL for time.nist.gov (in fact, for most *.gov domains).  It must be a DNSSEC issue.
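A quick way to confirm, assuming drill (from ldns) is available on the firewall: request DNSSEC data through the local Unbound and through an external validating resolver, then compare the results:

root@firewall:~ # drill -D time.nist.gov @127.0.0.1
root@firewall:~ # drill -D time.nist.gov @9.9.9.9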



#15
After 48 hrs, the issue is reproduced on both clients under test.

client 1:
>netsh interface ipv6 show addresses
[...]
Addr Type  DAD State   Valid Life Pref. Life Address
---------  ----------- ---------- ---------- ------------------------
Public     Preferred    23h59m52s   3h59m52s 26xx:xx:xxxx:xxx3:xxxx:xxxx:xxxx:bb57
Temporary  Deprecated   23h59m52s         0s 26xx:xx:xxxx:xxx3:cd62:f9f5:ea82:540b
Temporary  Deprecated   23h59m52s         0s 26xx:xx:xxxx:xxx3:f048:85af:858c:ca01
Other      Preferred     infinite   infinite fe80::d968:a93c:3f8a:521f%14

client 2:
~ $ ip -6 a
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 26xx:xx:xxxx:xxx4:4db5:64db:7bde:5a60/64 scope global temporary deprecated dynamic
       valid_lft 86306sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:1d0f:9d8f:99c:c5c7/64 scope global temporary deprecated dynamic
       valid_lft 86306sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx4:xxxx:xxxx:xxxx:2c3/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86306sec preferred_lft 14306sec
    inet6 fe80::1009:f06b:fa78:524e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Additionally, the 3rd client remains in its long-standing deprecated state (just with one of the previously deprecated IPs dropped):

client 3:
$ ip -6 a
[...]
3: wlxa842a105d67b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 26xx:xx:xxxx:xxx3:81c5:523e:21b:bc61/64 scope global temporary deprecated dynamic
       valid_lft 46957sec preferred_lft 0sec
    inet6 26xx:xx:xxxx:xxx3:xxxx:xxxx:xxxx:4f0b/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86304sec preferred_lft 14304sec
    inet6 fe80::c7:5d08:e1c0:cb9b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

So nothing is generating temporary IPs now, and it seems that two days at most is enough to get into this state.

I captured tcpdump and Wireshark output while the temporaries were due to rotate.  The test topology is as follows:

  vlan0.1 (igc0), subnet 192.168.1.0/24
    - router @ 192.168.1.1, ip6: fe80::xxxx:xxxx:xxxx:c2e
  vlan0.30 (igc0), subnet 192.168.30.0/24
    - client 1 (Windows)
    - client 3 (Linux, Intel NUC)
  vlan0.40 (igc0), subnet 192.168.40.0/24
    - client 2 (Linux, Raspberry Pi)

The IP addresses were due to rotate at 03:53 for Windows and 03:57 for the Raspberry Pi.

From the Router's Perspective

On vlan0.30, one RA was sent to the all-nodes multicast address (ff02::1) some time before the IP rotation deadline on the connected clients.  Another was sent specifically to client 3 after the rotation deadline.  No RA was sent at 03:53, when one was expected.

root@firewall:~ # tcpdump -i vlan0.30 -X -vvv -tttt icmp6 and 'ip6[40] = 134'
tcpdump: listening on vlan0.30, link-type EN10MB (Ethernet), snapshot length 262144 bytes
[...]
2024-12-14 03:48:52.268397 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx3::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]
2024-12-14 03:57:51.559567 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > fe80::c7:5d08:e1c0:cb9b: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx3::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s

On vlan0.40 it's a similar story.  I was expecting an RA at 03:57, as that is exactly when client 2's IP got deprecated, but one wasn't multicast until 03:59 - way too late.

root@firewall:~ # tcpdump -i vlan0.40 -X -vvv -tttt icmp6 and 'ip6[40] = 134'
tcpdump: listening on vlan0.40, link-type EN10MB (Ethernet), snapshot length 262144 bytes
[...]
2024-12-14 03:55:20.081298 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > fe80::1009:f06b:fa78:524e: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx4::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]
2024-12-14 03:59:09.990328 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx4::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s

Client 1's Perspective

The Wireshark capture from Windows shows something interesting.  At 03:53 there were some Multicast Listener Report messages sent from the client device, followed shortly after by Neighbor Discovery messages.  There was no Router Solicitation or RA during this time.

Apologies for the heavy redactions in the screenshot, but my privacy extensions aren't working so... ;-)



Client 2's Perspective

The received RAs coincide temporally with the RAs sent by the router on VLAN 40.  Again I see that a targeted RA to the client IP arrived too early, and a non-specific all-nodes RA arrived too late.  Nothing came at the expected time of 03:57.

~ $ sudo tcpdump -i eth0 -X -vvv -tttt icmp6 and 'ip6[40] = 134'
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
[...]
2024-12-14 03:55:20.085009 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > fe80::1009:f06b:fa78:524e: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx4::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]
2024-12-14 03:59:09.994197 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::xxxx:xxxx:xxxx:c2e > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx4::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s

Client 3's Perspective

This one is inconsequential as it's not under test, but here it is anyway.

~$ sudo tcpdump -i wlxa842a105d67b -X -vvv -tttt icmp6 and 'ip6[40] = 134'
tcpdump: listening on wlxa842a105d67b, link-type EN10MB (Ethernet), snapshot length 262144 bytes
[...]
2024-12-14 03:48:52.273698 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) _gateway > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx3::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]
2024-12-14 03:57:51.565148 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) _gateway > NUC7PJYH: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [none], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 26xx:xx:xxxx:xxx3::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            [...]


So to summarize the observations so far:

1) RAs are happening periodically.
2) On the first day, the issue was not observed and the first set of temporary IPs got deprecated and replaced at the expected time.
3) On the second day, the issue was reproduced and the new batch of temporary addresses was not generated.
  - The Windows client did not send a Router Solicitation at the expected time, as confirmed with Wireshark.
  - The Linux clients may or may not have sent RSs; unfortunately my tcpdump filters excluded them, so I can't tell.
  - The router did not send RAs at the time of IP renewal.
4) After the renewal time passes, RAs resume being sent periodically.  The clients never recover unless their interfaces are reset or they are rebooted.


I think I need to concentrate on the Router Solicitations a bit and see what is happening there.  Are there other message types besides RS/RA/ND pertinent to this process that I should also check?
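For the next capture I'll widen the filter so Router Solicitations (and NS/NA) aren't excluded - ICMPv6 types 133-136 per RFC 4861; the fixed offset works here because NDP packets carry no extension headers:

root@firewall:~ # tcpdump -i vlan0.30 -X -vvv -tttt 'icmp6 and (ip6[40] >= 133 and ip6[40] <= 136)'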