Messages - rmayr

#1
I'm going to try enabling IPv6 SLAAC for Android clients on one of our guest/lab WiFi segments at our university institute to see if I can reproduce the problems there as well. It has a complicated, but different OPNsense base setup, so we might learn something from the differences.
#2
Another small data point: "Disable interface scrub" does not seem to change anything when turned on.
#3
Quote from: franco on March 23, 2026, 01:53:37 PM
Quote from: Patrick M. Hausen on March 23, 2026, 12:33:30 AMI'm a bit puzzled by this.

Android trying to reach a GUA from a link-local source address is the level of competence we're dealing with here.


Cheers,
Franco

In my tcpdumps above, I replaced my GUA prefix as supplied by the ISP through PD with <myprefix>. This doesn't refer to the ULA range I use (which is also on the respective VLANs, but does not seem to interfere in the behaviour I am seeing).
#4
I am trying it with "sloppy" right now, but I am not sure the problem really is limited to ICMP6. As before, it's (also) TCPv6 SYN packets that are sent (unicast) to the OPNsense LAN Ethernet MAC address with a target IPv6 address on the WAN side.

Same high-level behaviour still: After some time being connected on the WiFi, trying to reload https://test-ipv6.com simply times out with "No IPv6 address detected" and those TCP SYN packets never receiving a reply. I can see the SYN packets on the LAN side of OPNsense:
22:32:09.825963 fa:17:c7:f8:dd:85 > ca:00:00:00:64:01, ethertype IPv6 (0x86dd), length 94: <myprefix>:2d9:c3d6:1433:ee1.37430 > 2a01:7e03::f03c:94ff:fed0:11a6.443: Flags [S], seq 1969279674, win 65535, options [mss 1432,sackOK,TS val 2090357468 ecr 0,nop,wscale 8], length 0
22:32:10.379084 fa:17:c7:f8:dd:85 > ca:00:00:00:64:01, ethertype IPv6 (0x86dd), length 94: <myprefix>:2d9:c3d6:1433:ee1.22000 > 2a01:4f8:13b:1643::2.443: Flags [S], seq 1286341450, win 65535, options [mss 1432,sackOK,TS val 2535439579 ecr 0,nop,wscale 8], length 0


going out on WAN (pppoe0) and immediately being answered from the WAN side:

22:32:09.826029 AF IPv6 (28), length 84: <myprefix>:2d9:c3d6:1433:ee1.37430 > 2a01:7e03::f03c:94ff:fed0:11a6.443: Flags [S], seq 1969279674, win 65535, options [mss 1432,sackOK,TS val 2090357468 ecr 0,nop,wscale 8], length 0
22:32:09.996886 AF IPv6 (28), length 84: 2a01:7e03::f03c:94ff:fed0:11a6.443 > <myprefix>:2d9:c3d6:1433:ee1.37430: Flags [S.], seq 2081477611, ack 1969279675, win 58996, options [mss 8440,sackOK,TS val 3773941169 ecr 2090357468,nop,wscale 7], length 0
22:32:10.379114 AF IPv6 (28), length 84: <myprefix>:2d9:c3d6:1433:ee1.22000 > 2a01:4f8:13b:1643::2.443: Flags [S], seq 1286341450, win 65535, options [mss 1432,sackOK,TS val 2535439579 ecr 0,nop,wscale 8], length 0
22:32:10.398194 AF IPv6 (28), length 84: 2a01:4f8:13b:1643::2.443 > <myprefix>:2d9:c3d6:1433:ee1.22000: Flags [S.], seq 2319345901, ack 1286341451, win 65178, options [mss 1290,sackOK,TS val 782819494 ecr 2535404002,nop,wscale 7], length 0

But those replies never make it back to the LAN interface. A few seconds later, I get a block entry in the filter log, for a different packet than the one that was silently dropped, though:
WAN In 2026-03-22T22:32:44 TCP [2a01:4f8:13b:1643::2]:443 [<myprefix>:2d9:c3d6:1433:ee1]:52302 block Default deny / state violation rule

And that's what's really confusing me: None of that ever happens with other Linux clients on the same LAN, and if I reconnect the Android client to get a new SLAAC address, it immediately works again (for a short time). Which part of the pf state tracking logic goes out of whack here, and why only for Android clients?
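
A minimal sketch of how the LAN and WAN captures above could be compared automatically, assuming both are saved as plain `tcpdump -n` text. It lists SYNs seen on the LAN side whose SYN-ACK appears in the WAN capture but never in the LAN capture. The function names (`parse`, `dropped_replies`) and the regular expression are illustrative only; the address pattern is deliberately loose so that it also accepts the `<myprefix>` placeholder.

```python
import re

# Matches "src.port > dst.port: Flags [..], seq N[, ack M]" in a tcpdump -n line.
LINE_RE = re.compile(
    r"(?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]+)\], seq (?P<seq>\d+)(?:, ack (?P<ack>\d+))?"
)


def parse(capture):
    """Return (syns, synacks) as sets of (client, cport, server, sport, syn_seq)."""
    syns, synacks = set(), set()
    for line in capture.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        if m["flags"] == "S":
            syns.add((m["src"], m["sport"], m["dst"], m["dport"], int(m["seq"])))
        elif m["flags"] == "S." and m["ack"]:
            # Key the SYN-ACK by the SYN it answers: endpoints reversed,
            # and ack - 1 (mod 2^32) is the sequence number of that SYN.
            syn_seq = (int(m["ack"]) - 1) % 2**32
            synacks.add((m["dst"], m["dport"], m["src"], m["sport"], syn_seq))
    return syns, synacks


def dropped_replies(lan_capture, wan_capture):
    """SYNs seen on LAN whose SYN-ACK reached WAN but never showed up on LAN."""
    lan_syns, lan_synacks = parse(lan_capture)
    _, wan_synacks = parse(wan_capture)
    return sorted(s for s in lan_syns if s in wan_synacks and s not in lan_synacks)
```

Fed with the two captures above, this should report both connections (ports 37430 and 22000), since their SYN-ACKs appear on pppoe0 but not on the LAN interface.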
#5
Quote from: franco on March 18, 2026, 07:29:21 AM
Quote from: franco on March 16, 2026, 12:43:10 PMOne way to "fix" this for educational purposes is to turn off state tracking for the ICMP rules for better or worse, but basically just going back to where the code was before all of this started and try statetype set to "sloppy" or "none" to see if this improves the behaviour:

https://github.com/opnsense/core/blob/5b07e0917484b90d0e9411c5e2c4f8ed5a07b8c7/src/etc/inc/filter.lib.inc#L242

We can consider making this configurable if it has a real world benefit.

Please don't forget to comment on this...

Not forgotten, it's on my TODO list to try it as soon as I find the next time slot for in-depth debugging :)
In the meantime, I have disabled all forms of IGMP/MLD snooping on switch(es) and access points that are in-path just to reduce other potential sources of errors (though for the network traces I posted before, RAs and NDs never seem to have been the problem).
#6
Quote from: glasi on March 16, 2026, 02:07:58 PM
Quote from: rmayr on March 16, 2026, 12:23:53 AMThis seems stable for my desktop devices, but Android devices, though they consistently get a SLAAC pair of addresses, fail to connect.
Are you sure it hasn't to do with Android's energy saving behaviour? Android >=15 ignores IPv6 if RA lifetime is below a certain threshold (e.g. 180 seconds).

Wow, now I feel slightly stupid in that I hadn't learned of this change beforehand. Indeed, setting RA lifetime from 120s to 300s makes my test Android devices set a default route again. One weird aspect is that the Android device picks up and sets its two random SLAAC addresses perfectly well with a lifetime of 120s. It just doesn't seem to set the default route.

(Context: I am using values much lower than the default 1800s because of ongoing debugging with IPv6 handover when the firewalls switch master/backup CARP roles.)

But now I'm back to the behavior of "it works for a few minutes, and then it breaks in an irregular manner, causing network connections from the Android devices to fail or become extremely slow". So, to keep my other members of the household happy, I'm intentionally setting the lifetimes back to 120s and therefore letting Android devices fall back to IPv4-only for the time being. I will continue debugging.

Thanks for the pointer! I learned something new today.
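
A minimal sketch of a sanity check for this, assuming the "RA lifetime" in question is the router lifetime that radvd sets via `AdvDefaultLifetime`, and taking the 180 seconds from the quote above as an example threshold (the exact Android cut-off is not confirmed here). The function name and constant are made up for illustration.

```python
import re

# Assumption: "e.g. 180 seconds" from the quoted post, not a verified Android constant.
ASSUMED_ANDROID_MIN_LIFETIME_S = 180


def short_router_lifetimes(radvd_conf, threshold_s=ASSUMED_ANDROID_MIN_LIFETIME_S):
    """Return all AdvDefaultLifetime values (seconds) in the config below threshold_s."""
    values = [
        int(v)
        for v in re.findall(
            r"^\s*AdvDefaultLifetime\s+(\d+)\s*;", radvd_conf, re.MULTILINE
        )
    ]
    return [v for v in values if v < threshold_s]
```

With the values from this post, a config containing `AdvDefaultLifetime 120;` would be flagged, while one with `AdvDefaultLifetime 300;` would not.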
#7
Thanks, @franco for the additional context. I have to admit that wading through the FreeBSD bug report linked above was ... painful ... and does not inspire a lot of confidence in future upstream behavior.

While I could probably try to reproduce on a standard FreeBSD kernel (assuming I could just take the OPNsense generated radvd and pf configs and run them on stock FreeBSD), it would be a major time investment for something that has been bugging me for nearly a year but hasn't been a major personal blocker. And reading the bug report, it might be wasted time anyway if upstream doesn't seem to care much about IPv6 stability with pf. With my current home setup (assuming it doesn't turn up any other unexpected surprises), I can fully use IPv6 locally (ULA) and globally (PD assignment to OPNsense, identity association for VLAN interfaces, radvd with hacky master/backup script, kea) from my Linux desktops/laptops and locally run services. Android clients for now fall back to IPv4-only, which is not terrible and - importantly - is not currently noticed by other members of the household, but still bugs my sense of proper networking setup. If I get another long evening to dedicate to this, I might write up a small post on the setup and all the components coming together.

However, for our university student lab setup, this might become a major blocker for some lab scenarios - especially as the packet dropping is erratic and unpredictable and nearly impossible for students to debug. We'll have to think about some of those and how best to deal with IPv6 in teaching setups. We've been using OPNsense as our main example in student labs for years due to the great UI and nice integration of components (thanks to the whole team for all the work!), and I'd hate to have to move some of them over to VyOS or OpenWRT. The students would hate the change even more, I'm sure :)
#8
26.1 Series / IPv6 from Android devices still broken
March 16, 2026, 12:23:53 AM
Unfortunately, https://forum.opnsense.org/index.php?msg=262791 seems to apply to 26.1 as much as to the 25 series. I have now tried to switch from dnsmasq (which has another bug with SAs, to be reported separately) back to radvd and KEA for DHCPv6, plugging in a carp.d hook script to only let radvd run on the respective master.

This seems stable for my desktop devices, but Android devices, though they consistently get a SLAAC pair of addresses, fail to connect.

Is https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280701 supposed to apply to 26.1 kernels? Has the whole patch set been reverted or is it supposed to be fixed?
#9
After finding and reading through https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280701, I believe that is pretty much what I am still seeing in OPNsense 25 and 26 releases (state violation leading to traffic from Android devices being dropped after a few minutes, while a pf reload or Android device re-connecting fixes the connection until it drops again a few minutes later).

@franco - was the pf patch series reverted in the OPNsense kernel in 24 and then re-introduced in 25 or 26?
#10
No changes. In response to packets like these that I see from the Android device with
tcpdump -n -i igb2_vlan64 ether host fa:17:c7:f8:dd:85 and ip6 (that is the Android device randomized MAC address for this SSID):

18:00:19.946419 IP6 2a03:fa00:650:30:9a7c:9494:3859:2d9b.45934 > 2606:4700:10::6814:2f59.443: Flags [S], seq 3003767200, win 65535, options [mss 1432,sackOK,TS val 3205487265 ecr 0,nop,wscale 8], length 0
So a simple TCP start of connection (SYN flag) to port 443 on the default route. I can ping that target IP from another Linux laptop on the same LAN/SSID.

I get the firewall log entry (not for exactly this packet, because once the Android device / OPNsense combination get into this state, I don't really get many log entries for this source IP anymore):

LAN In 2026-01-22T17:46:21 TCP [2a03:fa00:650:30:9a7c:9494:3859:2d9b]:49736 [2a00:1450:4001:805::200a]:443 block Default deny / state violation rule
With the (now even simplified) default rules on the OPNsense ruleset, I really, really don't understand why it would be blocked. Can there be any weird packet flags that cause the "state violation"? Or maybe this has to do with traffic shaping (simple QoS rules)? I am quite at a loss to understand this behavior.

As soon as the Android device starts using a new randomized client IPv6 address, traffic gets through again for a short while before the same happens with the new address.
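
A minimal sketch for tallying such block entries per source address, assuming the live log is copied out as text in the same column order as the entry above (interface, direction, timestamp, protocol, source, destination, action, label). The regular expression and the function name are illustrative and not part of OPNsense.

```python
import re
from collections import Counter

# One log entry per line, in the layout pasted above.
LOG_RE = re.compile(
    r"^(?P<iface>\S+) (?P<dir>In|Out) (?P<ts>\S+) (?P<proto>\S+) "
    r"\[(?P<src>[^\]]+)\]:(?P<sport>\d+) \[(?P<dst>[^\]]+)\]:(?P<dport>\d+) "
    r"(?P<action>\S+) (?P<label>.+)$"
)


def state_violation_blocks(log_text):
    """Count blocked 'state violation' entries per (interface, direction, source)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LOG_RE.match(line.strip())
        if m and m["action"] == "block" and "state violation" in m["label"]:
            counts[(m["iface"], m["dir"], m["src"])] += 1
    return counts
```

Grouping by source address makes it easier to see whether the blocks start only after a given randomized address has been in use for a while.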
#11
The issue still happens, although it seems to be harder to reproduce / takes longer to trigger now. I am not sure whether having two (non-quick) rules that allow the same traffic from the respective VLAN to WAN makes it less likely that packets fail to match. It also seems like the more traffic an IPv6 address has already created, the more quickly it triggers the condition. But as before, I am completely stumped on what could cause these symptoms and am no closer to solving it (though I have reduced complexity somewhat by, e.g., shutting down the backup firewall for the time being). Any other hints on how to debug further would be greatly appreciated.
#12
The only difference is the rule allowing clients in the Guest net to WAN, all other rules have not been modified. I will keep watching if it seems to work without erratic failures with this broader rule and then try to start narrowing down again (and if that doesn't change anything, start up the backup firewall again).
#13
Thanks for the pointers towards debugging options!

I have completely shut down the backup firewall for the time being, just to be certain that CARP is not part of the problem (I didn't expect it based on previous experience, but it's good to be clear). As of a day ago, no host could have received any secondary RAs even if the backup firewall had restarted radvd without my noticing.

I have checked the Aliases definitions under Firewall -> Diagnostics, and they are all correct. Also, just to be sure, I have added another debugging "pass" rule to the Guest incoming interface from any to any (non-quick, IPv4+6, all protocols, with logging).

At the moment, after manually re-connecting my current Android test client, it seems to work and this debugging firewall rule engages. I will wait and see if it stops again at some point and debug further. So far, if it continues to work, I am still puzzled why this debugging rule might hit but the other one won't match all those packets.
#14
I don't think that the clients are dropping their default routes or losing neighbor discovery tables. As you can see in my initial post, the OPNsense host sees the packets on the incoming interface. They just never get forwarded to the WAN side, and I see firewall block log entries. My hypothesis on why it works on a reconnect of the Android device - with new source IPv6 address - is that this creates a new state in the pf firewall that allows the packets to be forwarded. Then, shortly after, _something_ happens to that state and the connections start to drop.

What I don't understand is _why_ the firewall rule described in my original post doesn't match on some of those incoming packets. Are they deemed invalid because of some packet flags? Is the pf state dropped for some reason? (I have already switched firewall state behavior to conservative, with no change on this issue.)

I didn't previously mention that during the experiments, I manually stop the radvd on the backup firewall, so only one default route is pushed to the clients.
#15
Thanks for the quick reply! The Guest interface I have generated these tcpdump and live firewall logs from doesn't actually have any Virtual IP Aliases on it at the moment (some other VLANs do, and the Android devices on those behave similarly). This Guest interface only has a static IPv4 address (which works without an issue on the Android devices) and the IPv6 address tracking the WAN assigned prefix.

These are (hopefully relevant parts of) the interface details for Guest:

Media 1000baseT <full-duplex>
Media (Raw) Ethernet autoselect (1000baseT <full-duplex>)
Status up
nd6
  flags
    performnud
    auto_linklocal
Routes 192.168.65.0/24
  2a03:fa00:650:31::/64
  fe80::%igb2_vlan65/64
Identifier opt5
Description Guest
Enabled true
Link Type static
addr4 192.168.65.254/24
addr6 2a03:fa00:650:31:20d:b9ff:fe58:5e2a/64
IPv4 Addresses
  192.168.65.254/24
  192.168.65.1/24 vhid 6
IPv6 Addresses
  2a03:fa00:650:31:20d:b9ff:fe58:5e2a/64
  fe80::20d:b9ff:fe58:5e2a/64
VLAN Tag 65
Gateways
Driver vlan1
...
Line Rate 1.00 Gbit/s
Packets Received 154404
Input Errors 0
Packets Transmitted 204150
Output Errors 0
Collisions 0

For the Guest interface, "Allow manual adjustment of DHCPv6 and Router Advertisements" is set. DHCPv4 is served by Dnsmasq, and no DHCPv6 at the moment (as Android devices won't use it anyways). Router advertisements (radvd, not dnsmasq so far) are set to Stateless with "Advertise Default Gateway" ticked and Source Address to "Automatic".