[SOLVED] IPv6 PBR using a WireGuard tunnel interface breaks after connection loss

Started by Legally a Shrimp, May 13, 2025, 12:38:52 PM

Hi,

Recently I had to move in with a friend of mine. We both wanted to keep our LANs as close as possible to how they were before I moved in, so now I'm facing an admittedly overly complex and perhaps even silly network setup:

Internet-->DrayTek Vigor 167 VDSL2 Modem-->OpenWrt on NanoPi R6C-->OPNsense on random AliExpress x86 mini PC(-->my LAN)
                                                               '-->AVM Fritz!Box 7490(-->roommate's LAN)

All I had to do to make this work was to:
  • configure VLAN (VDSL connections require VLAN tag 7 here), PPPoE and DHCPv6 (client) on the WAN interface and DHCPv6 (server, with static leases for predictable PD) on the LAN interface on the OpenWrt router
  • disable NAT on the OPNsense and Fritz!Box routers
  • set up static routes for both my roommate's and my own IPv4 subnets on the OpenWrt router (to avoid double NAT)
  • do some port forwarding on the OpenWrt router

Despite the perceived weirdness of this setup, everything seems to work perfectly fine. Well, almost everything...

For the sake of troubleshooting, I made a backup of my current config, reinstalled OPNsense 25.1.6 and applied only the most essential settings. Most importantly, I've got two firewall rules on the LAN interface, one for IPv4 and one for IPv6, in which I've specified only the IP version, the source addresses and the respective WireGuard tunnel interface as the gateway. This policy-based routing works fine, too, but only until the upstream router gets disconnected from the internet and establishes a new connection. (Where I'm from, it's usual for ISPs to forcibly reconnect their customers every 24 hours.) After that, IPv4 connections still use the WireGuard tunnel, but all IPv6 connections suddenly get routed via the default IPv6 WAN gateway.

I have no idea why this would happen, even in theory, and I need advice on how to even begin troubleshooting it.

Thanks in advance!



PS: The machine gets its IPv4 address via ISC DHCPv4 and its IPv6 address via ISC DHCPv6. Static leases are set up and working: the machine gets the same IPv4 address and IPv6 suffix every time, and it only ever has exactly one global-scope IPv6 address. On the OPNsense router, NAT is set up only for the IPv4 and IPv6 WireGuard tunnel interfaces. "Allow default gateway switching" is unchecked and "Skip rules when gateway is down" is checked. All of this is complex enough as it is (at least to me), so I'd rather not use ULA. My roommate is using his Fritz!Box to establish a WireGuard tunnel, too, and it doesn't behave this way. This is why I assume it's an issue with (my configuration of) OPNsense, which is why I'm posting here and not in the OpenWrt forums. 😅
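
To illustrate what "same IPv6 suffix every time" means once the delegated prefix changes, here is a minimal Python sketch. The prefixes and the suffix are made-up documentation values (2001:db8::/32), not the real ones from this setup, and `compose` is a hypothetical helper name. It only shows that a fixed suffix yields a different global address as soon as the prefix is different, so any client still holding the old address is using a stale one.

```python
import ipaddress

def compose(prefix: str, suffix: str) -> ipaddress.IPv6Address:
    """Combine a delegated prefix with a fixed host suffix (static lease)."""
    net = ipaddress.IPv6Network(prefix)
    host = int(ipaddress.IPv6Address(suffix))
    # The suffix must only use the host bits, not the prefix bits.
    if host & int(net.netmask):
        raise ValueError("suffix overlaps the prefix bits")
    return ipaddress.IPv6Address(int(net.network_address) | host)

# Example values only: prefix before and after a forced reconnect.
before = compose("2001:db8:1::/64", "::1234")
after = compose("2001:db8:2::/64", "::1234")
print(before, after)
```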

After doing some more digging and applying all sorts of changes to the settings, my problem has shifted: I now assume I have an issue with (ISC) DHCPv6 instead. After the PPPoE connection gets dropped, LAN clients simply don't get IPv6 addresses assigned anymore. IPv6 connectivity on the OPNsense machine itself, however, is fine. So once I notice IPv6 connections starting to fail (i.e. every 24 hours), all I need to do is restart that one service and everything works as expected again (Windows clients might still need a quick ipconfig /renew6). While troubleshooting I've come across several threads in this forum and on GitHub that describe this exact issue. However, none of the workarounds I have found so far (such as checking Interfaces -> Settings -> Prevent release) seem to do anything in my case. I guess I'll try working around this issue with a script that restarts the isc-dhcpd6 service whenever the prefix changes.
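
A rough sketch of what such a workaround script could look like, in Python. Everything specific in it is an assumption and not taken from the setup above: the interface name `igc1`, the poll interval, and especially the restart command (`pluginctl -s dhcpd6 restart` is a guess that should be verified on the actual install). It polls `ifconfig` on the LAN interface, collects the non-link-local IPv6 prefixes, and restarts the service only when a previously seen prefix has been replaced by a different one.

```python
import ipaddress
import re
import subprocess
import time

IFACE = "igc1"            # hypothetical LAN interface name
POLL_SECONDS = 30         # arbitrary poll interval
# Assumed restart command; verify it on your own system before relying on it.
RESTART_CMD = ["pluginctl", "-s", "dhcpd6", "restart"]

def global_prefixes(ifconfig_output: str) -> set:
    """Return the non-link-local IPv6 networks found in ifconfig output."""
    prefixes = set()
    for m in re.finditer(r"inet6 (\S+) prefixlen (\d+)", ifconfig_output):
        addr = m.group(1).split("%")[0]  # drop a zone suffix like %igc1
        ip = ipaddress.IPv6Address(addr)
        if ip.is_link_local or ip.is_loopback:
            continue
        prefixes.add(ipaddress.ip_interface(f"{addr}/{m.group(2)}").network)
    return prefixes

def needs_restart(last, current) -> bool:
    """Restart only if a known prefix was replaced by a different one."""
    return bool(current) and last is not None and current != last

def watch() -> None:
    last = None
    while True:
        out = subprocess.run(["ifconfig", IFACE], capture_output=True,
                             text=True, check=False).stdout
        current = global_prefixes(out)
        if needs_restart(last, current):
            subprocess.run(RESTART_CMD, check=False)
        if current:  # keep the old prefix while the link is down
            last = current
        time.sleep(POLL_SECONDS)
```

Calling `watch()` starts the loop. While the PPPoE link is down and no global prefix is present, the last known prefix is kept, so the restart is triggered only once the new prefix has actually arrived.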

Edit: The issue is solved.
So, I've switched from ISC to Kea and then to dnsmasq to assign IPv6 addresses via DHCPv6. For whatever weird reason, I don't have the described problem(s) with dnsmasq. As is probably obvious, I lack the necessary understanding of any of this to make sense of it (especially since, in theory, the configuration was the same each time: DHCPv6 + managed RA + static leases), but all that matters to me is that it finally works as intended now.