Multi-WAN - IPv6 - IPv6 LoadBalanced Gateway Groups

Started by Wirehead, March 09, 2024, 08:20:27 PM

March 09, 2024, 08:20:27 PM Last Edit: March 24, 2024, 10:17:56 AM by Wirehead
Hi OPNsense team.

I'm running into strange behaviour with IPv6 gateway groups in firewall rules.
I have a multi-WAN setup with IPv6.

Both WANs work great for IPv6 individually: I set a specific IPv6 gateway in a rule that negates our own prefix, so anything not destined for our own IPv6 networks is routed out of that specific IPv6 WAN interface.

This works amazingly well, provided I don't use a gateway group.

The moment I start using a gateway group (and monitoring for both links is working fine!), OPNsense starts replying "Destination Unreachable" for any IPv6 traffic at random (see opnsense00.png / opnsense01.png).



If I then replace the GWgroupIPv6 with either of the two gateways directly, apply the rule, and clear the firewall states, things work again immediately (see opnsense02.png).

So something very strange seems to be going on with the way this IPv6 gateway group is handled. I realize this might not even be OPNsense but BSD itself, but maybe someone has run into the same issue?

An IPv6 gateway group only works if both WANs are from the same ISP and they allow using the same prefix on both. Typically, each WAN gets assigned its own prefix which can't be routed via the other WAN.

Otherwise, you'll have to use NPT for one of the WANs.
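
To make the idea of prefix translation concrete, here is a minimal Python sketch that swaps the leading prefix bits of an address and keeps the rest. It is only an illustration: it is not the checksum-neutral algorithm from RFC 6296 and not what OPNsense runs internally. The function name translate_prefix and the 2001:db8 documentation prefixes are assumptions made up for the example.

```python
import ipaddress


def translate_prefix(addr: str, internal: str, external: str) -> str:
    """Replace the internal prefix of addr with the external one, keeping the remaining bits."""
    ip = ipaddress.IPv6Address(addr)
    src = ipaddress.IPv6Network(internal)
    dst = ipaddress.IPv6Network(external)
    if src.prefixlen != dst.prefixlen:
        raise ValueError("prefixes must have the same length")
    if ip not in src:
        raise ValueError(f"{ip} is not inside {src}")
    # Keep everything to the right of the prefix (subnet and interface bits).
    host_bits = int(ip) & int(src.hostmask)
    return str(ipaddress.IPv6Address(int(dst.network_address) | host_bits))


# Hypothetical prefixes: WAN01 delegates aaaa, WAN02 delegates bbbb.
print(translate_prefix("2001:db8:aaaa:1::25", "2001:db8:aaaa::/48", "2001:db8:bbbb::/48"))
# prints 2001:db8:bbbb:1::25
```

The point is that a LAN host numbered out of the first WAN's prefix needs its source rewritten like this before its traffic can leave via the second WAN.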

Cheers
Maurice

Hi Maurice.

Thanks for your reply. However, it doesn't explain the problem: I have no issues when I select either gateway directly (for the second WAN I am using address translation, and that works fine).

As stated, the problem only arises when specifically using the gateway group.

For information, this is what the relevant rules look like in pfctl:

lagg0 = LAN
vlan0.40 = WAN01
vlan0.20 = WAN02


WAN01 only (no problem)
pass in quick on lagg0 route-to (vlan0.40 fe80::3a43:7dff:fe98:960f) inet6 from any to ! <WAN01_IPv6_Prefix> flags S/SA keep state label "a3772f9a4cec84b22c2501a1e43c9e1e"

WAN02 only (no problem) + its specific address translation rule

pass in quick on lagg0 route-to (vlan0.20 fe80::22e0:9cff:fe39:8c01) inet6 from any to ! <WAN01_IPv6_Prefix> flags S/SA keep state label "a3772f9a4cec84b22c2501a1e43c9e1e"
nat on vlan0.20 inet6 all -> (vlan0.20:0) port 1024:65535


WAN01 and WAN02 in round-robin (problematic: sometimes OPNsense replies with "destination unreachable" and then keeps the faulty state)

pass in quick on lagg0 route-to { (vlan0.40 fe80::3a43:7dff:fe98:960f), (vlan0.20 fe80::22e0:9cff:fe39:8c01) } round-robin sticky-address inet6 from any to ! <WAN01_IPv6_Prefix> flags S/SA keep state label "a3772f9a4cec84b22c2501a1e43c9e1e"
nat on vlan0.20 inet6 all -> (vlan0.20:0) port 1024:65535
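
For anyone trying to picture what the round-robin rule asks pf to do, here is a toy Python model of a round-robin pool with an optional sticky mapping per source address. It is only a mental model under my own assumptions, not pf's actual code. The class name RoundRobinPool and the client addresses are made up; the interface names and link-local gateways are the ones from the rules above.

```python
from itertools import cycle


class RoundRobinPool:
    """Toy model: hand out gateways in turn, optionally pinning each source to one gateway."""

    def __init__(self, gateways, sticky=True):
        self._next = cycle(gateways)
        self._sticky = sticky
        self._by_source = {}

    def pick(self, src):
        # With sticky enabled, a known source keeps the gateway it got first.
        if self._sticky and src in self._by_source:
            return self._by_source[src]
        gw = next(self._next)
        if self._sticky:
            self._by_source[src] = gw
        return gw


pool = RoundRobinPool([
    ("vlan0.40", "fe80::3a43:7dff:fe98:960f"),  # WAN01
    ("vlan0.20", "fe80::22e0:9cff:fe39:8c01"),  # WAN02
])
for src in ["2001:db8::10", "2001:db8::11", "2001:db8::10"]:
    print(src, "->", pool.pick(src)[0])
# prints:
# 2001:db8::10 -> vlan0.40
# 2001:db8::11 -> vlan0.20
# 2001:db8::10 -> vlan0.40
```

In this model, anything that lands on vlan0.20 also depends on the nat rule matching, while traffic on vlan0.40 goes out untranslated.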


Have you tried NATing only a certain prefix (the WAN01 net?) on vlan0.20 instead of all?

Quote from: zan on March 10, 2024, 12:57:50 PM
Have you tried NATing only a certain prefix (the WAN01 net?) on vlan0.20 instead of all?
Hi, there's nothing wrong with the NAT. Translation works fine when the gateway is applied directly rather than through a load-balanced group.

Small update: I've disabled the shaper on both WAN01 and WAN02, and for now load balancing seems stable.

I'll keep an eye on it in case it breaks again. So it could be the combination of CoDel shaping and pf load balancing.

Nope, it broke again. The shaper is not the culprit. It really seems to be something internal to pf.

So, as a failover group (tier1/tier2): no issues for multiple days.
But as a load-balancing group (tier1/tier1): almost immediate losses, where OPNsense replies "destination unreachable".
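
To spell out the difference between the two group types, here is a small Python sketch of how I understand tier selection: the lowest tier that still has a gateway up wins, and every up gateway in that tier shares the traffic. This is an assumption about the intended semantics, not OPNsense's code. The function active_gateways and the gateway names WAN01_GW6 / WAN02_GW6 are hypothetical.

```python
def active_gateways(group, status):
    """Return the up gateways in the lowest (highest-priority) tier that has any up.

    group:  {gateway_name: tier}
    status: {gateway_name: True if the monitor reports it up}
    """
    up = {name: tier for name, tier in group.items() if status.get(name, False)}
    if not up:
        return []
    best = min(up.values())
    return sorted(name for name, tier in up.items() if tier == best)


failover = {"WAN01_GW6": 1, "WAN02_GW6": 2}   # tier1/tier2
balanced = {"WAN01_GW6": 1, "WAN02_GW6": 1}   # tier1/tier1
all_up = {"WAN01_GW6": True, "WAN02_GW6": True}

print(active_gateways(failover, all_up))   # ['WAN01_GW6']
print(active_gateways(balanced, all_up))   # ['WAN01_GW6', 'WAN02_GW6']
print(active_gateways(failover, {"WAN01_GW6": False, "WAN02_GW6": True}))  # ['WAN02_GW6']
```

So with tier1/tier2 only one gateway is in use at any time, which matches the single-gateway rules that never showed the problem; only tier1/tier1 produces a rule with more than one route-to target.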

March 24, 2024, 10:01:17 AM #9 Last Edit: March 24, 2024, 02:40:23 PM by Wirehead
Hi,

I seem to have found the culprit.

The documentation says to specifically disable shared forwarding when using multiple gateways with the same tier.

However, doing so causes exactly this behaviour: at some point, the OPNsense gateway starts replying "destination unreachable" to the client for IPv6.

Multiple issues with this shared/non-shared behaviour have been reported earlier:

https://github.com/opnsense/core/issues/5089
https://github.com/opnsense/core/issues/5094
https://github.com/opnsense/core/issues/5869

In https://github.com/opnsense/core/issues/5869#issuecomment-1611162919 a PR was made to put this note in the documentation, but it seems that at some point the advice has become "inverted": load balancing is working now, provided that I do enable shared forwarding.

For information, I have also disabled sticky connections, so as to get a maximum spread of sessions across the WAN links. For specific tools that require a fixed source IP, you can make a tiered gateway group.

With shared forwarding disabled as recommended, the erratic behaviour persisted whether sticky connections were enabled or disabled.