Archive > 21.7 Legacy Series

HA cluster, IPv6 CARP and router advertisements - best practice?

Patrick M. Hausen:
Hi all,

I have a pair of OPNsense firewalls and we are dual-stack throughout the entire data center. For IPv6 everything is routed, no NAT taking place. The DMZ depicted in the network overview has a single "permit anything out" rule. From outside to the DMZ, certain selected services to certain hosts are permitted, but as I said there is no NAT or port forwarding, just firewall rules.


--- Code: ---
                     +--------------------------------------------------------------+
                     |                                                              |                       
                     |                            Uplink                            |                       
                     |                                                              |                       
                     +--------^--------------------------------------------^--------+                       
                              |                                            |                               
                              |                                            |                               
                              |                                            |                               
                              |                                            |                               
                     +-----------------+                          +-----------------+                       
                     |                 |       HA-Interface       |                 |                       
                     |   OPNsense 1    |--------------------------|   OPNsense 2    |                       
                     |                 |                          |                 |                       
                     +-----------------+                          +-----------------+                       
                              |                                            |                               
                              |                    CARP                    |                               
 2a00:b580:a000:4000::252/64  +-------> 2a00:b580:a000:4000::254/64 <------+   2a00:b580:a000:4000::253/64 
 fe80::f690:eaff:fe00:6501/64 |                      |                     |   fe80::f690:eaff:fe00:6507/64
                              |                      |                     |                               
                              |                      |                     |                               
            #-----------------v----------------------v---------------------v-------------------#           
                                        DMZ 2a00:b580:a000:4000::/64                                       
--- End code ---

We use SLAAC for host configuration in the DMZ and I configured radvd as pictured in the screenshot. What I would have expected as a result is that the CARP address is announced as the default router.

What happens instead is that the link-local address of the interface is announced. OK, this makes perfect sense in a single unit setup. But in our case both the active and the backup node announce their respective link-local addresses.

Clients therefore end up with two default routes. If a client decides to switch gateways in the middle of a long-lived connection, this leads to intermittent drops of TCP connections and possibly other problems which we have not yet clearly identified.

Questions:

* Why isn't the global unicast CARP address announced instead of the link-local ones?
* Even with link-local, shouldn't pfSync take care of keeping the state tables in sync so it should not happen that a packet hits the "default deny" rule?
* When I manually disable radvd on the backup, things work reliably - shouldn't the HA mechanism take care of toggling the service on/off depending on the role of the node?
* Related but different topic: what happens when I enable dhcpd in a HA setup? Shouldn't the HA mechanism disable the backup?
* What's considered the best practice in this scenario?
DHCPv6 isn't of any use here, because it doesn't send a default gateway to the client systems. This is only sent via RA. I could configure all hosts statically in the DMZ, but once we get to the LAN, which at the moment uses SLAAC, too - because "what else" - that is out of the question. Too many devices coming and going.

Workaround: exempt "DHCPv6" from HA sync and disable RA on the backup node. But that means in case of a failover a manual intervention is necessary to get IPv6 working again.

So ... is there a solution?

Kind regards,
Patrick

bimbar:
As far as I know, there is no solution. See also https://forum.opnsense.org/index.php?topic=24492.0 .

Patrick M. Hausen:
OK, then I'll disable synchronisation for DHCPv6/RA and set priorities accordingly so at least in case of a full crash of the primary the backup will kick in.

Thanks.

franco:
There's a couple of things being discussed:

https://github.com/opnsense/core/issues/4953
https://github.com/opnsense/core/issues/4897
https://github.com/opnsense/core/pull/5185
https://github.com/opnsense/core/pull/5247

First of all:

> What happens instead is that the link-local address of the interface is announced.

That's in fact what radvd is going to do: advertise the first link-local address it finds on the configured interface. Non-link-local addresses are not supposed to work and their behaviour is undefined.
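
To make the behaviour described above concrete, here is a minimal Python sketch, not radvd's actual code. It only mimics the "first link-local address on the interface" rule using the addresses from the diagram in the first post. The function name `first_link_local` and the ordering of the address lists are assumptions for illustration.

```python
import ipaddress


def first_link_local(addresses):
    """Return the first IPv6 link-local address in `addresses`, or None."""
    for text in addresses:
        # Drop an optional prefix length such as "/64" before parsing.
        addr = ipaddress.IPv6Address(text.split("/")[0])
        if addr.is_link_local:
            return addr
    return None


# Addresses taken from the diagram; the list order is an assumption.
opnsense1 = [
    "2a00:b580:a000:4000::252/64",
    "2a00:b580:a000:4000::254/64",   # CARP VIP (global unicast)
    "fe80::f690:eaff:fe00:6501/64",
]
opnsense2 = [
    "2a00:b580:a000:4000::253/64",
    "2a00:b580:a000:4000::254/64",   # CARP VIP (global unicast)
    "fe80::f690:eaff:fe00:6507/64",
]

print(first_link_local(opnsense1))
print(first_link_local(opnsense2))
```

Under this rule each node ends up with a different router address, and the global unicast CARP address is never a candidate, which matches what Patrick observed.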

For a quick fix I think you can set the router advertisement priority to a lower value on the backup.
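
As a rough illustration of why a lower priority on the backup can help, here is a small Python sketch of how a host might pick its default router when it sees two RAs with different preferences (the low/medium/high router preference from RFC 4191). The class `RouterAdvertisement` and the function `pick_default_router` are made up for this example. Real hosts also take reachability into account, and their behaviour differs between operating systems, so treat this as a simplified model only.

```python
from dataclasses import dataclass

# Higher rank wins; names follow the low/medium/high router preference values.
PREFERENCE_RANK = {"low": -1, "medium": 0, "high": 1}


@dataclass
class RouterAdvertisement:
    router: str      # link-local source address of the RA
    preference: str  # "low", "medium" or "high"
    lifetime: int    # router lifetime in seconds; 0 means "not a default router"


def pick_default_router(ras):
    """Return the address of the most preferred router, or None."""
    candidates = [ra for ra in ras if ra.lifetime > 0]
    if not candidates:
        return None
    best = max(candidates, key=lambda ra: PREFERENCE_RANK[ra.preference])
    return best.router


# Hypothetical RAs: primary keeps the default preference, backup is lowered.
primary = RouterAdvertisement("fe80::f690:eaff:fe00:6501", "medium", 1800)
backup = RouterAdvertisement("fe80::f690:eaff:fe00:6507", "low", 1800)

print(pick_default_router([backup, primary]))
```

In this model the client sticks to the primary as long as its RA is valid, and only falls back to the backup once the primary's entry is gone.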

In the long run we need to implement latching on to the CARP VIP alias, but I don't want to offer manual adjustment on the router advertisement side, as that requires double the amount of work in code, documentation and support.

We should be able to ship a workable solution in 22.1.


Cheers,
Franco

bimbar:
Also, there's the possibility that you have redundant uplinks and potentially more than one router on your network, with different public prefixes, or even one router with more than one prefix.
The problem there is that the default gateway link-local addresses don't seem to be associated with the prefixes they belong to.

As to CARP, I am not convinced that using CARP at all is the right strategy in IPv6; it may be more correct to just let both firewalls advertise themselves. The difficulty with that lies in making sure that the return packet takes the same path.

Quite a big part of the whole problem is that client devices handle a multi-RA scenario more or less well, depending on the OS among other things.
