OPNsense Forum

Archive => 21.7 Legacy Series => Topic started by: Patrick M. Hausen on October 14, 2021, 08:24:57 pm

Title: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: Patrick M. Hausen on October 14, 2021, 08:24:57 pm
Hi all,

I have a pair of OPNsense firewalls and we are dual-stack throughout the entire data center. For IPv6 everything is routed, no NAT taking place. The DMZ depicted in the network overview has a single "permit anything out" rule. From outside to the DMZ, certain selected services to certain hosts are permitted, but as I said, no NAT or port forwarding, just firewall rules.

Code:
                     +--------------------------------------------------------------+                       
                     |                                                              |                       
                     |                            Uplink                            |                       
                     |                                                              |                       
                     +--------^--------------------------------------------^--------+                       
                              |                                            |                               
                              |                                            |                               
                              |                                            |                               
                              |                                            |                               
                     +-----------------+                          +-----------------+                       
                     |                 |       HA-Interface       |                 |                       
                     |   OPNsense 1    |--------------------------|   OPNsense 2    |                       
                     |                 |                          |                 |                       
                     +-----------------+                          +-----------------+                       
                              |                                            |                               
                              |                    CARP                    |                               
 2a00:b580:a000:4000::252/64  +-------> 2a00:b580:a000:4000::254/64 <------+   2a00:b580:a000:4000::253/64 
 fe80::f690:eaff:fe00:6501/64 |                      |                     |   fe80::f690:eaff:fe00:6507/64
                              |                      |                     |                               
                              |                      |                     |                               
            #-----------------v----------------------v---------------------v-------------------#           
                                        DMZ 2a00:b580:a000:4000::/64                                       

We use SLAAC for host configuration in the DMZ and I configured radvd as pictured in the screenshot. What I would have expected as a result is that the CARP address is announced as the default router.

What happens instead is that the link-local address of the interface is announced. OK, this makes perfect sense in a single unit setup. But in our case both the active and the backup node announce their respective link-local addresses.

If a client with two default routes decides to switch gateways in the middle of a long-lived connection, this leads to intermittent drops of TCP connections, and possibly to other problems which we have not yet clearly identified.

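Questions:
  • Why isn't the global unicast CARP address announced instead of the link-local ones?
  • Even with link-local, shouldn't pfsync take care of keeping the state tables in sync so it should not happen that a packet hits the "default deny" rule?
  • When I manually disable radvd on the backup, things work reliably - shouldn't the HA mechanism take care of toggling the service on/off depending on the role of the node?
  • Related but different topic: what happens when I enable dhcpd in a HA setup? Shouldn't the HA mechanism disable the backup?
  • What's considered the best practice in this scenario?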

DHCPv6 isn't of any use here, because it doesn't send a default gateway to the client systems. This is only sent via RA. I could configure all hosts statically in the DMZ, but once we get to the LAN, which at the moment uses SLAAC, too - because "what else" - that is out of the question. Too many devices coming and going.

Workaround: exempt "DHCPv6" from HA sync and disable RA on the backup node. But that means in case of a failover a manual intervention is necessary to get IPv6 working again.

So ... is there a solution?

Kind regards,
Patrick
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: bimbar on October 15, 2021, 09:14:18 am
As far as I know, there is no solution. See also https://forum.opnsense.org/index.php?topic=24492.0 .
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: Patrick M. Hausen on October 15, 2021, 10:41:47 am
OK, then I'll disable synchronisation for DHCPv6/RA and set priorities accordingly so at least in case of a full crash of the primary the backup will kick in.

Thanks.
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: franco on October 15, 2021, 07:31:33 pm
There's a couple of things being discussed:

https://github.com/opnsense/core/issues/4953
https://github.com/opnsense/core/issues/4897
https://github.com/opnsense/core/pull/5185
https://github.com/opnsense/core/pull/5247

First of all:

> What happens instead is that the link-local address of the interface is announced.

That's in fact what radvd is going to do: advertise the first link-local address it finds on the configured interface. Non-link-local addresses are not supposed to work and their behaviour is undefined.
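
To make the distinction concrete, here is a minimal Python sketch that classifies the addresses from the diagram in the first post using the standard `ipaddress` module. The helper name `ra_source_ok` is made up for illustration; it only encodes the rule that a router advertisement is sourced from a link-local address (RFC 4861 requires this for the RA source address), and it says nothing about how radvd picks the address internally.

```python
import ipaddress

def ra_source_ok(addr: str) -> bool:
    """Return True if addr is an IPv6 link-local address (fe80::/10)."""
    return ipaddress.IPv6Address(addr).is_link_local

# Addresses taken from the network diagram in the first post.
candidates = {
    "node 1 link-local": "fe80::f690:eaff:fe00:6501",
    "node 2 link-local": "fe80::f690:eaff:fe00:6507",
    "CARP VIP (global)": "2a00:b580:a000:4000::254",
}

results = {label: ra_source_ok(addr) for label, addr in candidates.items()}

for label, ok in results.items():
    print(f"{label}: {'usable' if ok else 'not usable'} as RA source")
```

The two per-node link-local addresses pass the check, the global CARP address does not, which matches the behaviour described in the first post.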

For a quick fix I think you can set the router advertisement priority to a lower value on the backup.
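
As a rough illustration of why that helps, the following Python sketch models a host that follows the default router preference from RFC 4191 (high > medium > low). The function name and the assignment of "primary" and "backup" to the two link-local addresses are assumptions for the example; real client operating systems differ in how strictly they honour the preference and how they break ties.

```python
# Preference ranking per RFC 4191 (Default Router Preference): high > medium > low.
PREFERENCE_RANK = {"high": 1, "medium": 0, "low": -1}

def pick_default_router(routers):
    """Pick the advertised router with the highest preference.

    `routers` is a list of (link_local_address, preference) tuples.
    Ties keep the first entry; real hosts may break ties differently.
    """
    return max(routers, key=lambda r: PREFERENCE_RANK[r[1]])[0]

advertised = [
    ("fe80::f690:eaff:fe00:6507", "low"),     # assumed backup node
    ("fe80::f690:eaff:fe00:6501", "medium"),  # assumed primary node
]
chosen = pick_default_router(advertised)
print(chosen)
```

Note that this only steers hosts towards the primary while both nodes are advertising. After a failover, hosts still have to notice that the preferred router is gone, so it is a mitigation rather than a real fix.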

For the long run we need to implement latching on to CARP VIP alias, but I don't want to offer manual adjustment on the router advertisement side as that requires double the amount of work in code and documentation and support.

We should be able to ship a workable solution in 22.1.


Cheers,
Franco
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: bimbar on October 15, 2021, 07:34:49 pm
Also, there's the possibility that you have redundant uplinks and potentially more than one router on your network, with different public prefixes, or even one router with more than one prefix.
The problem there is that the default gateway LL addresses don't seem to be associated to the prefixes they belong to.

As to CARP, I am not convinced that using CARP is even the right strategy in IPv6; it may be more correct to just let both firewalls advertise themselves. The difficulty with that lies in how to make sure that the return packet will take the same path.

Quite a big part of the whole problem is that client devices seem to handle a multi-RA scenario more or less well, depending on which OS and so on.
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: franco on October 15, 2021, 07:43:26 pm
I think for static prefixes this is solvable, but for cross-ISP dynamic PD I don't think anyone but single-line consumers are happy (for the most part).

The problem isn't IPv6... it's the lack of NAT with the business decision of the ISPs to hand out dynamic prefixes. And that likely isn't going to change.


Cheers,
Franco
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: bimbar on October 15, 2021, 07:45:11 pm
I solved this the ugly way by using outgoing NAT on my firewalls, so that works, but it's hardly the brave new world ipv6 is supposed to be.
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: franco on October 15, 2021, 07:46:22 pm
Yeah, I agree. I'm not saying I miss NAT in IPv6, but "it is what it is". ;)


Cheers,
Franco
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: bimbar on October 15, 2021, 07:52:06 pm
I've been trying to do this right with multiple firewalls for the last 10 years now, but it never quite works the way it should. With opnsense I've come the closest so far.

I hope we're thinking about it the right way, if there even is such a thing, and don't want the wrong features we don't even know yet we don't need because there's a better way to do it, if that makes any sense.
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: Patrick M. Hausen on October 15, 2021, 08:34:58 pm
I must admit that I am only interested in the static prefix case. Sorry, dynamic prefixes i.e. consumer subscriber lines and a HA setup? Seriously?

In most scenarios it is of course perfectly ok to announce a link local address via RA. But there's nothing in the standard that explicitly prohibits using a global unicast address as the default gateway.

Fact: Sidewinder did that. You defined a cluster address that was bound to the active node and that was the default gateway. This is a closed source system but it's FreeBSD based, and it's perfectly doable to announce a "cluster" address via RA, no matter what HA protocol is used.

I retired all physical Sidewinder firewalls I can toy with, even installed OPNsense on two of the appliances now, but I have a virtualised instance in my private home lab (in ESXi) - too busy this weekend, but I could turn this into a clustered setup and try to find out from the "outside" how a Sidewinder cluster presents itself as far as IPv6 is concerned. We have had this running for more than a decade, it's rock solid.
So possibly that would give us an idea of the best approach ...

Kind regards,
Patrick

P.S. I will look up the linked issues later.
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: bimbar on October 16, 2021, 02:19:09 pm
I have another interesting link for that: https://datatracker.ietf.org/doc/html/rfc7157 "IPv6 Multihoming without Network Address Translation"

It talks about many of the same problems, namely gateway selection and source address selection. Sadly, not a lot of solutions.
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: bimbar on October 16, 2021, 02:50:39 pm
> Hi all,
>
> I have a pair of OPNsense firewalls and we are dual-stack throughout the entire data center. [...]
>
> Questions:
>   • Why isn't the global unicast CARP address announced instead of the link-local ones?
>   • Even with link-local, shouldn't pfsync take care of keeping the state tables in sync so it should not happen that a packet hits the "default deny" rule?
>   • When I manually disable radvd on the backup, things work reliably - shouldn't the HA mechanism take care of toggling the service on/off depending on the role of the node?
>   • Related but different topic: what happens when I enable dhcpd in a HA setup? Shouldn't the HA mechanism disable the backup?
>   • What's considered the best practice in this scenario?
>
> [...]
>
> So ... is there a solution?

I just found out: https://github.com/opnsense/core/pull/5185 should be exactly what you need, combined with a link-local CARP address. As far as I know, you cannot use a GUA as the next hop.
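
A rough sketch of what the resulting RA configuration could look like, written as a small Python helper that renders a radvd.conf interface block. Everything specific here is an assumption for illustration: the interface name `vtnet1`, the link-local CARP address `fe80::1`, and the helper `render_radvd_stanza` itself. The `AdvRASrcAddress` option is taken from my reading of radvd.conf(5); this is not necessarily what OPNsense generates.

```python
import ipaddress

def render_radvd_stanza(interface: str, prefix: str, carp_link_local: str) -> str:
    """Render a radvd.conf interface block that sources RAs from a CARP VIP."""
    vip = ipaddress.IPv6Address(carp_link_local)
    if not vip.is_link_local:
        # A global address is rejected here, in line with the thread's conclusion.
        raise ValueError(f"{carp_link_local} is not link-local")
    net = ipaddress.IPv6Network(prefix)
    return (
        f"interface {interface} {{\n"
        f"    AdvSendAdvert on;\n"
        f"    AdvRASrcAddress {{ {vip}; }};\n"
        f"    prefix {net} {{\n"
        f"        AdvOnLink on;\n"
        f"        AdvAutonomous on;\n"
        f"    }};\n"
        f"}};\n"
    )

# Hypothetical interface name and link-local CARP VIP; the prefix is the DMZ one.
stanza = render_radvd_stanza("vtnet1", "2a00:b580:a000:4000::/64", "fe80::1")
print(stanza)
```

With the same link-local VIP configured on both nodes, clients would only ever learn one default router address, and that address moves with the CARP master.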
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: Patrick M. Hausen on October 16, 2021, 03:48:25 pm
Awesome! Thanks!
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: tomstephens89 on October 21, 2021, 03:01:11 pm
Same issue here. Thanks to @pmhausen for confirming the problem.

https://forum.opnsense.org/index.php?topic=25243.msg121205#msg121205

Until the ability to specify the source address of radvd is implemented, the only way this works is to keep the RA daemon on the backup stopped until it needs to become master. The RA daemon may then be started on the new master, and you must ensure it is stopped on the new backup.
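
The decision logic behind that workaround is small enough to sketch in Python. The function name `radvd_action` is hypothetical, and how it gets called on a CARP state change, as well as how the RA daemon is actually started or stopped, is deliberately left out because that depends on the system.

```python
def radvd_action(carp_state: str) -> str:
    """Map a CARP state to the action to take on the RA daemon."""
    state = carp_state.strip().upper()
    if state == "MASTER":
        return "start"
    if state in ("BACKUP", "INIT"):
        return "stop"
    raise ValueError(f"unknown CARP state: {carp_state!r}")

for s in ("MASTER", "BACKUP"):
    print(s, "->", radvd_action(s))
```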
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: tomstephens89 on January 24, 2022, 09:44:29 pm
What was the outcome of this? I see there was a lot of activity on GitHub?
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: franco on January 25, 2022, 07:28:11 am
21.7.x already has better CARP source address support. 22.1 also has alias support and that's the end of it.


Cheers,
Franco
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: tomstephens89 on January 25, 2022, 09:50:01 pm
Does this mean I still need to stop the RA daemon on the backup box until it is needed?

Or does CARP with IPv6 now function via the VIP as the gateway address?
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: Patrick M. Hausen on January 25, 2022, 10:37:13 pm
It works automatically with link-local as CARP.
Title: Re: HA cluster, IPv6 CARP and router advertisements - best practice?
Post by: franco on January 26, 2022, 08:43:48 am
To be precise, it only works using link-local addressing in the first place ;)


Cheers,
Franco