Hi all,
I'm just wondering if there's a mechanism in place right now that takes care of disabling particular services on the backup node of a HA pair. I have been searching the HA code because I wanted to add e.g. WireGuard VPN to those services but was not able to find any mechanism in place already.
If you run a HA cluster of two systems, you typically have one IP address for each of the systems on each physical interface, plus a number of CARP addresses that fail over should the master node go down.
For some services it is simple to enable them on both nodes and sync the config - postfix, bind, ...
Just have them listen on 127.0.0.1 and for every interface where the service should be available add a port forward NAT rule like:
CARP address:53 --> 127.0.0.1:53
So in the above example the service is running on both nodes but only visible on the current CARP master. Great.
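The port forward trick can be illustrated with a small toy model. This is only a sketch and not how pf works internally: the addresses, the `Node` class and the `deliver` function are made up for illustration. It shows why a service bound to 127.0.0.1 on both nodes is reachable only through the node that currently holds the CARP address.

```python
from dataclasses import dataclass, field

# Hypothetical addresses, chosen only for this illustration.
CARP_ADDR = "192.0.2.1"
LOOPBACK = "127.0.0.1"


@dataclass
class Node:
    name: str
    own_addr: str
    is_master: bool = False
    # Port forward rules: (destination address, port) -> (target address, port)
    rdr: dict = field(default_factory=lambda: {(CARP_ADDR, 53): (LOOPBACK, 53)})

    def addresses(self):
        """Addresses this node answers for; the CARP address only while master."""
        addrs = {self.own_addr}
        if self.is_master:
            addrs.add(CARP_ADDR)
        return addrs


def deliver(nodes, dst_addr, dst_port):
    """Return (node name, local target) for a packet, or None if nobody answers."""
    for node in nodes:
        if dst_addr in node.addresses() and (dst_addr, dst_port) in node.rdr:
            return node.name, node.rdr[(dst_addr, dst_port)]
    return None


fw1 = Node("fw1", "192.0.2.2", is_master=True)
fw2 = Node("fw2", "192.0.2.3")

print(deliver([fw1, fw2], CARP_ADDR, 53))   # handled by fw1
fw1.is_master, fw2.is_master = False, True  # simulate a failover
print(deliver([fw1, fw2], CARP_ADDR, 53))   # now handled by fw2
```

Both nodes carry the identical rule and the identical service config; only the ownership of the CARP address decides which one answers.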
But there are some services that cannot be treated that way. DHCPd and WireGuard are the two most important I have in mind. It is essential (IMHO) that the configuration is synced from the master to the backup, but the service must be disabled on the backup node. And whenever there is a failover the service needs to be started on the new master ...
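To make the requirement concrete, here is a minimal sketch of what such a role-based mechanism could look like. It is not anything OPNsense ships: the function names are hypothetical, the policy "run only if master for every vhid" is an assumption, and so is the idea that the service can be controlled with plain `service <name> start|stop`. It relies only on the `carp: MASTER vhid ...` lines that FreeBSD's `ifconfig` prints.

```python
import re
import subprocess

# Matches lines such as "carp: MASTER vhid 1 advbase 1 advskew 0"
CARP_LINE = re.compile(r"carp:\s+(MASTER|BACKUP|INIT)\s+vhid\s+(\d+)")


def carp_states(ifconfig_output):
    """Return a list of (vhid, state) pairs found in ifconfig output."""
    return [(int(vhid), state) for state, vhid in CARP_LINE.findall(ifconfig_output)]


def desired_action(states):
    """Assumed policy: run the service only if this node is master everywhere."""
    if states and all(state == "MASTER" for _, state in states):
        return "start"
    return "stop"


def apply_role(service, dry_run=True):
    """Read the CARP state and start or stop the given service accordingly."""
    output = subprocess.run(["ifconfig"], capture_output=True, text=True).stdout
    command = ["service", service, desired_action(carp_states(output))]
    if dry_run:
        return command  # only report what would be executed
    return subprocess.run(command).returncode


# Example with canned output instead of a live system:
sample = (
    "em0: flags=8943<UP,BROADCAST,RUNNING>\n"
    "\tcarp: MASTER vhid 1 advbase 1 advskew 0\n"
    "em1: flags=8943<UP,BROADCAST,RUNNING>\n"
    "\tcarp: BACKUP vhid 2 advbase 1 advskew 100\n"
)
print(desired_action(carp_states(sample)))
```

Something like `apply_role("wireguard")` would have to be triggered on every CARP state change, and finding or adding that trigger is exactly the missing piece.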
Is there any way OPNsense implements this now? If not is this on the roadmap? Any plans?
If it is implemented for some services I would appreciate a pointer to the source so I can try to add more.
Kind regards,
Patrick
Nobody?
A solution for radvd seems to be in the making: https://github.com/opnsense/core/pull/5247
Any idea when we will see it in production?
For WireGuard I came up with this idea:
- bind WireGuard to an odd port that is not used by the peer
- for inbound packets, set up a port forward NAT from "CARP address:correct port" to "127.0.0.1:odd port"
- for outbound packets, set up a port NAT from "WAN address:odd port" to "CARP address:correct port"
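The two NAT rules can be sketched as plain address rewriting. This is only a toy model with hypothetical addresses and port numbers, not pf configuration, and it says nothing about which node the outbound rule actually takes effect on.

```python
# Hypothetical values for illustration only.
CARP_ADDR = "198.51.100.1"
WAN_ADDR = "198.51.100.2"
LOOPBACK = "127.0.0.1"
CORRECT_PORT = 51820  # port the peer is configured to talk to
ODD_PORT = 51999      # port WireGuard actually binds to locally


def inbound(dst):
    """Port forward: CARP address:correct port -> 127.0.0.1:odd port."""
    if dst == (CARP_ADDR, CORRECT_PORT):
        return (LOOPBACK, ODD_PORT)
    return dst


def outbound(src):
    """Outbound NAT: WAN address:odd port -> CARP address:correct port."""
    if src == (WAN_ADDR, ODD_PORT):
        return (CARP_ADDR, CORRECT_PORT)
    return src


print(inbound((CARP_ADDR, CORRECT_PORT)))  # what the local WireGuard sees
print(outbound((WAN_ADDR, ODD_PORT)))      # what the peer sees
```

A packet whose source does not match the outbound rule leaves unchanged, i.e. still carrying the odd source port.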
Will specifically the outbound NAT rule work only on the host that currently has the CARP address?
The downside is that the backup node will still throw WG packets with the "odd" source port at the peer ...
If only WireGuard had a "bind interface" option so you can set it to 127.0.0.1 ...
BTW: how does OPNsense manage DHCPv4 in a HA setting? I don't use it, so I can't check.
For DHCP the builtin cluster functionality is used.
Any pointer to the code? I was not able to find it. Thanks!
Patrick
Hi Patrick,
the DHCPD documentation has a section on the failover functionality:
https://kb.isc.org/docs/isc-dhcp-41-manual-pages-dhcpdconf#dhcp-failover
And if you look under Services: DHCPv4: [<interface>] you can set the "Failover peer IP" and "Failover split".
KH
Ah ... so it's DHCPd's builtin cluster functionality. I was looking for a mechanism in OPNsense's HA code that would start and stop services according to the node's role (active/backup) in a cluster like the other products I am familiar with do.
Probably that's why I did not find anything. So possibly running WireGuard on both nodes and getting creative with the NAT is the only solution for WireGuard.
Why must wireguard be disabled on the backup node?
Because it will throw packets at the peer otherwise. And the peer will be mighty confused if two systems try to initiate a tunnel using the same key pair.
Yes, you could set up two tunnels and run OSPF or similar. But isn't the point of a CARP/VRRP based HA solution to look like a single system from outside?
The Sidewinder firewall we retire did all that. Enable/disable/reconfigure services according to the role of the node in the cluster. I seriously expected something similar.
As I always tell my colleagues: CARP is not HA. It's the most trivial part of HA. The complicated part is service management and synchronization.
I see your point, it's about site2site tunnels, I was only thinking about passive endpoints.
Quote from: bimbar on November 04, 2021, 10:14:27 AM
I see your point, it's about site2site tunnels, I was only thinking about passive endpoints.
That gave me an idea. I have only one HA pair at the central office. So if I NAT outbound WG packets there and remove the fixed addresses for all peers, they all need to "dial in". That's probably going to work ...
Thanks ;)
That leaves radvd ... hope that patch is coming soon.
I patch that in manually on mine after every update. It's really critical to anything IPv6 working.