CARP awareness for Wireguard 2.0 & follow a single interface - 23.7.3

Started by nzkiwi68, September 05, 2023, 10:43:17 PM

With Wireguard now baked into the core with 23.7.3, my Wireguard custom CARP script broke. I enlisted the help of a friend and together we built a new Wireguard CARP fail-over script.

It works absolutely brilliantly and it's ready for production.
I note that the CARP script still gets called multiple times as multiple VHIDs transition, but this no longer forces WireGuard to start and stop repeatedly (and often break), as was the case before. Although the new script is called multiple times during a CARP transition from backup to master, it starts WireGuard once and remains stable. Each subsequent call doesn't restart WireGuard, which is very good.

Add Wireguard CARP awareness to the GUI and follow a single interface
It would be very nice, though, if the GUI had an "Enable CARP Fail-over" option (like FRR)
but!
It should also have a drop-down where you select a single interface to follow

Why?

  • Because, despite it being "impossible", over many years of pfSense and now OPNsense experience I have seen many instances where CARP is misaligned between the backup and the primary firewall.
  • Also, I would almost always follow the LAN interface, which is where the WireGuard VPN tunnels exit to and begin from. That's really the only CARP interface we would want WireGuard to follow when starting or stopping.
  • The new single-interface CARP script exits quickly if the CARP event is not for the required interface, so there is less work for the firewall to process.

I know I have seen many posts here and on GitHub requesting CARP for Wireguard (including me) and questions raised as to why. I thought I would summarise why Wireguard needs to follow CARP.

Why Wireguard with HA needs to stop and start and follow CARP

  • WireGuard doesn't bind to VHIDs; it binds to all interfaces (like the WAN firewall interface). Therefore, on a CARP fail-over, WireGuard on the backup firewall keeps running and interferes with the primary firewall.
  • The above statement is especially true with WireGuard keepalives.
  • Because of the above two points, you otherwise need to leave WireGuard "unselected" under HA sync to keep WireGuard "off" on the backup firewall, but then changes to the WireGuard config are no longer synced.
  • Having WireGuard follow CARP enables FRR to be left on, even on the backup firewall. There is no need for dynamic routing to be off on the backup firewall (because with WireGuard off, no traffic can pass), and thus on fail-over everything comes up super fast. It's very good.

I think Wireguard is now "prime time ready".


Here is my script.
I fully acknowledge that this is not my exclusive work, but follows the built-in CARP scripts.

Place this script here:
/usr/local/etc/rc.syshook.d/carp
Ensure the script has execute permissions
Call the script "10-wireguard"

----- Script Start -----
#!/usr/local/bin/php
<?php

require_once("config.inc");
require_once("util.inc");

use OPNsense\Core\Backend;

// Arguments passed to the CARP hook: "<vhid>@<device>" and the new state.
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

// Only MASTER and BACKUP transitions are handled.
if (!in_array($type, ['MASTER', 'BACKUP'])) {
    log_msg("Carp '$type' event unknown from source '{$subsystem}'");
    exit;
}

// The source must look like "<vhid>@<device>".
if (!strstr($subsystem, '@')) {
    log_msg("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit;
}

// Enable WireGuard when becoming MASTER, disable it when becoming BACKUP.
switch ($type) {
    case 'MASTER':
        $config['OPNsense']['wireguard']['general']['enabled'] = '1';
        write_config("Enable WireGuard on this peer due to CARP event", false);
        log_msg("Starting WireGuard due to CARP event '$type'");
        break;
    case 'BACKUP':
        $config['OPNsense']['wireguard']['general']['enabled'] = '0';
        write_config("Disable WireGuard on this peer due to CARP event", false);
        log_msg("Stopping WireGuard due to CARP event '$type'");
        break;
}

// Regenerate the WireGuard templates and apply the new state.
$backend = new Backend();
$backend->configdRun('template reload OPNsense/Wireguard');
$backend->configdpRun('wireguard configure');
----- Script Stop -----

(I couldn't get the script to look nice with "code" selected; it kept adding lots of junk to the script...)

Please note, my script above doesn't follow a single interface like LAN, but, it does work!

I can create this on GitHub if appropriate. Comments please.

OpenVPN client instance will receive a VHID tracking feature in 23.7.4. We want to add the same to WireGuard on our way to 24.1.


Cheers,
Franco



Here's my updated script following a single interface. You normally need to follow LAN since that's where your Wireguard VPN tunnels tunnel to and from...

This works better, especially if you have many CARP interfaces. Some of my customers have 7 CARP interfaces:
LAN, WAN, WAN2, DMZ1, DMZ2, VoIP, UNTRUSTED

The issue is that each CARP transition fires the script, so in my example above the script gets called 7 times!
When locked to a single interface (LAN), the script exits quickly for the other six events.


----- Script Start -----
#!/usr/local/bin/php
<?php

require_once("config.inc");
require_once("util.inc");
require_once("interfaces.inc");

use OPNsense\Core\Backend;

// Arguments passed to the CARP hook: "<vhid>@<device>" and the new state.
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

// Follow a single CARP interface only; exit quickly for all others.
if ($subsystem != "1@igb0") exit;

// Only MASTER and BACKUP transitions are handled.
if (!in_array($type, ['MASTER', 'BACKUP'])) exit;

// Enable WireGuard when becoming MASTER, disable it when becoming BACKUP.
switch ($type) {
    case 'MASTER':
        $config['OPNsense']['wireguard']['general']['enabled'] = '1';
        write_config("Enable WireGuard due to CARP event on '{$subsystem}'", false);
        log_msg("Starting WireGuard due to CARP event '$type' on '{$subsystem}'");
        break;
    case 'BACKUP':
        $config['OPNsense']['wireguard']['general']['enabled'] = '0';
        write_config("Disable WireGuard due to CARP event on '{$subsystem}'", false);
        log_msg("Stopping WireGuard due to CARP event '$type' on '{$subsystem}'");
        break;
}

// Regenerate the WireGuard templates and apply the new state.
$backend = new Backend();
$backend->configdRun('template reload OPNsense/Wireguard');
$backend->configdpRun('wireguard configure');
----- Script Stop -----

You MUST change the line "if ($subsystem != "1@igb0") exit;"

The "1" must be the VHID number and the "igb0" must equal the interface "Device" name.
If your interface is using a vlan, then it could look like this:

if ($subsystem != "2@vlan01") exit;
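
To make the format of that string concrete, here is a small shell sketch (separate from the PHP script above) that splits a CARP subsystem value into its VHID and device parts and applies the same exact-match test. The function name `follows` and the sample event values are illustrative assumptions only, not anything shipped with OPNsense.

```shell
#!/bin/sh
# The first argument passed to the CARP hook has the form "<vhid>@<device>".
subsystem="2@vlan01"

vhid="${subsystem%%@*}"    # text before the "@" (the VHID number)
device="${subsystem#*@}"   # text after the "@" (the interface device name)
echo "VHID=$vhid device=$device"

# Hypothetical helper mirroring the PHP guard: succeed only on an exact match.
follows() {
    [ "$1" = "$2" ]
}

# Sample events (made up for illustration); only the followed one is handled.
for event in "1@igb0" "2@vlan01" "3@igb1"; do
    if follows "$event" "2@vlan01"; then
        echo "$event: handle"
    else
        echo "$event: ignore"
    fi
done
```

Because the comparison is an exact string match, both the VHID and the device name have to be right, otherwise every event is ignored.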


My updated script 10 Sep 2023 - now working excellently to enable super fast fail-over for WireGuard for clustered firewalls.

----- Script Start -----
#!/usr/local/bin/php
<?php

require_once("config.inc");
require_once("util.inc");
require_once("interfaces.inc");

// Arguments passed to the CARP hook: "<vhid>@<device>" and the new state.
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

// Follow a single CARP interface only; exit quickly for all others.
if ($subsystem != "1@igb0") exit;

// Only MASTER and BACKUP transitions are handled.
if (!in_array($type, ['MASTER', 'BACKUP'])) exit;

// Start WireGuard when becoming MASTER, stop it when becoming BACKUP.
switch ($type) {
    case 'MASTER':
        shell_exec("/usr/local/sbin/pluginctl -s wireguard start");
        log_msg("Starting WireGuard due to CARP event '$type' on '{$subsystem}'");
        break;
    case 'BACKUP':
        shell_exec("/usr/local/sbin/pluginctl -s wireguard stop");
        log_msg("Stopping WireGuard due to CARP event '$type' on '{$subsystem}'");
        break;
}
----- Script Stop -----


Important notes

**** ONE ****
You MUST change the line "if ($subsystem != "1@igb0") exit;" to follow an interface.
This is normally your LAN, since for most people that's where the WireGuard tunnels carry VPN traffic to and from.

The "1" must be the VHID number and the "igb0" must equal the interface "Device" name.
If your interface is using a vlan, then it could look like this:

if ($subsystem != "2@vlan01") exit;

**** TWO ****

  • Place the script in this location "/usr/local/etc/rc.syshook.d/carp/"
  • Call the script "10-wireguard" - no extension
  • Make sure the script has execute permissions
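
The three steps above could be scripted roughly like this. It is only a sketch: the `install_carp_hook` function name is made up for illustration, and it assumes you saved the script as `./10-wireguard` in the current directory and run it as root on each firewall.

```shell
#!/bin/sh
# Hypothetical helper: copy a CARP hook script into place with execute permissions.
#   $1 = path to the script you saved
#   $2 = hook directory (defaults to the location given above)
install_carp_hook() {
    src="$1"
    dest_dir="${2:-/usr/local/etc/rc.syshook.d/carp}"
    # The installed file must be named "10-wireguard", with no extension.
    install -m 0755 "$src" "$dest_dir/10-wireguard"
}

# Usage (assumes the script was saved as ./10-wireguard):
#   install_carp_hook ./10-wireguard
```

Using `install -m 0755` copies the file and sets the execute bit in one step, so the second and third bullets cannot be forgotten separately.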

**** THREE ****
Make sure the WireGuard package is set to replicate to the backup firewall.

System > High Availability > Settings > WireGuard "selected"

**** FOUR ****
If you are using a dynamic routing protocol, I recommend the FRR package and setting up BGP.
FRR then does NOT need to follow CARP; it can remain on and alive on both the MASTER and the BACKUP firewall at the same time.

Hi, does your script still work without problems, or has something changed? I'm on 24.1.10. Your script reacts to CARP (I see messages like "due to master/backup wireguard turn on/off"), but WireGuard is still running on both nodes and their handshakes break connections :P Manually executing /usr/local/sbin/pluginctl -s wireguard stop succeeds, but immediately afterwards /usr/local/sbin/pluginctl -s wireguard status shows that it is still running.