CARP with DHCP on WAN

Started by bubbagump, January 18, 2021, 11:13:00 PM

I'm 99% of the way there. I can get the backup WAN interface to come up (still trying to figure out how to get both WAN interfaces up), but they aren't passing traffic. I'm trying to figure out how to modify the script so it brings the WAN interface down and back up properly on failover. As it stands, I have 2 scripts set to execute, but only the first one is working. A couple of questions:

The ifkey: I assume that "wan" is just a placeholder. I've changed that variable to the interface (igb4) and it appears to be working. The second script has igb5 and is not working. Any ideas?

Thanks again. you guys solved a problem that has been vexing me for a while.. now if I could just get both WAN interfaces working.

Not sure why igb4 is working at all.
The ifkey is the interface key in config.xml, i.e. the lowercase internal interface name (lan, wan, opt1, opt2, etc.), not the device name.
Don't confuse it with the name you gave the interface either. You can see the keys under "Interfaces" -> "Overview", in the brackets after the interface name (the first entry, before the comma).
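
To illustrate the difference between an interface key and a device name, here is a minimal sketch. The $interfaces array is a made-up stand-in for what the script sees in $config['interfaces'], and ifkey_for_device() is a hypothetical helper, not an OPNsense function. It only shows that igb4 and igb5 are values stored under the keys, not the keys themselves.

```php
<?php
// Hypothetical, trimmed-down stand-in for $config['interfaces'].
// The array keys (wan, opt1, lan) are the interface keys; 'if' holds the
// device name and 'descr' the friendly name. All values are made up.
$interfaces = [
    'wan'  => ['if' => 'igb4', 'descr' => 'Verizon'],
    'opt1' => ['if' => 'igb5', 'descr' => 'Spectrum'],
    'lan'  => ['if' => 'igb0', 'descr' => 'LAN'],
];

/**
 * Return the interface key whose device name matches $device,
 * or null when no interface uses that device.
 */
function ifkey_for_device(array $interfaces, string $device): ?string
{
    foreach ($interfaces as $ifkey => $interface) {
        if (($interface['if'] ?? '') === $device) {
            return $ifkey;
        }
    }
    return null;
}

// Print the key to put in the script for each device.
echo ifkey_for_device($interfaces, 'igb4'), "\n";
echo ifkey_for_device($interfaces, 'igb5'), "\n";
```

With this example data, the script for the first WAN would use 'wan' and the one for the second WAN would use 'opt1'. Your own keys may differ, so check them in the GUI or in config.xml.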

Awesome.. I couldn't figure out how to get that to work.

Another question.. your git has "install on backup router"..  I would assume that I have to down the WAN on the primary router as well, no?

Thank you again!

Not sure I understood your question, but you need the script on both routers.
During setup I recommend manually disabling the WAN(s) on the BACKUP router so that both are not enabled at the same time. On the MASTER you can leave the interface enabled.

Spali, thanks for the assistance... I have the failover workingish.. when my backup comes up, the interfaces come up and the system runs the newwanip script for both, but I don't get an IP address or an active gateway.

I am using a managed switch and have dhcp snooping off, the ISP modem and both interfaces in the same VLAN (L2 only, unrouted), and I'm spoofing the MAC from my primary to my backup.  Still going through some ideas on what is happening.. Wondering if I should be spoofing the MAC address on my backup somewhere other than in the GUI.  I'm currently just spoofing the primary router's WAN MACs on the secondary router.

Regarding the MAC, maybe you need to sniff the DHCP traffic to find out what's wrong (probably MAC spoofing not working properly?). In my case I have two virtual machines, so I spoof the MAC on the virtual network card. I have it entered in the GUI too, but maybe that doesn't really work? If your routers are virtual, don't forget to enable promiscuous mode.
Do you use my version of the script with write_config and interface_configure, or a custom one?
I'm asking because I had a similar problem when I started out with a script that just starts and stops the interface. The version that uses the OPNsense configuration interface kicks off a lot of reconfiguration tasks that may help.

I am definitely game to try something new.  I'm using your script from the git linked in this thread.   

I'm playing with some settings on my managed switch to see if that's the culprit.  I'm going to stick a cheap netgear switch on the primary Lan to rule out anything blocking traffic (STP is disabled,  DHCP snooping off, and I've disabled the mac-move limitation on this switch). 

I actually see my interfaces on the backup come up just fine.. they appear to get the same IP address that the WAN Primary had, though I'm not sure if it's because the dhcp lease was sync'd from the primary or if it's requesting a refresh.  Either way, my gateways do not come up (I've tried with monitoring on and disabled.. same end result).   

I am onto something here.  I noticed that my gateways weren't working properly (I have 2 gateways configured and 2 gateway groups).  To rule out gateway configuration, I deleted all of them and tried again (with a single wan for now) and boom.. missed one ping and back up.

Also finding that the gateway configuration is quite sticky.. not sure where it's hooked but I can't get rid of it.  I had a gateway called "Verizon_WAN_DHCP" and noticed that the Verizon interface was coming up with a new GW "Verizon_WAN_GW".. that second GW isn't configured in the GUI.  I checked the config file and all references to it are gone, but when I fail over, it pops back up.  Very odd I think.

I also noticed that my switch (Juniper EX2200) was learning the MAC on the primary port, but when I switched over, I see that the MAC is still tied to the primary port.  I set up both ports going to my router as no-mac-learning.. and that seems to have bypassed anything the switch is causing.  Now the transaction is between the  ISP device and my router, leaving DHCP snooping and other security features (for this vlan anyway) on the nightstand.


It's all working.   The gateway is still wonky on the primary (I can't seem to delete my old gateway, but a new one pops up and works great).    I had to disable dhcp snooping on my WAN VLANs on my managed switch.  Disabling snooping didn't work alone, though.  I had to disable mac-learning as well.  I lose 1 ping and all is well.  My secondary WAN (Spectrum) comes up quite slowly but that's OK.  They suck.

Great, nice to hear a success  ;D

Thanks Spali for being awesome and helping me a bit.   You are awesome.

October 22, 2024, 04:53:25 AM #26 Last Edit: October 22, 2024, 05:05:32 AM by bitcore
I am a Google Fiber subscriber. My environment is simple, with an active/passive firewall pair: a KVM VM with hardware passthrough of a quad-port NIC, and a physical hardware firewall with some Intel NICs. I have a single WAN and a single LAN interface running CARP. The VPNs I use continue to function after failover. Stateful protocols such as IPsec or OpenVPN will drop and need to renegotiate, but can reconnect immediately. WireGuard has no such issue.

Spali's github post is very useful: https://gist.github.com/spali/2da4f23e488219504b2ada12ac59a7dc

I have updated my personal script to the following, which is a mash of theirs and mine, which I posted in Reddit some time ago: https://www.reddit.com/r/opnsense/comments/runb4r/diy_ha_activepassive_for_home_internet/


#!/usr/local/bin/php
<?php
// CARP event hook: enable the interface when this node becomes MASTER,
// disable it when the node becomes BACKUP.
require_once("config.inc");
require_once("interfaces.inc");
require_once("util.inc");
// $argv[1] is the event source (must contain '@'), $argv[2] is the new CARP state.
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';
if ($type != 'MASTER' && $type != 'BACKUP') {
    log_error("Carp '$type' event unknown from source '{$subsystem}'");
    exit(1);
}
if (!strstr($subsystem, '@')) {
    log_error("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit(1);
}
foreach($config['interfaces'] as $ifkey => $interface) {
    // 'opt3' is the interface key toggled in this setup; change it to your own key.
    if ($ifkey=='opt3') {
        if ($type == 'MASTER') {
            log_msg("Carp Status is now Master!");
            log_msg("Enabling interface: $ifkey - {$interface['if']}");
            // Bring the link up, then enable the interface in the config and reconfigure it.
            shell_exec("/sbin/ifconfig {$interface['if']} up");
            $config['interfaces'][$ifkey]['enable'] = '1';
            write_config("enable interface '$ifkey' due CARP event '$type'", false);
            interface_configure(false, $ifkey, false, false);
            sleep(1);
            log_msg("Restarting DHCPD");
            shell_exec('pluginctl -s dhcpd restart');
            sleep(1);
            log_msg("Issuing dhclient command to request a DHCP lease");
            shell_exec("dhclient {$interface['if']}");
        } else if ($type == 'BACKUP') {
            log_msg("Carp Status is now Backup!");
            log_msg("Disabling interface: $ifkey - {$interface['if']}");
            // Force the link down, then disable the interface in the config.
            shell_exec("/sbin/ifconfig {$interface['if']} down");
            unset($config['interfaces'][$ifkey]['enable']);
            write_config("disable interface '$ifkey' due CARP event '$type'", false);
            interface_configure(false, $ifkey, false, false);
            // Only the active node should serve DHCP on the LAN.
            log_msg("Stopping DHCPD");
            shell_exec('pluginctl -s dhcpd stop');
        }
    }
}
?>


(the forum is breaking the greater than and less than in the PHP brackets at the start and end, correct them yourself)


  • This version also manually "downs" the interface, because disabling it does not appear to fully shut the interface in my environment. An interface that is left up can cause MAC flapping, and all of the issues related to that condition.
  • My version also stops the DHCP daemon on the backup, which ensures that only one DHCP server is running on my LAN. I need the backup device to actually become "passive". Calling dhclient may not be necessary with the interface_configure call, but it's a holdover from when I only used shell_exec("/sbin/ifconfig {$interface['if']} down"); to up/down the interfaces, instead of enabling/disabling them.
  • I use log_msg instead of log_error so that these events show up in the general system log as a "notice".
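
For anyone who wants to toggle more than one WAN, the checks and steps described above can be sketched as a dry run. This is only an illustration under assumptions: parse_carp_subsystem() and plan_carp_actions() are hypothetical helpers, the "1@igb0" event source and the wan/opt1/igb4/igb5 values are made-up examples, and nothing here calls OPNsense or touches an interface. It just lists the steps a real hook would take, with the DHCP daemon handled once per event rather than once per interface.

```php
<?php
/**
 * Split a CARP event source such as "1@igb0" into its two halves.
 * Returns null when either half is missing.
 */
function parse_carp_subsystem(string $subsystem): ?array
{
    $parts = explode('@', $subsystem, 2);
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        return null;
    }
    return ['vhid' => $parts[0], 'device' => $parts[1]];
}

/**
 * Dry run: list the steps for one CARP event across several interface keys.
 * Nothing is executed; the strings only describe what a real hook would do.
 */
function plan_carp_actions(string $type, array $ifkeys, array $interfaces): array
{
    if ($type !== 'MASTER' && $type !== 'BACKUP') {
        return []; // unknown event type: do nothing
    }
    $steps = [];
    foreach ($ifkeys as $ifkey) {
        if (!isset($interfaces[$ifkey]['if'])) {
            continue; // unknown key: skip it instead of guessing a device
        }
        $device = $interfaces[$ifkey]['if'];
        if ($type === 'MASTER') {
            $steps[] = "ifconfig $device up";
            $steps[] = "enable $ifkey and reconfigure";
            $steps[] = "dhclient $device";
        } else {
            $steps[] = "ifconfig $device down";
            $steps[] = "disable $ifkey and reconfigure";
        }
    }
    if ($steps !== []) {
        // The DHCP daemon is handled once per event, not once per interface.
        $steps[] = ($type === 'MASTER') ? 'restart dhcpd' : 'stop dhcpd';
    }
    return $steps;
}

// Made-up example data: two WAN keys and their devices.
$interfaces = ['wan' => ['if' => 'igb4'], 'opt1' => ['if' => 'igb5']];
$event = parse_carp_subsystem('1@igb0');
if ($event !== null) {
    print_r(plan_carp_actions('BACKUP', ['wan', 'opt1'], $interfaces));
}
```

Keeping the plan separate from the execution makes it easy to check which interfaces a failover would touch before wiring the same list of keys into the real script.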

As per Spali's GitHub comments, I do recommend creating a gateway with "Upstream Gateway" checked and a higher metric than the normal WAN gateway, so that the backup can reach the internet via the LAN.

I also recommend disabling the "Backup" router's WAN interface - so that your secondary device will boot up with the WAN in disabled state, and the CARP script will re-enable the interface if CARP goes master. This prevents the devices from both booting up and each having active WAN interfaces.

Saw your Reddit post and your most recent post on GitHub. Thank you to you and Spali for figuring this out.
I'm on 24.7.6. Does the script no longer work, or does the one you posted here work with 24.7.6?

I haven't set this up yet, but I have been looking into doing this for a while.

My setup:

  • opnsense main: 192.168.29.1
  • opnsense backup: 192.168.29.100
  • pfsync/halink between the two: 10.0.0.1 and 10.0.0.2
  • GPON ATT is on VLAN 842 (to bypass the need for the ATT Fiber gateway)

What should my CARP virtual IPs be for WAN and LAN?
Should I keep the backup as a fully clean OPNsense install, add things like VLAN 842 for the GPON, or restore a Proxmox backup so it's all the same and then just change the CARP settings and such?