CARP with DHCP on WAN

Started by bubbagump, January 18, 2021, 11:13:00 PM

This seems to be a pretty common topic, but I haven't found anything definitive. I have a DHCP address on my WAN. I have seen multiple workarounds, such as spoofing MACs or using non-routable IPs on the WAN interface for CARP. It seems to me that simply doing an ifdown on the WAN interface of the backup firewall is fine for my use case.

The big question is, where should I create my notify logic? Can I do it directly in /usr/local/etc/devd/carp.conf or will that get overwritten with updates? Can I create another file /usr/local/etc/devd/mycustomtweaks.conf that will be safe from updates?

WAN with DHCP and CARP is no fun.
I usually let a modem do the dial-in and put OPNsense behind it with static IPs.

I guess the plan is to have stateful failover on DHCP WAN?

Please update the thread if you find any good solutions as I would like to have the same.
Currently I just keep my WAN interfaces without CARP so when a failover occurs it drops all external sessions but at least I still have Internet access.
2x 23.7 VMs & CARP, 4x 2.1GHz, 8GB
Cisco L3 switch, ESXi, VDS, vmxnet3
DoT, Chrony, HAProxy + NAXSI, Suricata
VPN: IPSec, OpenVPN, Wireguard
MultiWAN: Fiber 500/500Mbit dual stack + 4G failover

--
Available for private support.
Did my answer help you? Feel free to click [applaud] to the left

January 19, 2021, 11:01:26 PM #3 Last Edit: January 19, 2021, 11:33:13 PM by bubbagump
Quote from: sorano on January 19, 2021, 05:06:05 PM
I guess the plan is to have stateful failover on DHCP WAN?

Please update the thread if you find any good solutions as I would like to have the same.
Currently I just keep my WAN interfaces without CARP so when a failover occurs it drops all external sessions but at least I still have Internet access.

The plan is if the firewall is BACKUP then 'ifdown vtnet0' which is my WAN interface. If the firewall is MASTER then 'ifup vtnet0'. I don't expect this to be stateful nor do I plan to have CARP VIPs on the WAN interface. I simply want to use the CARP state to trigger an interface change.
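
A minimal sketch of that mapping in plain sh, assuming `ifconfig` is used for the up/down step (as in the test further down in this post). The function name `wan_command` and the `WAN_IF` variable are made up for illustration; the script only prints the commands and does not touch any interface.

```sh
#!/bin/sh
# Sketch only: map a CARP state to the ifconfig command for the WAN device.
# WAN_IF is a hypothetical variable; vtnet0 is the WAN device named in this post.
WAN_IF="${WAN_IF:-vtnet0}"

# Print the command that matches the given CARP state.
wan_command() {
    case "$1" in
        MASTER) echo "/sbin/ifconfig ${WAN_IF} up" ;;
        BACKUP) echo "/sbin/ifconfig ${WAN_IF} down" ;;
        *)      return 1 ;;   # unknown state: print nothing, signal failure
    esac
}

# Dry run: show what would be executed for each state.
for state in MASTER BACKUP; do
    printf '%s -> %s\n' "$state" "$(wan_command "$state")"
done
```

Keeping the decision in a function that only prints makes it easy to check the mapping before wiring it to a real CARP event.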

It actually sounds like you are doing what I am after. How are you achieving that? For instance, just in basic testing on my BACKUP, if I run 'ifconfig vtnet0 down' all interfaces go down and 'ifconfig vtnet0 up' brings all interfaces up. It's bizarre.

Quote from: bubbagump on January 19, 2021, 11:01:26 PM
It actually sounds like you are doing what I am after. How are you achieving that? For instance, just in basic testing on my BACKUP, if I run 'ifconfig vtnet0 down' all interfaces go down and 'ifconfig vtnet0 up' brings all interfaces up. It's bizarre.

I run CARP on all interfaces except for WAN. The WAN interface on each firewall is just configured like "normal" with DHCP.

So the gateway for clients is the CARP LAN IP, and outbound traffic goes out via the WAN of the current CARP master.

September 25, 2021, 04:01:13 AM #5 Last Edit: September 25, 2021, 02:26:06 PM by notrox
Quote from: sorano on January 26, 2021, 03:21:15 PM
Quote from: bubbagump on January 19, 2021, 11:01:26 PM
It actually sounds like you are doing what I am after. How are you achieving that? For instance, just in basic testing on my BACKUP, if I run 'ifconfig vtnet0 down' all interfaces go down and 'ifconfig vtnet0 up' brings all interfaces up. It's bizarre.

I run CARP on all interfaces except for WAN. The WAN interface on each firewall is just configured like "normal" with DHCP.

So the gateway for clients is the CARP LAN IP, and outbound traffic goes out via the WAN of the current CARP master.

I just set up a second OPNsense firewall in my VMware 7 environment. When the WAN interface is active on the secondary firewall with the same DHCP lease as my primary firewall, I experience packet loss across the WAN interface.

I do not have CARP on my WAN interface. It's configured like "normal" as you described with DHCP.

What do you mean by "the same DHCP lease as my primary firewall"?

Obviously you cannot have the same public IP on two different hosts else you are going to have a bad time.

If you have both firewalls on DHCP, I assume only one of them gets the lease?
Assuming that is so, the second one probably has no internet access, so how do you update it and things like that?

Quote from: bimbar on October 08, 2021, 10:43:45 AM
If you have both firewalls on DHCP, I assume only one of them gets the lease?
Assuming that is so, the second one probably has no internet access, so how do you update it and things like that?

That depends on your ISP. Where I live, most ISPs provide more than 1 IP.

November 01, 2021, 10:51:17 PM #9 Last Edit: November 03, 2021, 12:35:26 AM by learnedbyerror
EDIT2: I misunderstood the use of pre-empt.  As I now read it, pre-empt will address keeping all interfaces in a consistent state.  More testing!

EDIT: I have done some additional digging and found that a script placed in /usr/local/etc/rc.syshook.d/carp/ will be called when a CARP event occurs. I have played around with this and now have something that works when all 3 CARP interfaces on the primary go down (e.g. a power failure). However, if a problem affects only the WAN interface, the LAN interface still points to the primary. More reading and testing needed :) lbe 11/02/2021
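
A rough skeleton for such a hook, written in plain sh. It assumes the hook is called with two arguments, a subsystem of the form `vhid@interface` and the new state (`MASTER` or `BACKUP`), which is how the PHP hook posted later in this thread reads them. The function name `carp_event_state` and the `carp-hook` log tag are made up for illustration, and the sketch only logs the event; it does not solve the partial-failure case described above.

```sh
#!/bin/sh
# Sketch of a CARP hook skeleton.
# Assumption: called as <script> <subsystem> <state>, e.g. "1@vtnet0 MASTER".

# Validate the two hook arguments; print the state on success.
carp_event_state() {
    case "$1" in
        *@*) ;;                      # looks like vhid@interface
        *)   return 1 ;;             # not a CARP subsystem string
    esac
    case "$2" in
        MASTER|BACKUP) echo "$2" ;;
        *)             return 1 ;;   # unknown state
    esac
}

# Only act when the hook is actually given its two arguments.
if [ "$#" -ge 2 ]; then
    if state="$(carp_event_state "$1" "$2")"; then
        logger -t carp-hook "CARP event ${state} from $1"
    else
        logger -t carp-hook "ignoring CARP event '$2' from '$1'"
        exit 1
    fi
fi
```

The file would need to be executable (chmod +x) to be picked up as a hook.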

Has anyone found a hack that facilitates the OP's request? Like the OP, I am fine with losing state. I would like to use HA to keep everything else synced and just have a poor boy solution that brings up the WAN interface (vtnet1), configured in DHCP mode with an LAA MAC shared between the two firewalls, and then takes the WAN interface down when the primary is back in service.

I'm still too new to OPNsense (and HardenedBSD) to know how to implement the event detection and action. I do have many years of experience in Linux and other Unices and am glad to take a shot at writing the control scripts if someone knows what hooks/APIs to use.

Thanks!

lbe

I have made a WIP script for a WAN with a single DHCP lease (only LAN set up as CARP).
I haven't switched to production with it yet, but testers with feedback are welcome.
At least some synthetic test cases worked as expected. A forced switch with Maintenance Mode is almost immediate, with no ping lost. The only thing that took a couple of seconds was when I shut down the master. There the switch takes a bit longer, but that is acceptable for me.

https://gist.github.com/spali/2da4f23e488219504b2ada12ac59a7dc


January 03, 2022, 01:34:36 AM #11 Last Edit: January 03, 2022, 03:28:26 PM by bitcore
Oh how funny! I've been working on this independently over the past few days and didn't check this thread to see you've solved it a couple of days ago!
We've effectively arrived at the same method to achieve this. Except your calls, Spali, are probably much better since you are using the config system's normal calls (which I'm not familiar with). I'm instead smashing in console commands via exec, equivalent to using a hammer (unsanitized code execution risks here!).

Anyway, I also disable DHCPD on the passive/backup device (so I don't have two DHCP servers on my LAN) and make a call to the dhcp client to request a new lease on the WAN interface. I think we could also enumerate "wan*" interfaces to facilitate environments with multi-wan.


For reference, I create the following as this file on both devices: /usr/local/etc/rc.syshook.d/carp/50-DHCP
and then "chmod +x 50-DHCP"
#!/usr/local/bin/php
<?php
require_once("config.inc");
require_once("interfaces.inc");
require_once("util.inc");

$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

if ($type != 'MASTER' && $type != 'BACKUP') {
    log_error("Carp '$type' event unknown from source '{$subsystem}'");
    exit(1);
}

if (!strstr($subsystem, '@')) {
    log_error("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit(1);
}

foreach ($config['interfaces'] as $ifkey => $interface) {
    if ($ifkey == 'wan') { // could change this to match on wan* interfaces for multi-wan setups, maybe?
        if ($type == 'BACKUP') {
            log_error("Carp Status is now Backup!");
            log_error("Shutting interface: {$interface['if']}");
            shell_exec("/sbin/ifconfig {$interface['if']} down");
            log_error("Stopping DHCPD");
            shell_exec('pluginctl -s dhcpd stop');
        } else if ($type == 'MASTER') {
            log_error("Carp Status is now Master!");
            log_error("Starting interface: {$interface['if']}");
            shell_exec("/sbin/ifconfig {$interface['if']} up");
            log_error("Restarting DHCPD");
            shell_exec('pluginctl -s dhcpd restart');
            shell_exec("dhclient {$interface['if']}");
        }
    }
}
?>



For future reference in case Spali's github post ever disappears, they are doing the following instead of my foreach statement:

$ifkey = 'wan';
if ($type === "MASTER") {
    log_error("enable interface '$ifkey' due CARP event '$type'");
    $config['interfaces'][$ifkey]['enable'] = '1';
    write_config("enable interface '$ifkey' due CARP event '$type'", false);
    interface_configure(false, $ifkey, false, false);
} else {
    log_error("disable interface '$ifkey' due CARP event '$type'");
    unset($config['interfaces'][$ifkey]['enable']);
    write_config("disable interface '$ifkey' due CARP event '$type'", false);
    interface_configure(false, $ifkey, false, false);
}



Edit: For those who may be looking for a DIY on how to enable this, I have a small write-up on the opnsense subreddit, here:  https://old.reddit.com/r/opnsense/comments/runb4r/diy_ha_activepassive_for_home_internet/

I am trying to do this on Dual WAN using Spali's script and the primary kicks but the secondary WAN just sits there.

bitcore's solution works, though I don't know if we need to kill the DHCP server on the backup. If it all works correctly, DHCP should fail over to the backup when the primary fails; if you sync all leases, the backup should take over as DHCP server.

If anyone sees this before I figure it out.. how can I tweak Spali's script to kick both WAN interfaces when there is a failure? 

Quote from: DocGonzo74 on January 16, 2022, 04:37:28 PM
I am trying to do this on Dual WAN using Spali's script and the primary kicks but the secondary WAN just sits there.

bitcore's solution works, though I don't know if we need to kill the DHCP server on the backup. If it all works correctly, DHCP should fail over to the backup when the primary fails; if you sync all leases, the backup should take over as DHCP server.

If anyone sees this before I figure it out.. how can I tweak Spali's script to kick both WAN interfaces when there is a failure?
Regarding DHCP: I have not tested it yet, but according to the docs, with DHCP synced and failover defined, I assume this should work on the LAN side.

I also replied in the gist to the other question.
But here too:

Assuming you just want to disable both WAN interfaces on the backup and enable both on the master, you can duplicate the script under a filename with any suffix and adjust the $ifkey variable for the second WAN interface.

A slightly cleaner solution would be to adapt the script so that $ifkey can be defined as an array and the script loops over the interfaces.
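
To illustrate the looping idea only, here is a rough sketch in plain sh. It uses the direct `ifconfig` approach from bitcore's post, not the configuration calls from the gist, so it is not a drop-in change to the gist's PHP. `WAN_IFS`, `APPLY`, and `wan_commands` are hypothetical names, and the device names are placeholders. By default it only prints what it would run.

```sh
#!/bin/sh
# Sketch: apply one CARP state to several WAN devices.
# WAN_IFS is a hypothetical space-separated list; vtnet0/vtnet1 are placeholders.
WAN_IFS="${WAN_IFS:-vtnet0 vtnet1}"

# Print one ifconfig command per WAN device for the given CARP state.
wan_commands() {
    case "$1" in
        MASTER) action=up ;;
        BACKUP) action=down ;;
        *)      return 1 ;;   # unknown state: print nothing, signal failure
    esac
    for ifname in $WAN_IFS; do
        echo "/sbin/ifconfig ${ifname} ${action}"
    done
}

# Hook entry point: $1 is the subsystem, $2 the CARP state.
# Commands are only printed unless APPLY=1 is set in the environment.
if [ "$#" -ge 2 ]; then
    wan_commands "$2" | while read -r cmd; do
        if [ "${APPLY:-0}" = 1 ]; then
            $cmd
        else
            echo "would run: $cmd"
        fi
    done
fi
```

In the gist's PHP the equivalent change would be a foreach over an array of interface keys around the existing enable/disable calls.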

Quote from: bitcore on January 03, 2022, 01:34:36 AM
We've effectively arrived at the same method to achieve this. Except your calls, Spali, are probably much better since you are using the config system's normal calls (which I'm not familiar with). I'm instead smashing in console commands via exec, equivalent to using a hammer (unsanitized code execution risks here!).
If it works for you, then you've done a good job  ;)
I started the same way as you, but I also had the problem with the WAN lease not working, etc. Instead of manually issuing a renew, I decided to disable the WAN interface through the configuration (the same as unticking "enabled" in the interface GUI) so that OPNsense reconfigures everything as it would when the interface is manually disabled or enabled. That path also takes care of getting the DHCP lease on enable and keeps everything else up to date. I thought it would be less error-prone, but I don't like that it probably does a lot more than required.
I think your version works for what it was made for and would need adaptation for other use cases. Mine does more, but probably too much.
So people can choose what they want and that is good as it is :D