Migrating from a single FW/router to an HA setup

Started by tessus, March 23, 2023, 10:49:23 PM

My current setup is rather simple.

Managed Switch -> LAN (OPNsense box) WAN -> modem (bridged mode)

I went through the documentation and I believe I understood the idea of using CARP. However, there are 2 questions I couldn't find the answers to:

1. I should be able to use my managed switch as the WAN switch, since I am using VLANs. But I don't know how to keep using the modem in bridged mode. Currently the modem is connected to my OPNsense FW's WAN port, and this interface gets the WAN IP address (via DHCP). IMO, I would have to put the modem in router mode again and try to disable its firewall features. The problem is that I can't recall whether this is even possible. I certainly do not want to configure firewall rules on both OPNsense and the modem. But am I correct that I won't be able to use the modem in bridged mode anymore?

2. How do I apply my current config to the HA cluster? My current interfaces are LAN, WAN, about 16 VLANs (parent is LAN), 2 Wireguard interfaces (wgX), and one OpenVPN interface (ovpncX). My firewall rules, DHCP static entries, and other settings are quite extensive. I do not want to create everything from scratch.

Such an HA cluster is certainly nice and easy to set up if you do it from the beginning. But I couldn't find any guide on how to migrate a single FW instance to an HA cluster. By migrating I mean that I do not have to start from scratch. ;-)

Has anyone ever done such a migration and has some tips and tricks up their sleeve?

June 13, 2023, 09:13:51 AM #1 Last Edit: June 14, 2023, 02:27:53 PM by wstemb
I cannot answer your first question.

As for adding another node to a heavily configured firewall without defining everything from scratch, the first part (network and firewall topology) is possible:
1. You have to build another node with an exact copy of the interfaces on the first one. Exact means the same OPTx assignments, since the OPTx identifiers are used during the synchronization phase (when the rules are copied to the second node).
2. Define a new set of IP addresses on every pair of interfaces, and define a CARP VIP on every interface using the IP address previously used on the single firewall's interface (so you do not have to change the default gateways on the network nodes).
3. Define High Availability on the main and second node, and define all the synchronization (XMLRPC Sync) you need. This will copy the chosen definitions to the second node.

The guide https://www.thomas-krenn.com/en/wiki/OPNsense_HA_Cluster_configuration is enough for this phase, if you extrapolate it to a more complex situation and if you maintain the OPTx order of the interfaces.
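
The requirement in step 1 (identical OPTx assignments on both nodes) can be checked mechanically before enabling synchronization. What follows is only a minimal sketch, assuming that each node's configuration backup has an `<interfaces>` section whose children are named after the interface IDs (`wan`, `lan`, `opt1`, ...) and carry a `<descr>` element. The sample XML, the function names, and the choice to compare by description are illustrative assumptions, not something taken from the guide.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened config backups for illustration only.
PRIMARY = """<opnsense><interfaces>
  <wan><if>igb0</if><descr>WAN</descr></wan>
  <lan><if>igb1</if><descr>LAN</descr></lan>
  <opt1><if>vlan01</if><descr>GUEST</descr></opt1>
  <opt2><if>vlan02</if><descr>IOT</descr></opt2>
</interfaces></opnsense>"""

SECONDARY = """<opnsense><interfaces>
  <wan><if>igb0</if><descr>WAN</descr></wan>
  <lan><if>igb1</if><descr>LAN</descr></lan>
  <opt1><if>vlan01</if><descr>GUEST</descr></opt1>
  <opt2><if>vlan02</if><descr>GUEST2</descr></opt2>
</interfaces></opnsense>"""


def interface_map(config_xml):
    """Map each interface ID (wan, lan, opt1, ...) to its description."""
    root = ET.fromstring(config_xml)
    interfaces = root.find("interfaces")
    if interfaces is None:
        return {}
    return {node.tag: (node.findtext("descr") or "") for node in interfaces}


def assignment_mismatches(primary_xml, secondary_xml):
    """Return (interface ID, primary descr, secondary descr) for every ID
    that is missing on one node or described differently on the two nodes."""
    primary = interface_map(primary_xml)
    secondary = interface_map(secondary_xml)
    problems = []
    for ident in sorted(set(primary) | set(secondary)):
        if primary.get(ident) != secondary.get(ident):
            problems.append((ident, primary.get(ident), secondary.get(ident)))
    return problems


if __name__ == "__main__":
    for ident, first, second in assignment_mismatches(PRIMARY, SECONDARY):
        print(f"{ident}: primary={first!r} secondary={second!r}")
```

An empty result suggests that the two nodes use the same interface IDs for the same networks. Any reported ID should be fixed before the rules are synchronized.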

I am currently working on porting OpenVPN to the cluster, so I cannot add anything on that yet.

Thanks for the reply. I appreciate it.

Quote from: wstemb on June 13, 2023, 09:13:51 AM
1. You have to build another node with an exact copy of the interfaces on the first one. Exact means the same OPTx assignments, since the OPTx identifiers are used during the synchronization phase (when the rules are copied to the second node).

I have no way to assign the interface IDs myself, since they are chosen automatically when creating an interface. There is no way to do that manually.

Unless a restore keeps the same assignments, this is impossible. Otherwise a backup and restore should do the trick.

Quote from: wstemb on June 13, 2023, 09:13:51 AM
2. Define a new set of IP addresses on every pair of interfaces, and define a CARP VIP on every interface using the IP address previously used on the single firewall's interface (so you do not have to change the default gateways on the network nodes).

Here lies the issue. I have N (about 25) VLANs. This means I have to change 2xN interfaces and create N CARP VIP entries.

Then I have to change all firewall rules, because the FW now has to use the virtual interfaces, which are using new interface IDs.

I also use OpenVPN (out) and Wireguard (in/out). I certainly would have to figure out how to make this work as well.

Quote from: wstemb on June 13, 2023, 09:13:51 AM
3. Define High Availability on the main and second node, and define all the synchronization (XMLRPC Sync) you need. This will copy the chosen definitions to the second node.

Yes, this should not be too complicated.

Thanks for the link, but I actually had read that one before I posted this topic.

Unfortunately all this is a moot conversation unless there is an answer to my first question.
I can't be the only one who has a cable modem, can I? Additionally, anyone who uses OPNsense is most likely using the modem in bridged mode, so someone should have an answer to my question.

It can be done, and I have done it (IPv4 only). It was a smooth, straightforward few hours of manual work.

I have 6 real interfaces (including PFSYNC) and 13 VLANs on some of the real interfaces on every node. The firewall (which would become the master) had been in production for a few months. I had to work "in place", since I did not have a third machine.

First, I made an IP address plan: 3 addresses per interface. The addresses on the "old", existing firewall have to become the VIP addresses; the other two are for the nodes.
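
Such a three-address plan can be drafted with Python's standard `ipaddress` module. This is only a sketch under stated assumptions: the function name, the `used` parameter, and the policy of taking the first two free host addresses in the subnet are hypothetical choices made for illustration. In practice you would pick addresses that do not collide with DHCP ranges or static hosts.

```python
import ipaddress
from itertools import islice


def plan_addresses(current, used=()):
    """Draft a 3-address plan for one interface.

    current: the address the single firewall uses today, e.g. "192.168.10.1/24".
             It is kept as the CARP VIP so clients keep their gateway.
    used:    other addresses already taken in that subnet (assumed input).
    """
    iface = ipaddress.ip_interface(current)
    taken = {iface.ip, *(ipaddress.ip_address(a) for a in used)}
    # Lazily walk the usable host addresses and keep the first two free ones.
    free = (host for host in iface.network.hosts() if host not in taken)
    picks = list(islice(free, 2))
    if len(picks) < 2:
        raise ValueError(f"{iface.network} has no room for two node addresses")
    return {
        "vip": str(iface.ip),
        "primary": str(picks[0]),
        "backup": str(picks[1]),
        "prefix": iface.network.prefixlen,
    }


if __name__ == "__main__":
    print(plan_addresses("192.168.10.1/24", used=["192.168.10.2"]))
```

Note that a very small subnet (for example a /30) cannot hold a VIP plus two node addresses, which the sketch reports as an error.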

I manually reconstructed the interfaces on the new firewall (a machine identical to the MASTER): first the real ones, then the VLANs, following the order of the OPTxx interfaces. Where I had a gap in the numbering of the OPT interfaces (just one, luckily), I defined a placeholder, defined the next interface, and then deleted the placeholder. This is a few hours of non-intrusive work that can be done whenever you want. I also had to manually define the Virtual IPs of type "Other" on the new firewall, since during testing I did not see them being copied (there were just a few of them, so it was easier to define them than to solve the issue).
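
The placeholder trick can be planned in advance from the list of interface IDs on the existing firewall. The sketch below assumes, as the description above implies, that a newly assigned interface receives the lowest free OPT number, so every gap has to be filled temporarily and removed afterwards. The function name and the sample IDs are hypothetical.

```python
import re


def creation_plan(existing_ids):
    """Return the order in which to create OPT interfaces on the new node.

    Each entry is (interface ID, kind), where kind is "real" for an interface
    that exists on the first node and "placeholder" for a gap that must be
    filled temporarily and deleted once the later interfaces exist.
    """
    numbers = []
    for ident in existing_ids:
        match = re.fullmatch(r"opt(\d+)", ident)
        if match:  # wan, lan and other non-OPT IDs are ignored
            numbers.append(int(match.group(1)))
    if not numbers:
        return []
    wanted = set(numbers)
    return [
        (f"opt{n}", "real" if n in wanted else "placeholder")
        for n in range(1, max(numbers) + 1)
    ]


if __name__ == "__main__":
    for ident, kind in creation_plan(["wan", "lan", "opt1", "opt2", "opt4"]):
        print(ident, kind)
```

For the sample input the plan contains one placeholder, `opt3`, which would be deleted after `opt4` has been created.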

After that, I defined the VIPs one by one, changing the master's IPv4 address to the one reserved for the node and moving the old address to the VIP. After that, I synchronized the backup with the master. I did not need to change any rule or NAT definition, just the OpenVPN server interface address.

All the work was done in two evenings, in the maintenance window: on the first day the backup switch trunk and VLAN definitions, IP address planning, testing, basic functionality, and the main interfaces; on the second, everything that remained. In the meantime, the backup node was disabled. At every step I made backups of the configurations of both firewalls, so I could step back if needed.

I am now working on the last two functions: OpenVPN client access (using an internal CA :-( ) and FRR.

There is probably a better way, but I had to get the work done and I had deadlines. So I did it manually this way, knowing that "The Better is the Enemy of the Good".

June 14, 2023, 05:06:08 PM #4 Last Edit: June 14, 2023, 05:30:52 PM by wstemb
Quote from: tessus on June 14, 2023, 05:00:52 AM

I have no way to assign the interface IDs myself, since they are chosen automatically when creating an interface. There is no way to do that manually.

It is possible: it is a tedious process of manually defining the interfaces on the second firewall one by one, following the order OPT1 -> OPT23. It can be done in less than an hour for this number of interfaces.
Quote

Unless a restore keeps the same assignments, this is impossible. Otherwise a backup and restore should do the trick.

You can try it: back up the main node, edit the XML (IP addresses and so on), and restore it on the backup (new) node.
Quote

Here lies the issue. I have N (about 25) VLANs. This means I have to change 2xN interfaces and create N CARP VIP entries.

You have to define a new IP address on every interface on the backup node (something that has to be done in any case), replace the IP address on every interface on the main node, and define new CARP VIPs with the previously used address on both nodes. So 4 actions per interface. It is boring manual work again, but it can be done relatively fast. As long as the second firewall is disabled, it can be done sequentially, in phases.

But maybe it can be done, at least partially, by editing the backup config XML file and restoring it on the main node, then combining part of the interface config from the main node into the backup node's config XML and restoring that on the backup. I preferred the manual work, where everything was under control.
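
For anyone who wants to try the XML route, the address changes could be scripted rather than typed by hand. The following is an untested sketch, not a verified procedure: it assumes the backup file has an `<interfaces>` section whose children (`wan`, `lan`, `opt1`, ...) hold their static address in an `<ipaddr>` element, and it leaves non-static values such as `dhcp` untouched. The function names, the sample XML, and the address mapping are hypothetical. Keep the original backup and inspect the result before restoring anything.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened config backup for illustration only.
SAMPLE = """<opnsense><interfaces>
  <wan><if>igb0</if><ipaddr>dhcp</ipaddr></wan>
  <lan><if>igb1</if><ipaddr>192.168.1.1</ipaddr><subnet>24</subnet></lan>
</interfaces></opnsense>"""


def is_static(text):
    """True if the text is a literal IP address rather than e.g. 'dhcp'."""
    try:
        ipaddress.ip_address(text or "")
        return True
    except ValueError:
        return False


def readdress(config_xml, new_addresses):
    """Return (new XML text, list of interface IDs whose address changed).

    new_addresses maps an interface ID to the node address it should get.
    Interfaces without a static address, or without an entry in the
    mapping, are left as they are.
    """
    root = ET.fromstring(config_xml)
    interfaces = root.find("interfaces")
    if interfaces is None:
        raise ValueError("no <interfaces> section found")
    changed = []
    for node in interfaces:
        target = new_addresses.get(node.tag)
        ipaddr = node.find("ipaddr")
        if target is None or ipaddr is None or not is_static(ipaddr.text):
            continue
        ipaddr.text = target
        changed.append(node.tag)
    return ET.tostring(root, encoding="unicode"), changed


if __name__ == "__main__":
    new_xml, changed = readdress(SAMPLE, {"lan": "192.168.1.3"})
    print(changed)
    print(new_xml)
```

The sketch only rewrites interface addresses. CARP VIPs, HA settings, and any other node-specific values would still have to be handled separately.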
Quote

Then I have to change all firewall rules, because the FW now has to use the virtual interfaces, which are using new interface IDs.

No, I did not touch any rules after building the HA setup, neither on the main node nor in the synchronized rules on the backup. Everything works as long as you keep the cluster IP addresses equal to the former firewall addresses. I only had to change the OpenVPN server interface from WAN to the CARP VIP address of the WAN.
Quote

I also use OpenVPN (out) and Wireguard (in/out). I certainly would have to figure out how to make this work as well.

For OpenVPN in client access mode I will tell you later. I have defined everything and the services are working, but I still have to check whether failover works (in some maintenance window).
Quote
Quote from: wstemb on June 13, 2023, 09:13:51 AM
3. Define High Availability on the main and second node, and define all the synchronization (XMLRPC Sync) you need. This will copy the chosen definitions to the second node.

Yes, this should not be too complicated.

Thanks for the link, but I actually had read that one before I posted this topic.

Unfortunately all this is a moot conversation unless there is an answer to my first question.
I can't be the only one who has a cable modem, can I? Additionally, anyone who uses OPNsense is most likely using the modem in bridged mode, so someone should have an answer to my question.

I have the simplest routing scenario on the WAN side: a standalone managed switch connects the cluster nodes and the ISP router, with a small IP segment on the WAN side and fixed IPs for all nodes on this segment.

Can you describe in more detail how the WAN is configured in bridged mode? I have no experience with cable modems. Several years ago I had to use an ISP ADSL bridge/router configured as a bridge, and I moved to a router configuration ASAP...


Quote from: wstemb on June 14, 2023, 10:31:42 AM
. . .
All the work was done in two evenings, in the maintenance window: on the first day the backup switch trunk and VLAN definitions, IP address planning, testing, basic functionality, and the main interfaces; on the second, everything that remained. In the meantime, the backup node was disabled. At every step I made backups of the configurations of both firewalls, so I could step back if needed.

I am now working on the last two functions: OpenVPN client access (using an internal CA :-( ) and FRR.
. . .

Did you find a solution for the OpenVPN client access failover?

I'm in the planning stage of a new HA setup (adding a 2nd node to an existing master) and found your description very helpful.