HA and Lagg setup / best practices for slightly different hardware

Started by fred9954, December 04, 2023, 05:09:05 PM

Hi,

I'm currently using pfSense, but planning to migrate to OPNsense, with a HA setup.

pfSense recently removed the need to have the exact same OS network interface names (ixl / igb / ...) on both nodes. It now seems pfSense only needs the nodes to share the same configured interface names (wan, lan, ...) to be able to sync states.

Is that kind of feature also planned on OPNsense, or maybe even already present? (sometimes the wiki is not up to date, so I prefer asking)

If not, I'm totally fine using LAGG as a workaround as my hardware is slightly different between the 2 nodes (plus it's more convenient in case of a failure, to be able to replace with temporary hardware)

In that case:
- What type of LAGG (Failover / LACP / ...) should be used as in my config there would be only 1 link in each group?
- Is it possible to use custom Lagg group names, to have something more explicit for example lagg_wan / lagg_lan, ... instead of lagg0 / lagg1 / ... ?

Thank you!

When syncing states is a strict requirement, you should really use the same hardware.

I have played around a lot with state synchronization from hardware to VM and also between two different hardware platforms, and it always resulted in more maintenance (states getting stuck, sessions being open or closed instead of the other way around, IPsec connecting but passing no traffic...)

If seamless failover is really a requirement, shouldn't there be a budget for it?

Otherwise, HA runs just fine with only CARP and xmlrpc sync. PFsync is really only needed if sessions shouldn't get reset. But even with a failover without PFsync nobody really notices much because new sessions are quickly re-established these days.
Hardware:
DEC740

Thank you!

I totally understand your point about using exactly the same hardware, of course that's the easiest way. But in real life it's not always so easy, as the hardware has a long life for such devices. We still have a Xeon D-1541 Supermicro (5018D), and recently bought 2x Supermicro AS-5019D-FNT4 with Embedded Epyc 3251 for this setup

So right now the hardware is strictly identical, but my idea is to look further ahead, maybe 2, 3 or 5 years, and also to plan for hardware failure. It would be nice to be able to replace a 5019D with the spare 5018D, even temporarily.

That's why I would like to use LAGGs (but with single physical link) even if the hardware can be the same at the moment.

Is there any drawback (in performance / functionality / maintenance), and what type of LAGG should we use? Is failover fine with single link?

On my Hardware with single uplink, I use this configuration to connect to a Netgear switch:

Interfaces: Other Types: LAGG
Device: lagg0
Parent: ax0
Proto: lacp
Fast timeout: yes
Use flowid: default
Hash Layers: L3
use strict: default
MTU:
Description: lagg0

And on the Netgear switch it's configured the same way. The most important thing is that "Fast timeout" and "Hash Layers" are configured identically on the switch. Also, to prevent CARP flapping back and forth between master and backup firewall, disabling "Fast timeout" for the lagg might be a good option, since then a link failure will always take 30 seconds to recover. The CARP broadcasts usually come once per second. That gives the firewalls enough time to recover from a failover in case of link flapping.
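To illustrate the "same on both ends" point, here is a minimal Python sketch that compares the two settings mentioned above between the firewall and the switch. It is only a model: `LacpSettings`, `mismatches` and the field names are hypothetical and do not correspond to any OPNsense or Netgear API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LacpSettings:
    """Hypothetical container for the two settings that must match on both ends."""
    fast_timeout: bool
    hash_layers: str


def mismatches(firewall: LacpSettings, switch: LacpSettings) -> list:
    """Return the names of the settings that differ between firewall and switch."""
    return [
        f.name
        for f in fields(firewall)
        if getattr(firewall, f.name) != getattr(switch, f.name)
    ]


# Values taken from the lagg0 configuration quoted above.
firewall = LacpSettings(fast_timeout=True, hash_layers="L3")

# Two illustrative switch-side configurations (assumed for this example).
switch_ok = LacpSettings(fast_timeout=True, hash_layers="L3")
switch_bad = LacpSettings(fast_timeout=False, hash_layers="L3")

print(mismatches(firewall, switch_ok))   # no differences
print(mismatches(firewall, switch_bad))  # fast_timeout differs
```

An empty list means the two ends agree on both settings; any name in the list is a setting to fix on one side.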

Yes using LACP is an option of course (but switch dependent as you mentioned), I was more looking for an option being independent from the switch, that's why I was talking about LAGG "Failover" option

In my case the LAGG is just a workaround to get the same interface names across the devices (for state sync to work). There would be only 1 link in each LAGG group, so there cannot be any aggregation, load-balancing or redundancy on this single link; CARP would detect a failure and switch the whole device anyway.

By the way when using CARP on a whole device (multiple CARP IPs for Wan / Lan / Dmz / Opt / ...), is it possible to group / sync the CARP failure detection on all those IP addresses for them to switch at the same time? Or maybe it's configured as is by default?

The CARP state of all IP addresses switches to the backup even if only one of them becomes unreachable. There's a global demotion level: the node with the highest demotion level always becomes backup for all CARP VIPs, and the node with the lowest demotion level always becomes master for all CARP VIPs.
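The behaviour described above can be sketched as a toy model in Python. This is not how the firewall implements it; `Node`, `elect_master` and `vip_owners` are hypothetical names, and the skew and demotion values are assumptions chosen only to show that a single node-wide demotion level moves every VIP at once.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    advskew: int       # configured base skew, lower is preferred (assumed values)
    demotion: int = 0  # node-wide demotion level, shared by all CARP VIPs

    def effective_skew(self) -> int:
        # The demotion level applies to the whole node, not to a single VIP.
        return self.advskew + self.demotion


def elect_master(nodes):
    """The node with the lowest effective skew wins."""
    return min(nodes, key=lambda n: n.effective_skew())


def vip_owners(nodes, vips):
    """Every VIP follows the same election, so all of them share one master."""
    master = elect_master(nodes)
    return {vip: master.name for vip in vips}


fw1 = Node("fw1", advskew=0)
fw2 = Node("fw2", advskew=100)
vips = ["wan", "lan", "dmz"]

before = vip_owners([fw1, fw2], vips)

# Simulate one interface on fw1 losing link: the whole node is demoted.
# 240 is an illustrative value for this sketch.
fw1.demotion += 240

after = vip_owners([fw1, fw2], vips)

print(before)
print(after)
```

Because the demotion level belongs to the node rather than to a single VIP, the failure of one interface is enough to hand wan, lan and dmz to the other firewall together.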

I don't know much about switch-independent LAGG so I can't help there, sorry.

As shown above I use LACP with a single link though, that's a totally acceptable configuration.