CARP connections to switches in MLAG

Started by Haddock27, January 06, 2024, 06:21:53 AM

January 06, 2024, 06:21:53 AM Last Edit: January 06, 2024, 10:22:39 AM by Haddock27
Here is my setup:

                 VIP (WAN)
                     |
        +------------+------------+
        |                         |
 +------------+   pfSync   +------------+
 | Firewall 1 | <--------> | Firewall 2 |
 +------------+            +------------+
        |                         |
        |                         |
 +------------+    MLAG    +------------+
 |  Switch 1  | <--------> |  Switch 2  |
 +------------+            +------------+
        |                         |
        +------------+------------+
                     |
                LACP (LAN)

I have 2 firewalls in a CARP group. Each firewall connects to a Mikrotik switch and these switches are in an MLAG configuration with their onward ports bonded in LACP.

Question: How should the switch ports to the firewalls be configured?

MLAG allows the switches to appear to external hardware as though they are connected to a single switch. Bonds are created across the physical switches (i.e. an LACP bond will have 1 port of the bond on each switch so that an entire switch can go down and the link survive). How should the ports connected to the firewalls be bonded? Clearly LACP is not correct because the firewalls will not negotiate this correctly on their end as the firewalls act independently of each other. My other options are:
- active-backup
- broadcast

Am I correct in assuming that if Switch 1 were to go down, Firewall 1 would detect a dead connection and demote itself, so that Firewall 2 takes over?

Quote from: Haddock27 on January 06, 2024, 06:21:53 AM

Am I correct in assuming that if Switch 1 were to go down, Firewall 1 would detect a dead connection and demote itself, so that Firewall 2 takes over?

I have a very similar setup to yours and that is how mine behaves. I did not check the "disable preempt" option under High Availability Settings.

On my Unifi switches (which do not support advanced bonding functions), I just configured Switch 1 to have a higher priority than Switch 2 (higher priority on Unifi means setting the numerical priority value of Switch 1 lower than that of Switch 2). And for anyone else with Unifi switches: I had to set the CARP frequency in OPNsense to 2 instead of 1 for it to be stable.

Edit: Just to add clarification, what I mean to say is that I didn't need to configure any LAG settings on my switches for it to work in my HA configuration which is identical to yours (except I have Unifi switches and Multi-WAN).
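
For anyone wondering why raising the frequency helps, here is a rough sketch of the timing involved. It assumes that the "frequency" above is the CARP advertisement base (advbase), that the skew (advskew) adds advskew/256 seconds, and that a backup declares the master down after roughly three base intervals plus the skew, which is how I understand the FreeBSD implementation. The function name carp_timers is made up for illustration.

```python
def carp_timers(advbase: int, advskew: int = 0) -> tuple[float, float]:
    """Return (advertisement interval, master-down timeout) in seconds.

    Assumption: interval = advbase + advskew/256 and
    timeout = 3 * advbase + advskew/256, per my reading of FreeBSD CARP.
    """
    if not 1 <= advbase <= 255:
        raise ValueError("advbase must be between 1 and 255")
    if not 0 <= advskew <= 254:
        raise ValueError("advskew must be between 0 and 254")
    skew_seconds = advskew / 256
    return advbase + skew_seconds, 3 * advbase + skew_seconds


if __name__ == "__main__":
    for base in (1, 2, 3):
        interval, timeout = carp_timers(base)
        print(f"advbase={base}: advertise every {interval}s, "
              f"backup takes over after about {timeout}s of silence")
```

Under those assumptions, a higher base gives the switches more time to deliver a delayed advertisement before the backup promotes itself, at the cost of a slower failover.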

You can bond over both members, so FW1 igb0 on Sw1 and FW1 igb1 on Sw2. Then you are fully redundant. I would not change the default of the preemption checkbox.

Hi mimugmail, I am not sure what you mean by "bond over both members". I don't understand how to get the switch LAGG to play with CARP. Here are the issues as I see them:

Active-backup
The challenge here is ensuring that the switch's active link aligns with the CARP master firewall. Without an automatic mechanism to align the active switch port with the CARP master, this setup could lead to misalignment where the active switch port is connected to the standby firewall.

Broadcast
This guarantees that the active CARP firewall always receives the traffic, regardless of which one is master, but if a firewall leaves the CARP group (say because CARP is disabled for maintenance) it will start handling duplicate traffic.

@Haddock27 in case multi chassis LACP is available I prefer to connect each firewall with two interfaces, one to each switch, and configure them as lagg/LACP.

If you prefer a single link to each OPNsense, nothing special can be configured on the switches. The OPNsense cluster does not support multi chassis bonding/lagg.

HTH,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

When you run LACP, both ports are active, so CARP and LACP have no relation. Just be sure to disable IGMP snooping on the switches/VLANs and the rest will just work.

January 08, 2024, 10:06:15 AM #6 Last Edit: January 08, 2024, 10:09:47 AM by Haddock27
Thanks Patrick and mimugmail. I think there may be a misunderstanding about my question. I am asking about the configuration on the switch. I had thought that the switch (or in my case switches in MLAG) should have no bonding between the ports that connect from the switch to the firewalls because I was under the impression that when a new CARP group master takes over it issues a Gratuitous ARP (GARP). This would mean that, in principle, the switch should see the GARP and update its MAC address table. However, this does not seem to be working - hence my question. For me, when the CARP master changes traffic gets dropped.
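
To make the expectation concrete, here is a toy model of what I think should happen. It is only a sketch: LearningSwitch and the port names are hypothetical, it models a single learning switch rather than a real MLAG pair (which also has to sync its table over the peer link), and it assumes the CARP virtual MAC has the form 00:00:5e:00:01:<vhid>.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Build the virtual MAC for a CARP VHID (assumed form 00:00:5e:00:01:xx)."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


class LearningSwitch:
    """Toy switch that remembers the last port each source MAC was seen on."""

    def __init__(self) -> None:
        self.mac_table: dict[str, str] = {}

    def receive(self, port: str, src_mac: str) -> None:
        # Any frame sourced from a MAC (a gratuitous ARP included)
        # moves that MAC to the ingress port.
        self.mac_table[src_mac] = port

    def egress_port(self, dst_mac: str):
        # None means "unknown destination", which a real switch would flood.
        return self.mac_table.get(dst_mac)


if __name__ == "__main__":
    switch = LearningSwitch()
    vmac = carp_virtual_mac(1)
    switch.receive("port-fw1", vmac)   # Firewall 1 is master
    print(switch.egress_port(vmac))
    switch.receive("port-fw2", vmac)   # failover: Firewall 2 sends a GARP
    print(switch.egress_port(vmac))
```

If the real switches behaved like this, traffic for the VIP would follow the new master as soon as its first frame arrived.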

Quote from: Haddock27 on January 08, 2024, 10:06:15 AM
Thanks Patrick and mimugmail. I think there may be a misunderstanding about my question. I am asking about the configuration on the switch. I had thought that the switch (or in my case switches in MLAG) should have no bonding between the ports that connect from the switch to the firewalls because I was under the impression that when a new CARP group master takes over it issues a Gratuitous ARP (GARP).
Correct. A CARP cluster is not LACP so no bonding here.

Quote from: Haddock27 on January 08, 2024, 10:06:15 AM
This would mean that, in principle, the switch should see the GARP and update its MAC address table. However, this does not seem to be working - hence my question. For me, when the CARP master changes traffic gets dropped.
That's odd and needs further investigation.

My experience is that many switches do not handle CARP / VRRP correctly across "stacked" switches.

You can tell if you've got that problem because you get all sorts of weird CARP issues, such as:

  • MASTER or BACKUP firewall failing and disabling CARP due to a CARP problem
  • CARP won't fail back
  • CARP gets a huge demotion level number of 240 or 480 or higher
  • Sometimes it works and sometimes it doesn't and you just can't figure out why
  • You reboot the master or backup firewall and strange CARP issues occur
  • Both the master and backup firewall report themselves as the CARP master (which should be impossible)

Check here to see the CARP demotion level number:
   Interfaces: Virtual IPs: Status
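
If you would rather script the check than click through the GUI, something along these lines might work. It is a sketch under assumptions: it expects you to paste in text captured from each firewall, it assumes ifconfig prints lines like "carp: MASTER vhid 1 advbase 1 advskew 0", and it assumes the demotion level is exposed as the net.inet.carp.demotion sysctl. The function names are made up.

```python
import re

# Assumed ifconfig line format: "carp: MASTER vhid 1 advbase 1 advskew 0"
CARP_LINE = re.compile(r"carp:\s+(MASTER|BACKUP|INIT)\s+vhid\s+(\d+)")


def carp_states(ifconfig_text: str) -> dict[int, str]:
    """Map each VHID found in the captured ifconfig text to its state."""
    return {int(vhid): state for state, vhid in CARP_LINE.findall(ifconfig_text)}


def dual_master_vhids(fw1_text: str, fw2_text: str) -> list[int]:
    """Return the VHIDs that both firewalls claim to be MASTER for."""
    fw1, fw2 = carp_states(fw1_text), carp_states(fw2_text)
    return sorted(v for v, s in fw1.items() if s == "MASTER" and fw2.get(v) == "MASTER")


def parse_demotion(sysctl_line: str) -> int:
    """Parse a line such as 'net.inet.carp.demotion: 240'."""
    name, _, value = sysctl_line.partition(":")
    if name.strip() != "net.inet.carp.demotion":
        raise ValueError("not a net.inet.carp.demotion line")
    return int(value.strip())
```

A non-empty result from dual_master_vhids, or a demotion value that stays well above 0, matches the symptoms in the list above.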

Ask the switch manufacturer if they properly support VRRP, which is basically what CARP is. If you tell them you're trying to run CARP they will say "What?????" and won't know what you're talking about.

Even then, what you want is a switch stack that supports external VRRP / CARP: not that the switch stack itself speaks VRRP, but that it supports a VRRP pair plugged into it. In our case that is firewall1 and firewall2, but it could equally be server1 and server2 sharing an IP address using VRRP (which is essentially CARP).

If they do support external VRRP then CARP will work.

Often I have found it simpler to have:

     Firewall1 > switch1
     Firewall2 > switch2
     Switch1 and switch2 cross connect as standalone switches and thus CARP behaves nicely.


Why so much trouble with stacked switches?
Because it's actually really complicated for the switch manufacturer. They have to maintain ARP state across 2 switches, replicate certain things switch to switch, and the whole time lie and pretend to be a single switch. CARP/VRRP complicates things because of the technical way it shares an IP address across two devices and the fact that VRRP / CARP use IP protocol 112 rather than UDP (protocol number 17) or TCP (protocol number 6).
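
To show what "protocol 112" means in practice, here is a small sketch that reads the protocol field (byte 9) of an IPv4 header. The helper build_header and the addresses are purely illustrative; 224.0.0.18 is used as the destination on the assumption that it is the multicast group VRRP / CARP advertisements are sent to.

```python
import socket
import struct

IP_PROTOCOLS = {6: "TCP", 17: "UDP", 112: "VRRP/CARP"}


def ip_protocol_name(ipv4_header: bytes) -> str:
    """Name the payload protocol from the first 20 bytes of an IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("need at least 20 bytes of IPv4 header")
    if ipv4_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    proto = ipv4_header[9]  # protocol number lives at byte offset 9
    return IP_PROTOCOLS.get(proto, f"other ({proto})")


def build_header(proto: int, src: str, dst: str) -> bytes:
    """Illustrative helper: pack a minimal 20-byte IPv4 header (checksum left 0)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0,      # version/IHL, TOS, total length, id, flags/fragment
        255, proto, 0,          # TTL, protocol, checksum
        socket.inet_aton(src), socket.inet_aton(dst),
    )


if __name__ == "__main__":
    advert = build_header(112, "192.0.2.1", "224.0.0.18")
    print(ip_protocol_name(advert))
```

A switch feature that only looks at TCP and UDP would never match these frames, which may be part of why they get less attention in stacked setups.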

I have found they work hard on getting stacked switches to work covering most stuff, but not things like VRRP / CARP.

I know I haven't cured the problem but hopefully my explanation helps it all make sense as to why sometimes you just can't get it to go.



July 25, 2024, 12:41:40 PM #9 Last Edit: July 25, 2024, 12:45:16 PM by itngo
Quote from: crlt on January 07, 2024, 04:36:55 AM
I have a very similar setup to yours and that is how mine behaves. I did not check the "disable preempt" option under High Availability Settings.

On my Unifi switches (which do not support advanced bonding functions), I just configured Switch 1 to have a higher priority than Switch 2 (higher priority on Unifi means setting the numerical priority value of Switch 1 lower than that of Switch 2). And for anyone else with Unifi switches: I had to set the CARP frequency in OPNsense to 2 instead of 1 for it to be stable.

Edit: Just to add clarification, what I mean to say is that I didn't need to configure any LAG settings on my switches for it to work in my HA configuration which is identical to yours (except I have Unifi switches and Multi-WAN).


Many thanks for confirming that changing the frequency from 1 to 2 helped in your setup with Unifi switches. We also have a customer here with 2 OPNsense firewalls and Unifi switches... It was not stable, but today we changed the frequency to "3" and it looks far more promising now....

Then I read your post and felt reassured that we are now on the right track!

Thank you so much! five STARS!