CARP on WAN behaving weirdly...

Started by ghosterius, November 24, 2024, 11:04:00 PM

CARP works on the link where the virtual address is present. Period. The protocol is designed that way. So are the two alternative protocols that are more common with closed source vendors: VRRP and HSRP.

Google the protocol definitions if you do not believe me.
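The part of the protocol definition that matters here can be sketched in a few lines. CARP advertisements go to the IPv4 multicast group 224.0.0.18 as IP protocol 112, and a receiver only accepts them with a TTL of exactly 255. Every router decrements the TTL, and 224.0.0.0/24 is link-local multicast that routers do not forward anyway, so an advertisement can only be heard on the segment it was sent on. The header class and helper names below are hypothetical and heavily simplified; this is an illustration of the check, not the FreeBSD implementation.

```python
from dataclasses import dataclass

# CARP sends its advertisements to this IPv4 multicast group,
# as IP protocol 112, and receivers only accept them with TTL 255.
CARP_MCAST_GROUP = "224.0.0.18"
CARP_IP_PROTO = 112
CARP_TTL = 255
ICMP_IP_PROTO = 1


@dataclass(frozen=True)
class PacketHeader:
    """Simplified IPv4 header (hypothetical): only the fields inspected here."""
    dst: str
    proto: int
    ttl: int


def accept_advertisement(hdr: PacketHeader) -> bool:
    """Return True if a received packet passes the basic CARP checks."""
    if hdr.proto != CARP_IP_PROTO:
        return False  # e.g. ICMP echo: irrelevant to the CARP election
    if hdr.dst != CARP_MCAST_GROUP:
        return False  # this sketch models multicast mode only
    # Every router decrements the TTL, so 255 means "sent on this segment".
    return hdr.ttl == CARP_TTL


def route_one_hop(hdr: PacketHeader) -> PacketHeader:
    """Model forwarding through one router, which decrements the TTL."""
    return PacketHeader(hdr.dst, hdr.proto, hdr.ttl - 1)


on_link = PacketHeader(CARP_MCAST_GROUP, CARP_IP_PROTO, CARP_TTL)
print(accept_advertisement(on_link))                 # True
print(accept_advertisement(route_one_hop(on_link)))  # False
print(accept_advertisement(PacketHeader(CARP_MCAST_GROUP, ICMP_IP_PROTO, 255)))  # False
```

The last line is also why allowing ICMP echo on WAN has no effect on CARP: it is a different protocol and never enters the election.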

Now what is the reasoning? Simple, why should there be a dedicated link simply to keep a virtual IP address active on one of two or more nodes?

Commonly the nodes are not firewalls, routers or switches but simply servers, like an HA storage cluster or, as in the example I mentioned, a load balancer or a Varnish cache.

You need a flat switched network on WAN if you want HA. There is no way around that. You need a flat switched network on all interfaces where you want to run CARP.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

November 30, 2024, 09:16:43 PM #31 Last Edit: November 30, 2024, 09:20:26 PM by firewallfun
Quote from: Patrick M. Hausen on November 30, 2024, 08:35:38 PM
Now what is the reasoning? Simple, why should there be a dedicated link simply to keep a virtual IP address active on one of two or more nodes?

Well, to save the planet and save power by having less network gear :) I bet I'm not the only one who has redundant lines from their ISP with IP ranges assigned to the VIP. But I guess that is mostly the enterprise world, and they don't care about these things; they just buy whatever is needed and are happy with that.

I do understand that when you implement a standard, you can't necessarily go off and do other things. I'm only saying this in case there are other features in OPNsense that could provide this (that I don't know of) :) On a logical level, setting aside the limitations of the standard, I don't understand why an active/backup firewall feature isn't possible. For instance, I have a VPN firewall that pings a given gateway IP; if that gateway stops responding, it activates a different WAN network. It shouldn't be hard to write a script that does something similar, for instance one that disables WAN entirely if the MAC address or host name returned (if any) in response to an ARP, ping or other type of request looks a certain way, expected or unexpected. The script could keep pinging the WAN CARP IP or the upstream gateway from LAN (or some other internal function) and re-enable WAN once there is no response.
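A minimal sketch of the decision logic such a script might use, assuming each probe result is reduced to one boolean: "the other firewall answered for the WAN VIP". This is purely hypothetical; OPNsense does not ship it, and the class and field names are invented for illustration. Probing and actually bringing the interface up or down are deliberately left out. The hysteresis (a number of consecutive contrary results before flipping) keeps a single lost probe from toggling WAN.

```python
from dataclasses import dataclass, field


@dataclass
class WanWatchdog:
    """Hypothetical active/backup decision logic with hysteresis.

    Each observation is True when a probe (e.g. an ARP or ping for the WAN
    VIP) was answered by the *other* firewall, meaning this box should keep
    its WAN down. Probing and interface control are not modelled.
    """
    threshold: int = 3        # consecutive contrary results needed to flip
    wan_enabled: bool = True
    streak: int = field(default=0, init=False)

    def observe(self, peer_seen: bool) -> bool:
        want_enabled = not peer_seen
        if want_enabled == self.wan_enabled:
            self.streak = 0  # result agrees with the current state
        else:
            self.streak += 1
            if self.streak >= self.threshold:
                self.wan_enabled = want_enabled
                self.streak = 0
        return self.wan_enabled


wd = WanWatchdog(threshold=3)
probes = [True, True, False, True, True, True, False, False, False]
print([wd.observe(p) for p in probes])
# [True, True, True, True, True, False, False, False, True]
```

Note that if the two boxes cannot see each other at all, both observe "peer not seen" and both enable WAN, which is the same split-brain situation CARP runs into without a shared segment.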

I also got a reply on the other thing you asked me about earlier, basically confirming what you said:
"Ask your ISP if on the other side of these two links there is a switch that allows the two OPNsense boxes to communicate with each other or if you are supposed to provide your own."

Their answer:
"This is not a viable option. While it can technically be done, it would mean that we can no longer guarantee availability. For example, one of the lines between us and you could go down without our HSRP setup detecting it, resulting in us transporting your traffic between our routers because you lack Layer 2 on your side. In the current setup, we configure the HSRP endpoints on our side to automatically withdraw internal routes in our network if the port goes down."

You do have one pair of stackable switches already, right? What's keeping you from using them with the recipe you got from ChatGPT above? It will work. No four ports to spare?

If I was in your situation I would do this:

- take my stackable switches and create two multi-chassis LACP trunks
- connect both OPNsense with LACP to both switches across those trunks
- run ALL interfaces on OPNsense as VLANs across those MLAG/LACP trunks including WAN
- create one access port on each switch with the "WAN" VLAN untagged and plug your ISP lines into these
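For orientation, the firewall side of this recipe boils down to one LACP LAGG per box with every network, WAN included, as a tagged VLAN on top. OPNsense builds this through its GUI; the sketch below only prints roughly equivalent plain FreeBSD ifconfig(8) commands for illustration. The port names (ix0, ix1), VLAN tags and the helper name are hypothetical, the device names OPNsense itself picks may differ, and the switch-side configuration (MLAG/stack LAG, untagged WAN VLAN on the ISP access ports) is vendor-specific and not shown.

```python
def lagg_vlan_commands(members: list[str], vlans: dict[str, int],
                       lagg: str = "lagg0") -> list[str]:
    """Build FreeBSD ifconfig(8) commands for an LACP LAGG carrying VLANs.

    members: physical ports, one cabled to each stacked switch.
    vlans: role -> VLAN tag; WAN is just another tagged VLAN on the LAGG.
    """
    ports = " ".join(f"laggport {m}" for m in members)
    cmds = [
        f"ifconfig {lagg} create",
        f"ifconfig {lagg} laggproto lacp {ports} up",
    ]
    for role, tag in sorted(vlans.items(), key=lambda kv: kv[1]):
        cmds.append(f"# {role}")
        cmds.append(f"ifconfig vlan{tag} create vlan {tag} vlandev {lagg}")
        cmds.append(f"ifconfig vlan{tag} up")
    return cmds


for line in lagg_vlan_commands(["ix0", "ix1"], {"LAN": 10, "WAN": 100}):
    print(line)
```

Because WAN is only a VLAN here, both firewalls' WAN interfaces end up on the same layer-2 segment via the switch stack, which is exactly what CARP needs.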

Done. My own data centre OPNsense HA pair works exactly like that.

                    ┌───────────────────────────────────────────┐           
                    │                                           │           
                 ╔══│                  Switch 1                 │───────▶   
                 ║  │                                           │           
                 ║  └─────────┬───────────────────────┬─────────┘     to ISP
                 ║            │                       │                     
                 ║            │                       │                     
                 ║            │                       │                     
                 ║            │                       │                     
                 ║  ┌───────────────────┐   ┌───────────────────┐           
                 ║  │                   │   │                   │           
  whatever they  ║  │     OPNsense 1    │◀─▶│     OPNsense 2    │           
use for stacking ║  │                   │   │                   │           
                 ║  └───────────────────┘   └───────────────────┘           
                 ║            │         pfsync        │                     
                 ║            │                       │                     
                 ║            │                       │                     
                 ║            │                       │                     
                 ║  ┌─────────┴───────────────────────┴─────────┐           
                 ║  │                                           │           
                 ╚══│                  Switch 2                 │───────▶   
                    │                                           │           
                    └───────────────────────────────────────────┘     to ISP


HTH,
Patrick

This is partly how I have it on the LAN side of OPNsense. I have an LACP LAGG from both OPNsense boxes to the stacked switches, 4 SFP+ cables in total between the switch pair and the OPNsense LAGG pair. If I take out one switch or one OPNsense, it does not affect the network on the LAN side. I don't think my switch supports MLAG (multi-chassis), but I guess that doesn't matter in this case as it uses stacking: only 2 units are stackable, and it's LACP only.

https://www.fs.com/de-en/products/108710.html

I assume you would then reuse the current cables in your suggestion, so LAN and WAN (and everything else) go over the same cables, separated only by VLANs? That sounds effective.

If it's "stackable" and if they support an LACP bundle over two ports, one from each switch, it doesn't matter if they call it MLAG or whatever. From a redundancy point of view that is all the same. If LACP to two switches is supported, you are fine.

And yes, in my setup each OPNsense has got one cable to each of the switches - LAGG - and everything else is VLANs.

The only thing that is not: the pfsync/HA interface, a dedicated interface on both firewalls with just a cable between them.

If you want to be extra safe from cable or single interface failure, you can of course make that pfsync/HA link a LAGG of two ports, too.

At the end it all boils down to:

Can I take a sledge hammer and destroy any single box without the customers noticing? That's the design goal.
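The sledgehammer test can be checked mechanically for a given topology. A minimal sketch, assuming the layout from the diagram above: "clients" stands for whatever sits behind the firewalls and is assumed dual-homed to both switches, and "isp" is never smashed because its own HSRP pair is out of scope. A box passes if, after removing it, at least one surviving firewall is still reachable from the clients on the LAN VLAN and from the ISP on the WAN VLAN. All node and function names are hypothetical; the pfsync link is not modelled because it does not forward traffic.

```python
from collections import deque


def reachable(edges, start, removed):
    """Return the set of nodes reachable from start once `removed` is gone."""
    adj = {}
    for a, b in edges:
        if removed in (a, b):
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def sledgehammer_failures(lan_edges, wan_edges, firewalls, boxes):
    """Boxes whose loss leaves no firewall bridging 'clients' and 'isp'."""
    failures = []
    for box in boxes:
        lan_side = reachable(lan_edges, "clients", box)
        wan_side = reachable(wan_edges, "isp", box)
        if not any(fw != box and fw in lan_side and fw in wan_side
                   for fw in firewalls):
            failures.append(box)
    return failures


FIREWALLS = ["fw1", "fw2"]
# Each firewall has one LAGG member on each stacked switch.
UPLINKS = [(fw, sw) for fw in FIREWALLS for sw in ("sw1", "sw2")]
STACK = [("sw1", "sw2")]
LAN = UPLINKS + STACK + [("clients", "sw1"), ("clients", "sw2")]
WAN = UPLINKS + STACK + [("isp", "sw1"), ("isp", "sw2")]
print(sledgehammer_failures(LAN, WAN, FIREWALLS, ["sw1", "sw2", "fw1", "fw2"]))
# []

# Variant: a single separate switch on the WAN side instead of the stack.
WAN_SINGLE = [("fw1", "wan_sw"), ("fw2", "wan_sw"), ("isp", "wan_sw")]
print(sledgehammer_failures(LAN, WAN_SINGLE, FIREWALLS,
                            ["sw1", "sw2", "fw1", "fw2", "wan_sw"]))
# ['wan_sw']
```

The variant shows that a single extra WAN switch gives CARP its shared segment but is itself a box the sledgehammer can take out.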

I had a fiber switch (an EdgeSwitch) lying around here, and everything worked from the second I put it on the WAN side, so you were right. Now HA works perfectly.

I also have another one I don't use, so I can split things up later, but for now I can have fun! VLANs sound a bit complicated, so this was easier to get up and running.

Thank you for all the help and suggestions with this, it is appreciated!  :)


December 08, 2024, 07:24:18 PM #36 Last Edit: December 08, 2024, 07:31:10 PM by ghosterius
From reading all the replies here, I found that the issue seemed to be ICMP ping being blocked on the WAN interface.

Once I was able to ping each of the nodes through the WAN interface, the CARP IP on WAN seemed to become stable.

Thanks for the help! Hope this helps someone else having the same issue.

Never mind... I spoke too soon... the second node still becomes MASTER on the WAN.

I have no idea what could be going wrong really.

Do you have a flat network, i.e. a switch connecting your two WAN interfaces and do you use multicast for CARP? ICMP echo and CARP are completely unrelated.

December 08, 2024, 07:58:23 PM #39 Last Edit: December 08, 2024, 08:10:00 PM by ghosterius
The WAN interfaces are connected directly to the ISP router (which has 4 ports; I'm currently using 2, one for each OPNsense).
I am using multicast, yes.

I still get the BACKUP -> MASTER (master timed out) after a while on that WAN CARP IP...
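The "BACKUP -> MASTER (master timed out)" transition is what CARP does when the backup stops hearing the master's advertisements. A simplified simulation, assuming two nodes where fw1 is preferred (advskew 0) and fw2 is the backup (advskew 100): advbase and advskew are CARP's real tuning knobs, but the timeout rule used here ("about three advertisement intervals of silence") is an approximation, not the exact FreeBSD timer. The `l2_path` flag stands for whether the advertisements actually get from one WAN port to the other; whether the ISP router's ports pass them is not known from this thread.

```python
from dataclasses import dataclass


@dataclass
class CarpNode:
    name: str
    advbase: int              # base advertisement interval in seconds
    advskew: int              # 0..254, higher = less preferred
    state: str = "BACKUP"
    last_heard: float = 0.0   # time the last peer advertisement arrived

    def interval(self) -> float:
        return self.advbase + self.advskew / 256

    def master_down(self) -> float:
        # Approximation: a BACKUP gives up after ~3 intervals of silence.
        return 3 * self.interval()

    def hear(self, sender: "CarpNode", now: float) -> None:
        self.last_heard = now
        # A MASTER steps down if the peer advertises more frequently.
        if self.state == "MASTER" and sender.interval() < self.interval():
            self.state = "BACKUP"


def simulate(l2_path: bool, duration: float = 10.0, step: float = 0.05) -> dict:
    """Run two nodes; advertisements are delivered only if l2_path is True."""
    nodes = [
        CarpNode("fw1", advbase=1, advskew=0, state="MASTER"),
        CarpNode("fw2", advbase=1, advskew=100),
    ]
    next_advert = {n.name: 0.0 for n in nodes}
    t = 0.0
    while t <= duration:
        for sender in nodes:
            if sender.state == "MASTER" and t >= next_advert[sender.name]:
                next_advert[sender.name] = t + sender.interval()
                if l2_path:
                    for receiver in nodes:
                        if receiver is not sender:
                            receiver.hear(sender, t)
        for node in nodes:
            if node.state == "BACKUP" and t - node.last_heard > node.master_down():
                node.state = "MASTER"  # "BACKUP -> MASTER (master timed out)"
                next_advert[node.name] = t
        t += step
    return {n.name: n.state for n in nodes}


print(simulate(l2_path=True))   # {'fw1': 'MASTER', 'fw2': 'BACKUP'}
print(simulate(l2_path=False))  # {'fw1': 'MASTER', 'fw2': 'MASTER'}
```

Without a shared segment, the backup never hears the master, times out after a few seconds and both nodes end up MASTER, which matches the symptom described above.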