Hi Everyone!
So, my setup is as follows:
2 OPNSense virtualized on Proxmox with 1 vNIC and 2 physical host NICs assigned.
The vNIC is trunked and has multiple vLANs crossing it, no issues there, everything's working wonderfully (CARP and the lot work fine there).
Then there's 1 NIC dedicated to the WAN connection (and this is the one that's acting a bit tricky... more in a sec) and 1 NIC dedicated to CARP between the 2 VMs.
CARP's vIP configured for the internal LAN networks (multiple vLANs) and everything's sort of alright... but the WAN connection is just acting weirdly.
Whenever I enable CARP on the backup machine, all vIPs go into BACKUP mode, but a few seconds (or minutes) later, WAN switches to MASTER, while on the main firewall it's also at MASTER!
I've checked physical cables, I've checked firewall status and logs but nothing comes up really as being blocked at any point...
I've attached images of the configs.
On the log I see this:
2024-11-24T19:26:56 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member " (192.168.17.2) (2@igb0)" has resumed the state "MASTER" for vhid 2
2024-11-24T19:26:56 Notice kernel <6>carp: 2@igb0: BACKUP -> MASTER (master timed out)
2024-11-24T19:22:35 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure crl (execute task : openvpn_refresh_crls(1))
2024-11-24T19:22:35 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure crl (execute task : core_trust_crl(1))
2024-11-24T19:22:35 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure crl (1)
2024-11-24T19:22:35 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure crl (execute task : openvpn_refresh_crls(1))
2024-11-24T19:22:35 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member " (192.168.18.2) (10@vtnet0_vlan10)" has resumed the state "BACKUP" for vhid 10
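To pull only the state changes out of a longer log, a small filter can help. This is a minimal sketch, assuming the kernel lines keep the `carp: <vhid>@<interface>: OLD -> NEW (reason)` shape seen in the excerpt above; the pattern is inferred from these lines, not from a specification, and the name `carp_transitions` is made up for illustration.

```python
import re

# Matches kernel lines such as:
#   ... kernel <6>carp: 2@igb0: BACKUP -> MASTER (master timed out)
# The shape is inferred from the log excerpt, so treat it as an assumption.
CARP_LINE = re.compile(
    r"carp: (?P<vhid>\d+)@(?P<ifname>[^:\s]+): "
    r"(?P<old>[A-Z]+) -> (?P<new>[A-Z]+)(?: \((?P<reason>[^)]*)\))?"
)

def carp_transitions(lines):
    """Yield (vhid, interface, old_state, new_state, reason) per transition."""
    for line in lines:
        m = CARP_LINE.search(line)
        if m:
            yield (int(m["vhid"]), m["ifname"], m["old"], m["new"], m["reason"])

sample = [
    "2024-11-24T19:26:56 Notice kernel <6>carp: 2@igb0: BACKUP -> MASTER (master timed out)",
    "2024-11-24T19:22:35 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure crl (1)",
]
for transition in carp_transitions(sample):
    print(transition)
```

Only the kernel line is reported; the pluginctl and syshook lines are skipped because they do not describe a state change themselves.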
An interesting aspect is that I don't even have OpenVPN configured, so I don't know wth openvpn wants with the lot but... OK...
I must admit I am lost... I don't know why this is happening and why it doesn't "see" that the other node has the WAN vIP up!
As a final point on the architecture explanation, in front of the 2 FW there's an ISP router which works absolutely fine and it has been working for years without a problem on the other *Sense firewall software.
Did you ever figure this out?
I have an even simpler setup with two physical boxes and a dedicated (cross-over) cable between the sync interfaces. No VLANs. I'm new to OPNsense, used pfSense before, and I'm struggling with this...
I can see traffic over the sync interface, but it still activates both CARP VIPs, which causes a conflict when both are set to master. And the backup unit is the one that works, if anything works. It is so strange. I saw some videos explaining the setup and even there they got weird issues.
CARP does not happen over the sync interface but on each CARP interface individually.
How exactly did you configure the CARP interface(s)?
Yes, that's true I guess. I meant more the communication behind it that activates it, so that only one VIP is active to avoid a conflict. And to be sure of that, the sync interface has to work properly.
I have the interfaces like attached in the images.. Since I have public IPs on both WAN (/29) transport network and LAN (/24) , I have censored part of the IPs that are public.
Needed another post to be allowed to upload last image I had :)
Why are you using unicast instead of multicast? Also I recommend using the fitting /29 prefix length for both fixed IP addresses but /32 for the CARP VIP.
Well, I watched a lot of guides recommend it, also documentation seemed to favour unicast-method when on a dedicated interface. No need to do discovery on each request. But who knows. I can remove it. It is a dedicated cross over cable directly from port to port, so it shouldn't matter a lot. Is it safe to have unicast on the WAN-interface?
I did a completely new install and fixed lagg issue I had on LAN.
Btw, I have WAN CARP VIP kind of working now. I can sync the rules over. And if I take one of the firewalls offline, the WAN CARP IP replies perfectly, just a single missed ping. And everything works 100% when the backup opnsense fw is under reboot. Only visible issue and that is a big one - is that when I log in to the CARP WAN IP when both servers are up, I get to the backup unit. I only get to the master when the backup-unit is down for reboot. Then I get the master on the CARP WAN IP.
When checking the VIP-status when both FW are on, the WAN shows correctly MASTER on both my CARP WAN and CARP LAN IP on the Master-unit. Like it should.
But the bad is that backup-unit shows MASTER on CARP WAN IP as well. CARP LAN IP behaves correctly.
So you can say that CARP LAN-IP works on both as it should. But the CARP WAN is active at both places at the same time, I think it creates some issues :( Before I had both of them wrong - both WAN and LAN - so progress :) I have tried many reboots and trying to activate/deactivate the carp, but it doesn't seem to change.
PS: I removed just now the peer IP to use multicast (I think). Did it all 4 places (WAN+LAN). Just leaving it empty like that I guess?
I can sync everything just as I could with a peer IP. So it is only this master/backup state on the VIP that is the issue. The pfsync interface is working just fine.
Note that if I turn on Persistent CARP maintenance mode on the backup one, it sets the demotion level to 240, but it still shows status MASTER on the WAN CARP IP. If I click Temporary disable CARP, it shows BACKUP on both CARP IPs (WAN and LAN) for two seconds before WAN becomes MASTER again (so MASTER on both OPNsense boxes).
I wonder if there is some outbound NAT things I have to do to fix this?
The peering on the HA sync interface and CARP are in no way connected. CARP works completely isolated. You can have two FreeBSD machines with identically configured (e.g.) varnish proxies and set up CARP in the publicly accessible interface manually. No sweat.
CARP state is negotiated for each interface directly on that interface.
The HA sync synchronises firewall state and configuration but not CARP.
So the sync interface aside
- configure a static IP address on both nodes on all interfaces where you want CARP
- configure a CARP VIP on that interface with /32 netmask on the master and sync that configuration to the backup
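The two points above can be sanity-checked before touching the GUI. The following is a rough sketch, not an OPNsense tool: `check_carp_vip` is a hypothetical helper and the 192.0.2.x addresses are documentation placeholders, not anyone's real configuration.

```python
import ipaddress

def check_carp_vip(node_addrs, vip):
    """Return a list of problems; an empty list means the layout looks sane."""
    problems = []
    ifaces = [ipaddress.ip_interface(a) for a in node_addrs]
    vip_if = ipaddress.ip_interface(vip)
    networks = {i.network for i in ifaces}
    if len(networks) != 1:
        problems.append("nodes are not in the same subnet")
    if vip_if.network.prefixlen != vip_if.network.max_prefixlen:
        problems.append("VIP should use a host mask (/32 for IPv4)")
    if any(vip_if.ip not in n for n in networks):
        problems.append("VIP is outside the interface subnet")
    if any(vip_if.ip == i.ip for i in ifaces):
        problems.append("VIP collides with a node address")
    return problems

# Static /29 addresses on both nodes, VIP as /32 inside the same subnet.
print(check_carp_vip(["192.0.2.2/29", "192.0.2.3/29"], "192.0.2.1/32"))
```

A clean result only says the addressing is consistent; it says nothing about whether the two nodes can actually reach each other on that interface.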
HTH,
Patrick
But how can the master/backup work for only the LAN in this case? The primary VIP (carp) on the WAN isn't supposed to be active on both devices at the same time?
Because the WAN is in some way configured wrong. Simple as that ;)
Every interface is negotiated individually with CARP.
Yeah, I assume so too :) But what.. It looks so simple.. I can at least ping between the pfsync interface on both fw, from each of them inside shell. So there isn't a sync issue there.
Could it have something to do with NAT, and where can I find logs that would help me?
I'm trying to follow the video here: https://www.youtube.com/watch?v=I5n3QXOlxmw&t=643s
Can you ping from WAN to WAN? Do both have a static IP address as the interface address?
pfsync/HA has absolutely no say in CARP.
That's a negative. From an SSH shell, I cannot ping from either the master or the backup to the other one (if that is considered WAN-to-WAN). Each can only ping its own WAN interface and other IPs in the /29.
I have a public static transport network (a /29 net) - from my ISP that I use on the wan side.
My ISP provided me with 2 fibers and instructed me to use .89/29 on master and .90/29 on the backup.
And have .88/29 as HA/CARP like I have on my CARP WAN IP.
So each unit has its own single WAN connection directly from my ISP (they provide me with .85 as my GW) and I have configured each unit's WAN interface accordingly, with .85 as the GW. I can ping the GW from both of them.
I have plugged in a laptop with IP .86 (taken out my ISP's fiber in each fw and plugged in as my laptop was my ISP) and I can ping both WAN from that to each of the fw WAN IP. So both WAN interfaces are responding correctly there at least, directly connected. But I can't ping from shell from one unit to the other. I can however ping the CARP WAN IP (.88) and their gw (.85) from both WAN (from shell).
To give even more details you don't need, they also provide me with a /24 that I use on my LAN side (also public static IP addresses, as all my servers are web servers that are meant to be public/on public IPs, so I basically get a kind of transparent firewall with my setup).
I can ping my static public LAN-IPs from outside of my WAN, so traffic going through the WAN->LAN perfectly. I can ping LAN-interface .1 .2 and .3 on both fw from shell. In both directions.
Ping from box to box on WAN might not work because by default there is no rule in place that allows that. I forgot that I have floating rules that unconditionally allow ICMP echo on all of my firewalls. No point in blocking "ping".
Your static setup on WAN looks good and if both boxes can ping the ISP gateway - great.
See my screenshot for how to configure the CARP VIP. Use a /32 netmask for that - but keep the /29 for the interface addresses on both boxes. Also use Multicast for the CARP sync as shown in my screenshot.
I suspect the automatic firewall rules for CARP do not allow unicast sync by default, but I did not look that deep for now.
HTH,
Patrick
P.S. same configuration (/32 netmask, multicast) on LAN! Just plain works.
Under automatic rules on the WAN interface, this is listed:
IPv4 CARP * * 224.0.0.18 * * * * CARP defaults
Then leave your CARP VIP configuration at the defaults. How is it supposed to work if you change the addresses without adding rules?
Also if you synchronise the VIP configuration from master to backup, the backup will end up with the identical peer IP address. Which also can only work as long as that is multicast.
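For context on the 224.0.0.18 destination in that automatic rule, a quick check with Python's standard `ipaddress` module shows which kind of address it is. This is only an illustration; the constant names are made up here.

```python
import ipaddress

# Destination from the automatic "CARP defaults" rule quoted above.
CARP_GROUP = ipaddress.ip_address("224.0.0.18")
# 224.0.0.0/24 is the link-local multicast block; routers do not forward it.
LOCAL_CONTROL = ipaddress.ip_network("224.0.0.0/24")

print("multicast:", CARP_GROUP.is_multicast)
print("link-local scope:", CARP_GROUP in LOCAL_CONTROL)
```

Because the group is link-local, advertisements sent to it only reach peers on the same layer 2 segment as the sending interface.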
I changed to multicast in beginning of this thread long time ago, so that's what I'm using today :)
I have deleted all my VIPs on both machines. Rebooted both machines. Then configured only the Carp WAN on master with a vhid of 10 instead of 1 - and longer timeout. So it shows master and green. I synced everything over for this WAN interface and now I have CARP VIP on both of them working. My primary says primary and backup says backup.
However, 20 seconds later, both of them show master. And I get the error message below when I check the log:
2024-11-29T20:49:13 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "WAN CARP (.188) (10@ix1)" has resumed the state "MASTER" for vhid 10
2024-11-29T20:49:13 Notice kernel <6>carp: 10@ix1: BACKUP -> MASTER (master timed out)
So now they are both active again.. so weird. I know the pfsync interface can communicate with each other, as ha-sync is performing as it should. I can also ping both directions.
I tried a floating WAN rule - allow all ping. But that still didn't help with pinging WAN->WAN. I also disabled the firewall totally (pfctl -d). Still can't ping WAN->WAN from the shell. Can there be a hidden setting somewhere?
This happens while the firewall is totally off via the command above, so it can't be a firewall rule. So it must have something to do with the public IP.
pfsync is not related to CARP. (I'm repeating myself - the state of your HA interface is irrelevant)
How are the WAN interfaces connected to each other and to the ISP router? Any chance there is a switch involved that might not properly support multicast?
I just can't understand how they are not related :) Doesn't pfsync (I have named the pfsync interface pfsync, with IPs 192.168.60.2 and 192.168.60.3) communicate which CARP IP should be active?
However: In the data center, I only get two fiber cables directly from an ISP room, which I plug directly into my 2 OPNsense boxes, one in each. So I only know what my ISP has told me. They have a Cisco HSRP/VRRP router (or similar) in an HA setup. So their "CARP IP" is .85 (that is also my gateway, I'm told), but the individual routers then have .86 and .87.
Until now, I have used pfSense with redundant WAN on the same unit. That has worked great. I had to open port 1985 or something like that on WAN so that their two routers can see both lines at all times.
Quote from: firewallfun on November 29, 2024, 09:22:21 PM
I just can't understand how they are not related :) Doesn't pfsync (I have named the pfsync interface pfsync, with IPs 192.168.60.2 and 192.168.60.3) communicate which CARP IP should be active?
No. Two nodes speaking CARP exchange which one takes priority via the CARP protocol on the very interface where CARP is active.
pfsync only synchronises the firewall state so in case the master crashes and the backup takes over (via CARP) the connections are not interrupted because the firewall state is missing.
pfsync and CARP are orthogonal technologies. You can have - as I wrote as an example - two proxies (not OPNsense) with CARP and no pfsync at all because there is no pf or other firewall involved.
CARP manages a virtual IP between two nodes and that is all.
Quote from: firewallfun on November 29, 2024, 09:22:21 PM
However: In the data center, I only get two fiber cables directly from an ISP room, which I plug directly into my 2 OPNsense boxes, one in each. So I only know what my ISP has told me. They have a Cisco HSRP/VRRP router (or similar) in an HA setup. So their "CARP IP" is .85 (that is also my gateway, I'm told), but the individual routers then have .86 and .87.
Until now, I have used pfSense with redundant WAN on the same unit. That has worked great. I had to open port 1985 or something like that on WAN so that their two routers can see both lines at all times.
Ask your ISP if on the other side of these two links there is a switch that allows the two OPNsense boxes to communicate with each other or if you are supposed to provide your own.
You need a flat network with
- your ISP default gateway
- both your OPNsense boxes' WAN
so CARP can work.
Again: CARP is a local protocol that manages failover of IP addresses and nothing else. Two nodes in a cluster run CARP on each interface - separately and independently of all other interfaces.
pfsync manages an entirely different part of what makes up a HA firewall cluster.
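Why a missing layer 2 path ends with MASTER on both boxes can be shown with a toy model. This is a deliberately simplified sketch and not the real protocol: the timeout values are illustrative rather than FreeBSD's exact timers, and `simulate` is a made-up name.

```python
def simulate(can_hear_each_other, advbase=1, ticks=10):
    """Toy model: two nodes, one interface, one vhid. Not the real protocol."""
    # Both start as BACKUP; the preferred node (index 0) has the shorter timeout.
    state = ["BACKUP", "BACKUP"]
    timeout = [3 * advbase, 3 * advbase + 1]  # illustrative values only
    silence = [0, 0]  # ticks since each node last heard a master advertisement
    for _ in range(ticks):
        masters = [i for i, s in enumerate(state) if s == "MASTER"]
        for i in (0, 1):
            peer = 1 - i
            # An advertisement only arrives if the peer is master AND the
            # two interfaces share a network segment.
            heard = can_hear_each_other and peer in masters
            silence[i] = 0 if heard else silence[i] + 1
            if state[i] == "BACKUP" and silence[i] >= timeout[i]:
                state[i] = "MASTER"  # "master timed out"
            elif state[i] == "MASTER" and heard and i == 1:
                state[i] = "BACKUP"  # yields to the preferred node
    return state

print("shared segment:   ", simulate(True))
print("no shared segment:", simulate(False))
```

In the second case the backup never hears an advertisement on that interface, so it times out and promotes itself, regardless of how healthy the pfsync link is.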
Ah, ok. I think I got it now. pfsync basically shouldn't have anything to do with this problem at all, it is not the problem here for sure :) Since synchronisation of data works. It is only the CARP/VIP that's the problem.
They have previously said I could use a single unmanaged switch and connect both lines. As long as both their routers can "see each other" through my switch, there is redundant internet and they will choose what line they send data over automatically. I suspect that can be the error as well, it might not be "flat" as it is configured now :)
Here is what they said before:
"These are Layer 3 router ports on our end, so this setup will not create a loop.
Our routers will broadcast HSRP packets to each other via your switch and set up a virtual IP, preferably on Connection 52, with failover to Connection 53 if Connection 52 goes down. Outgoing traffic will then use Connection 52 as long as it is active, while incoming traffic will be distributed across both, depending on where the traffic originates (shortest route)."
Yes, you need a flat network connecting both their uplinks and both your WAN interfaces.
If you want to avoid that single point of failure, you need two switches of a kind that supports "stacking" i.e. acting as if it were a single one. There are various options from different vendors.
HTH,
Patrick
I use that on the LAN-side actually, that's what I have lagg against.
But I would prefer to avoid having additional two switches just for my two lines. Stacking switches cost like 3000 usd per unit for rack-mounted with dual psu. But let's see what my ISP says over the weekend, maybe they have some way to provide me this in a more flat way so I can just have these two fw and save power/rack-space and cabling. I suspect it is just a config change at their end.
Nope, if they have two redundant boxes with CARP or VRRP or HSRP, and you want to do the same, you need a flat intermediate network. Which can be one switch or two.
Alternatively you can of course use routing, e.g. BGP. But that's an entirely different setup.
Why not use your two stacking switches you already bought for LAN and create another VLAN over four ports (two on each) to connect your ISP systems and your WAN interfaces? That's the beauty of modern datacentre/enterprise gear: you do not need a physical box for each job.
I know they already do BGP for me. They asked me if I wanted to set it up myself or if they should take care of it. Since I have a /24 I have bought and they route it for me somehow. But maybe in different context than you talk about, not sure. It's greek to me :)
I also found this in my email, when I asked them for some details (anonymized the IP using chat gpt):
"Link network: 203.0.120.184/29
We use 203.0.120.185 (HSRP), 203.0.120.186 (e01), and 203.0.120.187 (e02).
You should use 203.0.120.188 (HSRP/VRRP or equivalent), 203.0.120.189, and 203.0.120.190.
We route 198.51.101.0/24 behind 203.0.120.188 with tracking of interface line protocol, meaning the route will only be active internally and in BGP if the port to you is up. The network will therefore not be visible on the internet until you have connected the links."
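Laid out with Python's `ipaddress` module, the quoted link network looks like this. The role labels are a paraphrase of the ISP text, and which firewall takes .189 or .190 is left open here.

```python
import ipaddress

link = ipaddress.ip_network("203.0.120.184/29")
hosts = [str(h) for h in link.hosts()]  # usable addresses in the /29

# Roles paraphrased from the ISP mail quoted above.
plan = {
    "203.0.120.185": "ISP virtual gateway (HSRP)",
    "203.0.120.186": "ISP router e01",
    "203.0.120.187": "ISP router e02",
    "203.0.120.188": "customer virtual IP (HSRP/VRRP or equivalent, here CARP)",
    "203.0.120.189": "customer firewall address",
    "203.0.120.190": "customer firewall address",
}

unassigned = [h for h in hosts if h not in plan]
outside = [a for a in plan if ipaddress.ip_address(a) not in link]

for addr in hosts:
    print(addr, "->", plan.get(addr, "unassigned"))
```

All six usable addresses are spoken for and all sit in one subnet, which is another way of seeing that the two ISP routers and the two firewalls are expected to share a single segment.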
They are doing an equivalent of CARP so both your WAN interfaces and both their links must share a single network. Single switch to get it up and running, then consider redundant variants.
Ah, I see. Is it as simple as this if I choose to use the VLAN-method on my stacked LAN-switches? From Chat GPT ;D
Ports for ISP Lines:
Connect one ISP line to port 24 on Switch 1.
Connect the second ISP line to port 24 on Switch 2.
Ports for Firewall WAN Interfaces:
Connect port 25 on Switch 1 to Firewall 1's WAN interface.
Connect port 25 on Switch 2 to Firewall 2's WAN interface.
VLAN Configuration:
Create a dedicated VLAN for your WAN traffic (e.g., VLAN 10).
Assign ports 24 and 25 on both switches to VLAN 10.
Configure these ports as untagged (access mode) for VLAN 10 since ISP lines typically do not tag traffic.
Stacked Switch Behavior:
Ensure the switches are correctly stacked and function as a single logical unit, so VLAN 10 spans both switches seamlessly.
In this case, I can basically run the setup and IPs I already have.
Yes, exactly like this.
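As a paper check of that port plan (a model only, not switch configuration; the `Port` class and helper names are invented for this sketch, and VLAN 10 is just the example ID from the list above):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    switch: int
    number: int
    untagged_vlan: int
    attached: str

WAN_VLAN = 10  # example VLAN ID from the plan above

ports = [
    Port(1, 24, WAN_VLAN, "ISP line 1"),
    Port(2, 24, WAN_VLAN, "ISP line 2"),
    Port(1, 25, WAN_VLAN, "Firewall 1 WAN"),
    Port(2, 25, WAN_VLAN, "Firewall 2 WAN"),
]

def segment_members(ports, vlan):
    """Everything that ends up on the same flat segment."""
    return sorted(p.attached for p in ports if p.untagged_vlan == vlan)

def survives_switch_loss(ports, vlan, failed_switch):
    """True if one ISP line and one firewall remain after losing a switch."""
    left = [p.attached for p in ports
            if p.untagged_vlan == vlan and p.switch != failed_switch]
    return (any(a.startswith("ISP") for a in left)
            and any(a.startswith("Firewall") for a in left))

print(segment_members(ports, WAN_VLAN))
print(survives_switch_loss(ports, WAN_VLAN, 1), survives_switch_loss(ports, WAN_VLAN, 2))
```

With one ISP line and one firewall on each switch, either switch can fail and the remaining pair still shares the WAN VLAN.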
Ignore the sync-thing here below, you have explained to me that it doesn't involve the VIP, so it hasn't anything to do with the WAN-network as such. But here is what ISP said:
"CARP works such that you have two IP interfaces that continuously communicate with each other to check if the other side is present, so there must be a physical Layer 2 Ethernet network between the two boxes. I don't know what the sync link is and what it's used for, but if it's a regular Ethernet connection between the boxes and the boxes communicate Ethernet over it, it should work with that cable. "
They recommend two unmanaged switches to create that flat network as you say - since they have routers on their side that sense which line is active etc. But they are not familiar with OPNsense and how it works. Isn't there any way for both boxes to see which one is active? Like if I create a direct connection on another port and bridge the two fw and the ports.. maybe not possible, just wanted a last try to rescue me from the switch solution :) I think I will just go for two new switches on WAN.
If I were a programmer, I would think it easy to constantly ping my CARP IP. If it is not active (no ping reply), make the backup fw primary and activate the CARP IP, until it hears from the master via the sync interface or a separate line, then deactivate it. Why is it so hard to do :) Why not communicate signals like this on a separate cable, or just use the sync interface, as it already carries constant internal traffic? I don't get it. I wouldn't mind if it took a minute instead of seconds even, in case it needed to be sure it is really down.
CARP works on the link where the virtual address is present. Period. The protocol is designed that way. So are the two alternative protocols that are more common with closed source vendors: VRRP and HSRP.
Google the protocol definitions if you do not believe me.
Now what is the reasoning? Simple, why should there be a dedicated link simply to keep a virtual IP address active on one of two or more nodes?
Commonly the nodes are not firewalls or routers or switches but simply servers - like an HA storage cluster or as I mentioned as an example a load balancer or a Varnish cache.
You need a flat switched network on WAN if you want HA. There is no way around that. You need a flat switched network on all interfaces where you want to run CARP.
Quote from: Patrick M. Hausen on November 30, 2024, 08:35:38 PM
Now what is the reasoning? Simple, why should there be a dedicated link simply to keep a virtual IP address active on one of two or more nodes?
Well, to save the planet and save power, have less network gear :) I bet there are more than me that have redundant lines from their ISP and having IP-ranges assigned to the VIP. But I guess that is mostly in the enterprise world and they don't care about these things, they just buy whatever needed and happy with that.
I do understand that with a standard you can't necessarily go off and do other things. I'm saying this in case there are other features in OPNsense that could provide this (that I don't know of) :) On a logical level, if we look away from the limitations given by the standard, I do not understand why an Active/Backup fw feature isn't possible. For instance, I have a VPN firewall that pings a given GW IP. If that GW stops responding, it will activate a different WAN network. It shouldn't be hard to write a script that does something similar, one that for instance deactivates WAN totally if the returned (if any) MAC address or host name responds to an ARP, ping or other type of request in a certain expected or unexpected way. The script could continue to ping, from LAN (or other internal function), the WAN CARP IP or the upstream GW, and activate WAN again if there is no ping/response.
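Just to make the idea concrete, the decision part of such a watchdog could look roughly like this. It is purely hypothetical: nothing like it is confirmed to exist in OPNsense, `next_role` is an invented name, and the two inputs stand in for whatever probes a real script would run.

```python
def next_role(current_role, vip_answers, peer_seen_on_side_link):
    """Decide the next role for the backup node in the hypothetical watchdog."""
    if peer_seen_on_side_link:
        return "BACKUP"   # the primary reports in over the side link: stand down
    if not vip_answers:
        return "MASTER"   # nobody answers on the VIP and no word from the peer
    return current_role   # otherwise keep doing what we were doing

print(next_role("BACKUP", vip_answers=False, peer_seen_on_side_link=False))
print(next_role("MASTER", vip_answers=True, peer_seen_on_side_link=True))
```

One catch shows up in the last branch: once this node holds the VIP, a ping to the VIP is answered locally, which matches the earlier observation that both boxes could ping the CARP WAN IP from their own shell.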
I got reply on the other thing you asked me earlier also, basically confirming what you have said:
"Ask your ISP if on the other side of these two links there is a switch that allows the
two OPNsense boxes to communicate with each other or if you are supposed to provide your
own."
Their answer:
"This is not a viable option. While it can technically be done, it would mean that we can no longer guarantee availability. For example, one of the lines between us and you could go down without our HSRP setup detecting it, resulting in us transporting your traffic between our routers because you lack Layer 2 on your side. In the current setup, we configure the HSRP endpoints on our side to automatically withdraw internal routes in our network if the port goes down."
You do have one pair of stackable switches already, right? What's keeping you from using them with the recipe you got from Chat GPT above? It will work. No 4 ports to spare?
If I was in your situation I would do this:
- take my stackable switches and create two multi-chassis LACP trunks
- connect both OPNsense with LACP to both switches across those trunks
- run ALL interfaces on OPNsense as VLANs across those MLAG/LACP trunks including WAN
- create one access port on each switch with the "WAN" VLAN untagged and plug your ISP lines into these
Done. My own data centre OPNsense HA pair works exactly like that.
                    ┌───────────────────────────────────────────┐
                    │                                           │
                 ╔══│                 Switch 1                  │───────▶
                 ║  │                                           │
                 ║  └─────────┬───────────────────────┬─────────┘  to ISP
                 ║            │                       │
                 ║            │                       │
                 ║            │                       │
                 ║            │                       │
                 ║  ┌───────────────────┐   ┌───────────────────┐
                 ║  │                   │   │                   │
  whatever they  ║  │    OPNsense 1     │◀─▶│    OPNsense 2     │
use for stacking ║  │                   │   │                   │
                 ║  └───────────────────┘   └───────────────────┘
                 ║            │        pfsync         │
                 ║            │                       │
                 ║            │                       │
                 ║            │                       │
                 ║  ┌─────────┴───────────────────────┴─────────┐
                 ║  │                                           │
                 ╚══│                 Switch 2                  │───────▶
                    │                                           │
                    └───────────────────────────────────────────┘  to ISP
HTH,
Patrick
This is partly how I have it on the LAN side of OPNsense. I have LACP laggs from both OPNsense boxes to the stacked switches, in total 4 SFP+ cables between the switch pair and the OPNsense pair. If I take out one switch or one OPNsense, it will not affect the network on the LAN side. I think my switch doesn't support MLAG (multi-chassis), but I guess it doesn't matter in this case as it uses stacking; only 2 units are stackable, LACP only.
https://www.fs.com/de-en/products/108710.html
I assume you would then re-use the current cables in your suggestion, so both LAN and WAN (and everything else) goes over same cables, only with VLANS separating it? That sounds effective.
If it's "stackable" and if they support an LACP bundle over two ports, one from each switch, it doesn't matter if they call it MLAG or whatever. From a redundancy point of view that is all the same. If LACP to two switches is supported, you are fine.
And yes, in my setup each OPNsense has got one cable to each of the switches - LAGG - and everything else is VLANs.
Only thing that is not: the pfsync/HA interface - dedicated interface on both firewalls and then just a cable between them.
If you want to be extra safe from cable or single interface failure, you can of course make that pfsync/HA link a LAGG of two ports, too.
At the end it all boils down to:
Can I take a sledge hammer and destroy any single box without the customers noticing? That's the design goal.
I had a fiber switch/EdgeSwitch lying around here and everything worked from the second I put it on the WAN side, so you were right. Now HA works perfectly.
Also have another one I don't use, so I can split it up later, but now I can have fun! VLANs sounds a bit complicated, so was easier to get up and running this way.
Thank you for all help and suggestion with this, it is appreciated! :)
Reading from all the replies here I found out that the issue seems to have been the ICMP ping being blocked on the WAN interface.
Once I was able to ping each of the nodes through the WAN interface it seems that CARP IP on WAN became stable.
Thanks for the help! Hope this helps someone else having the same issue.
Nevermind... Spoke too soon... second node still gets MASTER on the WAN.
I have no idea what could be going wrong really.
Do you have a flat network, i.e. a switch connecting your two WAN interfaces and do you use multicast for CARP? ICMP echo and CARP are completely unrelated.
WAN interfaces have direct connection to the ISP router (which has 4 ports and I'm currently using 2 - one per each OPNSense).
I am using multicast, yes.
I still get the BACKUP -> MASTER (master timed out) after a while on that WAN CARP IP...