CARP WAN VIP not reachable

Started by liceo, December 08, 2023, 04:57:21 PM

Hi all

I set up another new HA cluster running on two Hyper-V boxes. I did the HA setup the same way as my other installations, but this time I cannot reach the CARP VIP from the WAN side. It's a pretty standard setup, as follows:


  • Two OPNsense instances with LAN and WAN interfaces
  • MAC spoofing is enabled
  • Added a CARP VIP on both interfaces
  • Set up sync between the HA pair
  • Failover works (tested from the LAN)
  • All interfaces, including the VIPs, respond to ping from the LAN

What does NOT work now:



  • I can reach the real WAN IPs from the WAN transfer network, but NOT the VIP
  • I cannot use the WAN VIP in the outbound NAT rule: as soon as I do, the Internet is no longer reachable

I recreated all the VIPs, recreated the outbound NAT rule, rebooted several times, checked the firewall logs, and checked tcpdump (not a single packet to the WAN VIP).

Any ideas??

Many thanks!

I have exactly the same problem. I have to assign the required IP address directly to the physical interface, then OPNsense works. Of course, that way I no longer have a backup.

I can't see any traffic on the VIPs anywhere. How can you narrow down this error?
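
One way to start narrowing it down, as a rough sketch: check which CARP state each node reports per VHID. The helper below only parses FreeBSD-style `ifconfig` output; the function name `carp_summary` is made up for this example and is not part of OPNsense.

```shell
#!/bin/sh
# carp_summary: hypothetical helper, reads `ifconfig` output on stdin and
# prints one line per CARP entry in the form "<interface> vhid <n> <STATE>".
carp_summary() {
    awk '
        # An interface header starts in column 0, e.g. "vmx1: flags=..."
        /^[a-zA-Z]/ { iface = $1; sub(/:$/, "", iface) }
        # A CARP line looks like "carp: MASTER vhid 1 advbase 1 advskew 0"
        $1 == "carp:" { print iface, "vhid", $4, $2 }
    '
}

# Typical use on each firewall node:
#   ifconfig | carp_summary
```

If both nodes report MASTER for the WAN VHID, the CARP advertisements are probably not getting from one node to the other. CARP uses IP protocol 112, so something like `tcpdump -ni vmx1 ip proto 112` on each node (with `vmx1` standing in for your WAN interface) shows whether the advertisements arrive at all.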

December 09, 2023, 08:27:37 PM #2 Last Edit: December 09, 2023, 09:35:17 PM by liceo
@danbet Do you also run OPNsense on Hyper-V?

I was able to solve it! I had to recreate the virtual switch on Hyper-V servers without SR-IOV enabled.

Quote from: liceo on December 10, 2023, 09:28:30 AM
I was able to solve it! I had to recreate the virtual switch on Hyper-V servers without SR-IOV enabled.

No, with VMware ESXi.

Ah, ok. But maybe you could also try disabling SR-IOV.

I have no such setting. The only SR-IOV option I can choose is SR-IOV passthrough as the network interface type, but I chose E1000.

I'm seeing something similar on the 'inside' VIP, but only for a Sonoff door sensor. If I configure the Sonoff unit to use the physical IP of one of the OPNsense nodes, the sensor starts working. I'm running OPNsense on Proxmox. What's really weird is that only the Sonoff units are affected. I'll keep digging.

I found the solution for VMware ESXi: I had to enable promiscuous mode for all the interfaces. To do this, I created port groups that are used only by the OPNsense VMs.

November 09, 2025, 07:05:56 PM #9 Last Edit: November 09, 2025, 07:30:53 PM by chadtn
Quote from: liceo on December 10, 2023, 09:28:30 AM
I was able to solve it! I had to recreate the virtual switch on Hyper-V servers without SR-IOV enabled.

I just spent three days trying to figure out why my WAN VIP was working on one Hyper-V host but not the other. Turns out I had SR-IOV enabled on one host's vSwitch and not on the other one. As soon as I deleted the vSwitch and re-created it with SR-IOV turned off, everything started working. I'm running Hyper-V on Server 2022, in case anyone runs into the same thing.

Thanks for sharing this!!

Chad

I have a situation where our hosting provider filters out the CARP protocol including the IPs with VHID.
But with a normal IP alias it works.
Had to write my own script that tracks the CARP IP on the LAN side and adds or removes IP aliases on the WAN interfaces, as soon as the CARP status changes on LAN.
So far it works reliably, but it's not 100% optimal...

Hi,
We recently ran into exactly the same behavior described in this thread, and after a fair amount of digging I wanted to share what we found and how we worked around it.


Symptoms (same as described above)

  • CARP works reliably on LAN / internal networks
  • CARP on the WAN interface behaves inconsistently
  • ARP resolution looks correct
  • Sometimes the first ICMP packet works
  • Subsequent traffic is dropped or blackholed

On the switches, we observed:
  • ARP table is correct (VIP → CARP virtual MAC)
  • MAC address table never learns the CARP virtual MAC
  • As a result, unicast traffic to the VIP is not forwarded reliably

Why this happens (key point)

This is not really a CARP bug, but an interaction between floating L2 identities and virtualized switching.

In virtual environments (ESXi + distributed switches in our case):
  • CARP answers ARP requests with the correct virtual MAC (00:00:5e:00:01:XX)
  • However, frames sourced from that MAC are not always learned by the physical ToR switches
  • This happens even with Forged Transmits, MAC Address Changes, and Promiscuous Mode enabled

On LAN networks, this often works because:
  • Traffic stays inside the hypervisor or distributed switch
  • The physical switch is never involved
  • The CARP MAC does not need to be learned upstream

On the WAN, traffic must traverse physical uplinks:
  • The ToR switch must learn the source MAC
  • The CARP virtual MAC is never learned
  • Result: ARP resolves, first packet may pass, steady-state traffic fails

This explains why the issue appears WAN-only and why it is so inconsistent.
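
To check whether a switch or capture has seen the virtual MAC, it helps to know the exact address to look for. CARP uses the 00:00:5e:00:01:XX range, with the last octet being the VHID in hex. A minimal sketch (the function name `carp_mac` is made up for this example):

```shell
#!/bin/sh
# carp_mac: hypothetical helper, prints the CARP virtual MAC for a VHID (1-255).
# The last octet of the MAC is the VHID written as two hex digits.
carp_mac() {
    printf '00:00:5e:00:01:%02x\n' "$1"
}

# Example: watch for frames sourced from the virtual MAC of VHID 1.
# "vmx1" is an assumed WAN interface name; -e prints the Ethernet header.
#   tcpdump -eni vmx1 ether src "$(carp_mac 1)"
```

The same address is what to search for in the ToR switch's MAC address table, keeping in mind that some switch vendors print MACs in a different notation.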


Workaround / design pattern that worked reliably for us

We solved this by separating HA control-plane from data-plane identity:
  • Keep CARP for state and master election only
  • Do not use the CARP VIP for production traffic
  • Create a plain IP Alias (no VHID) for the production IP
  • Move that alias between nodes based on CARP MASTER state


This way:
  • The production IP always uses the physical interface MAC
  • The switch can learn the MAC normally
  • CARP still provides HA logic and state sync
  • WAN traffic becomes stable and predictable


We implemented this using Monit and a small script that:
  • Adds the alias on the CARP MASTER
  • Removes it on the BACKUP node


Example logic (simplified):
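# Values from our setup: VHID 1, WAN interface vmx1; <PROD_IP> is a placeholder.
# The trailing space after "vhid 1" keeps the pattern from also matching vhid 10, 11, ...
if ifconfig | grep -q "carp: MASTER vhid 1 "; then
    # This node is MASTER: bring the production IP up here
    ifconfig vmx1 inet <PROD_IP>/24 alias
else
    # Not MASTER: make sure the production IP is not configured here
    ifconfig vmx1 inet <PROD_IP>/24 -alias
fi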



Monit runs this every few seconds, so failover and failback are fast.
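
Since the check runs every few seconds, a slightly more defensive variant only touches the interface when something actually changed. This is a sketch of that idea, not the exact script we run: the function names are made up, and `VHID`, `WAN_IF`, `PROD_IP` (a documentation address) and `PROD_PREFIX` are placeholder values you would need to adjust.

```shell
#!/bin/sh
# Hypothetical values; adjust for your environment.
VHID=1                 # CARP VHID that drives the decision
WAN_IF=vmx1            # interface that carries the production IP
PROD_IP=192.0.2.10     # documentation address, replace with the real one
PROD_PREFIX=24

# Succeeds if the ifconfig output on stdin shows VHID $1 as MASTER.
# The trailing space keeps "vhid 1" from also matching "vhid 10".
is_master() {
    grep -Fq "carp: MASTER vhid $1 "
}

# Succeeds if the ifconfig output on stdin already lists address $1.
has_ip() {
    grep -Fq "inet $1 "
}

# Decide what to do from two yes/no facts: are we master, is the alias present?
plan_action() {
    case "$1:$2" in
        yes:no) echo add ;;
        no:yes) echo remove ;;
        *)      echo none ;;
    esac
}

# Glue for Monit: only changes the interface when the state requires it.
sync_alias() {
    master=no; present=no
    ifconfig | is_master "$VHID" && master=yes
    ifconfig "$WAN_IF" | has_ip "$PROD_IP" && present=yes
    case "$(plan_action "$master" "$present")" in
        add)    ifconfig "$WAN_IF" inet "$PROD_IP/$PROD_PREFIX" alias ;;
        remove) ifconfig "$WAN_IF" inet "$PROD_IP" -alias ;;
    esac
}
```

A Monit check would then call `sync_alias` at the end of the script. Keeping the decision in `plan_action` separate from the `ifconfig` calls makes it possible to test the logic on a machine that has no CARP interfaces.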


Conclusion

This seems to be a general limitation when using floating virtual MACs on WAN interfaces in virtualized environments, especially when traffic must traverse physical switching.
The workaround above has been stable for us and avoids relying on a MAC address that the physical fabric never learns.

Posting this in case it helps others who run into the same issue — happy to clarify or compare notes.