OPNsense Forum

Archive => 18.7 Legacy Series => Topic started by: bb-mitch on October 30, 2018, 11:21:06 pm

Title: CARP issue (?) or my error - seems to exist since before 18.1 to present
Post by: bb-mitch on October 30, 2018, 11:21:06 pm
We have a pair of OPNsense firewalls configured for high availability (CARP) and recently noticed some odd behavior.

One of our Virtual IPs was intermittently not responding to pings: about 10 seconds "on" / 20 seconds "off", pretty regular. The issue only affected a single IP. We could not see any CARP traffic on the public network, but we found a way to "fix" the problem: by changing the VHID to a different number, the problem went away.

The virtual IP in question does have a password (long and complex random string).

The advertising base / skew is 1 / 0.

I would have expected any competing advertisements for this VHID not to be accepted by our router because of the mismatched password.

And yet something seems to be "stealing" our address. I don't see the CARP mode changing to backup on the primary, but perhaps I'm having trouble catching it?

I did run packet captures on the WAN, and although I couldn't see any traffic showing that's what was happening, I think the symptoms point to it as the cause?

By changing the VHID of that one Virtual IP, I can work around the issue. If I change the VHID back, the issue returns. I'd like to resolve the issue permanently; I'm on the latest release firmware.

Can anyone recommend any next steps?

Thanks in advance :-)
Title: Re: CARP issue (?) or my error - seems to exist since before 18.1 to present
Post by: mimugmail on October 31, 2018, 06:22:11 am
We need details: the VHID numbers for all VIPs, and the IPs of all VIPs. Do you also use IP aliases for VIPs? Is preempt enabled anywhere? Do you see packets via tcpdump from both nodes? What happens when you set the password to "test"?
Title: Re: CARP issue (?) or my error - seems to exist since before 18.1 to present
Post by: bb-mitch on November 01, 2018, 04:26:03 pm
Hi - we currently have VHID 1 on the LAN, and VHID 2 through 7 on the WAN.

I've been doing some thinking and reading.

They are type CARP. Each IP is a single address inside a /27 network.

The setup has been working fine for about 2 years.

Normally, with VHID 2, we see VERY MINOR packet loss on this IP (the native WAN IP shows none): pinging it once per second, we regularly lose 1 packet every 10 minutes. But since the issue appeared, we are seeing close to 40% packet loss.
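For scale, a quick sketch of the loss arithmetic using only the figures quoted in this thread (the helper name `loss_percent` is just for illustration): the baseline of 1 lost ping per 10 minutes at one ping per second is well under 1%, while a strictly regular 10 s on / 20 s off cycle, as described in the first post, would be about two thirds of probes lost.

```python
def loss_percent(lost: int, sent: int) -> float:
    """Packet loss as a percentage of probes sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * lost / sent

# Baseline: one ping per second, about 1 loss per 10 minutes (600 probes).
baseline = loss_percent(lost=1, sent=10 * 60)

# First post: roughly 10 s answering, then 20 s silent, repeating.
on_off = loss_percent(lost=20, sent=10 + 20)

print(f"baseline: {baseline:.2f}%")           # baseline: 0.17%
print(f"10 s on / 20 s off: {on_off:.1f}%")   # 10 s on / 20 s off: 66.7%
```

The ~40% actually measured sits between those two numbers, so the outages presumably were not perfectly periodic over the whole measurement window.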

We found the broadcast domains at the colo do not seem to be properly separated (we can see other traffic in packet captures on the WAN we should not see) but we do not see any other VRRP or CARP traffic which would directly explain the issue.
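When going back over the WAN captures, one thing worth filtering for is any advertisement carrying VHID/VRID 2 from a device other than our own pair. CARP and VRRP both use IP protocol 112 and both put the group ID in the second byte of their header, so a capture filter of `ip proto 112` catches both. A minimal sketch of the check, assuming you already have raw IPv4 packets extracted from the capture by whatever pcap tooling you use (the function name `advert_vhid` is hypothetical):

```python
from typing import Optional

CARP_VRRP_PROTO = 112  # IP protocol number shared by VRRP and CARP

def advert_vhid(ipv4_packet: bytes) -> Optional[int]:
    """Return the VHID/VRID carried by a CARP or VRRP advertisement.

    Expects a raw IPv4 packet (Ethernet header already stripped);
    returns None for anything that is not IP protocol 112.
    """
    if len(ipv4_packet) < 20 or ipv4_packet[0] >> 4 != 4:
        return None  # too short or not IPv4
    ihl = (ipv4_packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or ipv4_packet[9] != CARP_VRRP_PROTO:
        return None
    if len(ipv4_packet) < ihl + 2:
        return None  # truncated advertisement
    # Byte 0 is version/type; byte 1 is the VHID (CARP) or VRID (VRRP).
    return ipv4_packet[ihl + 1]

# Synthetic example: 20-byte IPv4 header with protocol 112, then an advert for group 2.
header = bytearray(20)
header[0] = 0x45  # version 4, IHL 5 (20 bytes)
header[9] = CARP_VRRP_PROTO
packet = bytes(header) + bytes([0x21, 2, 0, 1])
print(advert_vhid(packet))  # 2
```

Note that this only finds a conflicting device if it actually advertises; a device that merely answers ARP with the same virtual MAC would not show up here.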

If I understand CARP properly, the CARP IP is associated with a kind of virtual MAC. So perhaps this behavior is simply someone else using that same MAC through their own CARP or VRRP configuration, which causes the router or switch to relearn that MAC periodically and results in the lost ping responses?

Although that doesn't explain the 1 packet every 10 minutes, I think it does explain the bursts of loss when I use VHID 2. In the images attached, I change from VHID 8 to VHID 2, capture the loss, and then change back to 8.

If that's what I'm seeing, then all I need to do is confirm what the associated MAC address would be for VHID 2 and, presuming that mapping is universal, push the issue back upstream to the colo.
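Assuming the usual CARP behaviour (FreeBSD's carp(4) reuses the VRRP IPv4 virtual MAC prefix 00:00:5e:00:01 from RFC 5798, with the VHID as the last octet), the mapping is simple enough to compute. It also suggests why the CARP password would not help: any VRRP router using VRID 2 on the same segment would claim the same MAC regardless of CARP authentication. A small sketch (the function name is hypothetical):

```python
def carp_virtual_mac(vhid: int) -> str:
    """IPv4 virtual MAC for a CARP VHID (same scheme as a VRRP VRID)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be in the range 1..255")
    # Fixed prefix 00:00:5e:00:01, last octet = VHID in hex.
    return f"00:00:5e:00:01:{vhid:02x}"

for vhid in (2, 8):
    print(vhid, carp_virtual_mac(vhid))
# 2 00:00:5e:00:01:02
# 8 00:00:5e:00:01:08
```

Looking up 00:00:5e:00:01:02 in the upstream switch's MAC table (and seeing it flap between ports) would be one way to confirm the theory.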

Does that make sense?

Thanks!

Mitch
Title: Re: CARP issue (?) or my error - seems to exist since before 18.1 to present
Post by: mimugmail on November 01, 2018, 07:17:02 pm
You have 6 VHIDs on one interface? Does that make sense?
Title: Re: CARP issue (?) or my error - seems to exist since before 18.1 to present
Post by: bb-mitch on November 01, 2018, 07:58:20 pm
Totally willing to accept other opinions and advice on network architecture, but the number of VHIDs isn't the issue (or doesn't seem to be): if I change the VHID, the problem is resolved.

My issue does seem to be an apparent duplicate of the pseudo-MAC associated with VHID 2 on a network which should be partitioned to prevent such things (but isn't yet).

For what it's worth / continuing my education, I'm interested to know your advice though...

If you had a /27 on the WAN, and wanted to NAT say 6 to 12 of those addresses through an OPNsense setup with CARP what would you do?

If you wanted to NAT them all, what would you do then?

Thanks :-)
Title: Re: CARP issue (?) or my error - seems to exist since before 18.1 to present
Post by: mimugmail on November 01, 2018, 09:01:53 pm
Create IP aliases on top of the main VHID :) As these won't be replicated, you have to add the aliases on the backup as well.
Title: Re: CARP issue (?) or my error - seems to exist since before 18.1 to present
Post by: bb-mitch on November 02, 2018, 05:13:21 pm
There are some differences though right?

For example, an alias won't respond to ICMP (ping, traceroute, etc.).

We wanted the ability to selectively allow ICMP monitoring of the various IPs (filtered by the firewall) instead of just monitoring that the firewall itself was "up", and I think with aliases we lose that functionality. Because of that loss, we wondered whether it might affect how the upstream router views the IP (for example, how quickly it detects a problem and returns an ICMP unreachable to the remote end), etc.

I wasn't sure what else we might be losing in terms of flexibility, for example if we wanted to move an IP from one router pair to another. Without its own virtual MAC, that process might be more complicated?

Is there a reason using multiple VHIDs is "bad"? We've been doing it for years without issue. I wasn't under the impression it used any significant resources, although of course there are only a limited number of VHIDs within a broadcast domain.

Thanks again :-)
Title: Re: CARP issue (?) or my error - seems to exist since before 18.1 to present
Post by: mimugmail on November 02, 2018, 08:06:51 pm
CARP monitors a link, NIC, or system, so one VHID per NIC is enough. ICMP responses come from the alias, traceroute from the main IP.
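To make the trade-off in this exchange concrete, a sketch counting how many distinct virtual MACs (one per VHID, each a candidate for colliding with someone else's CARP/VRRP group on a shared segment) each layout puts on the wire. Assumptions: the 00:00:5e:00:01:<vhid> mapping, that IP aliases bound to a VHID answer with that VHID's MAC and fail over with it, and addresses from the 203.0.113.0/24 documentation range standing in for the real /27. The `Vip` class and `virtual_macs` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Vip:
    address: str          # hypothetical example address
    kind: str             # illustrative label: "carp" or "ipalias"
    vhid: Optional[int]   # CARP VHID, or the VHID an IP alias is bound to

def virtual_macs(vips: List[Vip]) -> Set[str]:
    """Distinct CARP virtual MACs a layout exposes (one per VHID in use)."""
    return {f"00:00:5e:00:01:{v.vhid:02x}" for v in vips if v.vhid is not None}

# Layout described in the thread: one CARP VIP per address, VHIDs 2..7.
per_vip = [Vip(f"203.0.113.{10 + i}", "carp", 2 + i) for i in range(6)]

# Suggested alternative: one CARP VIP, remaining addresses as IP aliases on VHID 2.
aliased = [Vip("203.0.113.10", "carp", 2)] + [
    Vip(f"203.0.113.{11 + i}", "ipalias", 2) for i in range(5)
]

print(len(virtual_macs(per_vip)), len(virtual_macs(aliased)))  # 6 1
```

Fewer VHIDs means fewer virtual MACs that can collide with a neighbour's group, at the cost of all aliased addresses moving together on failover, which is the flexibility question raised earlier in the thread.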
Title: Re: CARP issue (?) or my error - seems to exist since before 18.1 to present
Post by: bb-mitch on November 02, 2018, 11:45:02 pm
OK, well thanks - will keep that in mind for the future, but as long as there's nothing wrong with what we have I won't rush to change. Have a good weekend - thanks,
Mitch