CARP: backup node responds instead of alive master node on adjacent subnet

Started by vnxme, May 27, 2021, 10:50:52 AM

Hi everyone,

I'm not sure whether this is a bug in OPNsense/FreeBSD, my own mistake, or the intended behavior, which is why I decided to start a topic here instead of opening a ticket on GitHub.

For discussion purposes, my current configuration can be simplified to the following:
.

To test it, I use traceroute and/or HTTPS requests, with the following results (both boxes are alive):

  • 192.168.17.254 -> 192.168.17.1: OK, Box 1 responds
  • 192.168.17.254 -> 192.168.17.2: OK, Box 2 responds
  • 192.168.17.254 -> 192.168.17.11: OK, Box 2 responds, as it's the master for VHID 11 on VLAN 17
  • 192.168.17.254 -> 192.168.16.1: OK, Box 1 responds via 192.168.17.1
  • 192.168.17.254 -> 192.168.16.2: OK, Box 2 responds via 192.168.17.1
  • 192.168.17.254 -> 192.168.16.11: WRONG, Box 1 responds, but it's the backup for VHID 11 on VLAN 16

If I remove the CARP address from vtnet0_vlan16 on Box 1 (or just change its IP and VHID to anything other than 11), the last test scenario passes: Box 2 responds via 192.168.17.1.

So, is there anything I could have missed in the configuration? Something makes Box 1 (the backup node) respond to a packet destined for a CARP address instead of Box 2 (the master node) when the Client reaches that subnet via Box 1.

Environment: OPNsense 21.1.5-amd64.

Thanks in advance.

Regards,
Vladimir

It gets even stranger with the following test scenarios:

  • 192.168.16.1 -> 192.168.16.11: WRONG, Box 1 responds (packet doesn't even leave this machine)
  • 192.168.17.1 -> 192.168.17.11: WRONG, same as above

My understanding is that the backup node should not respond on its CARP address unless it becomes the master. Unfortunately, these examples contradict that. Could anyone confirm whether this CARP behavior is intended or not?
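
Before drawing conclusions, it may help to double-check which role each box reports for each CARP address. Below is a rough Python sketch that pulls the CARP state out of ifconfig output. It assumes the FreeBSD-style line "carp: MASTER vhid 11 advbase 1 advskew 0". The SAMPLE text is made up for illustration and was not captured from the boxes in this thread, and the names carp_states, IFACE_LINE and CARP_LINE are hypothetical. Results are keyed by interface and VHID because VHID 11 is used on both VLAN 16 and VLAN 17 here.

```python
import re

# Assumed format of an interface header line, e.g. "vtnet0_vlan16: flags=...".
IFACE_LINE = re.compile(r"^(\S+): flags=")
# Assumed format of a CARP status line, e.g. "carp: BACKUP vhid 11 advbase 1 ...".
CARP_LINE = re.compile(r"carp:\s+(MASTER|BACKUP|INIT)\s+vhid\s+(\d+)")


def carp_states(ifconfig_output):
    """Return {(interface, vhid): state} for every CARP line found."""
    states = {}
    interface = None
    for line in ifconfig_output.splitlines():
        header = IFACE_LINE.match(line)
        if header:
            # Remember which interface the following indented lines belong to.
            interface = header.group(1)
            continue
        carp = CARP_LINE.search(line)
        if carp and interface is not None:
            states[(interface, int(carp.group(2)))] = carp.group(1)
    return states


# Made-up sample for illustration only (not real output from this setup).
SAMPLE = "\n".join([
    "vtnet0_vlan16: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500",
    "\tinet 192.168.16.1 netmask 0xffffff00 broadcast 192.168.16.255",
    "\tinet 192.168.16.11 netmask 0xffffff00 broadcast 192.168.16.255 vhid 11",
    "\tcarp: BACKUP vhid 11 advbase 1 advskew 100",
])

if __name__ == "__main__":
    for (iface, vhid), state in carp_states(SAMPLE).items():
        print(f"{iface} vhid {vhid}: {state}")
```

On a real box, the output of ifconfig would be fed in instead of SAMPLE, for example via subprocess.run(["ifconfig"], capture_output=True, text=True).stdout.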

That does seem unexpected. How did you verify which box is responding?

Quote from: clarknova on June 17, 2021, 11:43:18 PM
That does seem unexpected. How did you verify which box is responding?

I used an HTTPS backend (nginx) serving a static page with the box number.
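
For anyone who wants to repeat this check, here is a minimal Python sketch of the same idea: request the page over HTTPS and read the box number from the body. It assumes the static page contains text like "Box 1" or "Box 2" and that the certificates are self-signed (so verification is switched off). Both are assumptions for illustration, and the function names parse_box_number and probe are hypothetical.

```python
import re
import ssl
import urllib.request

# Assumption: the static page contains text such as "Box 1" or "Box 2".
BOX_PATTERN = re.compile(r"Box\s*(\d+)", re.IGNORECASE)


def parse_box_number(body):
    """Return the box number found in the page body, or None if absent."""
    match = BOX_PATTERN.search(body)
    return int(match.group(1)) if match else None


def probe(address, timeout=3.0):
    """Fetch https://<address>/ and return the box number that answered."""
    # Assumption: self-signed certificates, so verification is disabled.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = f"https://{address}/"
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        body = response.read().decode("utf-8", errors="replace")
    return parse_box_number(body)


if __name__ == "__main__":
    # Targets on VLAN 16 taken from the test scenarios in the first post.
    for target in ("192.168.16.1", "192.168.16.2", "192.168.16.11"):
        try:
            print(target, "-> Box", probe(target))
        except OSError as error:
            print(target, "-> no answer:", error)
```

Run from the Client (192.168.17.254), this prints which box served each address, which is easier to compare across runs than reading the page in a browser.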