High availability / CARP status not synced in HA Openstack setup
« on: October 15, 2024, 12:29:08 pm »
Hi everyone,
I'm currently running a dual OPNsense CARP setup on OpenStack, following this guide: https://www.thomas-krenn.com/en/wiki/OPNsense_HA_Cluster_configuration.
My problem is that the CARP status is not correctly synced between the two firewalls, and I'm hoping for some sound advice or a general direction for troubleshooting.
Setup layout:
The setup is very simple (this is a test environment in my lab):
DMZ network with both OPNsense instances attached:
- FW1: 10.20.0.20
- FW2: 10.20.0.21
LAN network with CARP VIP:
- FW1: 10.30.0.20 / CARP VIP: 10.30.0.253
- FW2: 10.30.0.21 / CARP VIP: 10.30.0.253
WAN network:
- FW1: 10.10.0.20 / CARP VIP: 10.10.0.253
- FW2: 10.10.0.21 / CARP VIP: 10.10.0.253
Clients:
VM1 - 10.30.0.11
VM2 - 10.30.0.12
VM3 - 10.30.0.13
- The sync is happening over the DMZ network. When I manually sync, firewall rules and the like are successfully synced to FW2.
- OpenStack Port Security is completely disabled on every port for all compute instances.
- I set the "peer IP" in each firewall (10.20.0.20 / 10.20.0.21).
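To make the addressing easier to check at a glance, the layout above can be written down as data and sanity-checked. This is only a minimal sketch: the /24 prefix length is an assumption (the prefix is not stated in this post), and `LAYOUT` and `check_layout` are hypothetical names used for illustration.

```python
import ipaddress

# Addresses are taken from the layout above.
# ASSUMPTION: every network uses a /24 prefix (not stated in the post).
PREFIX = 24

LAYOUT = {
    "DMZ": {"fw1": "10.20.0.20", "fw2": "10.20.0.21", "vip": None},
    "LAN": {"fw1": "10.30.0.20", "fw2": "10.30.0.21", "vip": "10.30.0.253"},
    "WAN": {"fw1": "10.10.0.20", "fw2": "10.10.0.21", "vip": "10.10.0.253"},
}


def check_layout(layout, prefix=PREFIX):
    """Return a list of problems found in the addressing plan.

    Flags networks whose addresses do not share one subnet and
    networks in which the same address is used twice.
    """
    problems = []
    for name, addrs in layout.items():
        # Ignore entries without an address (the DMZ has no CARP VIP).
        present = {k: v for k, v in addrs.items() if v is not None}
        nets = {
            ipaddress.ip_interface(f"{ip}/{prefix}").network
            for ip in present.values()
        }
        if len(nets) != 1:
            subnets = sorted(map(str, nets))
            problems.append(f"{name}: addresses span several subnets: {subnets}")
        if len(set(present.values())) != len(present):
            problems.append(f"{name}: duplicate address")
    return problems


print(check_layout(LAYOUT))
```

Under the /24 assumption the check reports no problems, so the addressing itself looks consistent.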
The Problem
When I reboot machine 1, it comes up as CARP master. When I then reboot machine 2, it also comes up as master, which it shouldn't: it should have noticed that FW1 is already master. As a result, I get duplicate replies when I ping 10.30.0.253 from a client.
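The expected behaviour can be illustrated with a toy model of the CARP election. This is a deliberately simplified sketch, not the FreeBSD implementation: the `Node` class, the `settle` function, and the advbase/advskew values (1/0 for FW1, 1/100 for FW2) are assumptions for illustration and are not taken from the configuration in this post.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    advbase: int = 1      # advertisement base interval in seconds (assumed)
    advskew: int = 0      # 0..254, a lower value makes the node preferred
    state: str = "BACKUP"

    def interval(self):
        # Effective advertisement interval; the shortest interval wins.
        return self.advbase + self.advskew / 256


def settle(nodes, adverts_delivered):
    """Return the state each node ends up in.

    Simplified rule: a node stays BACKUP only while it hears
    advertisements from a node with a shorter interval. If no
    advertisements are delivered, every node promotes itself.
    """
    if not adverts_delivered:
        for node in nodes:
            node.state = "MASTER"
    else:
        best = min(nodes, key=Node.interval)
        for node in nodes:
            node.state = "MASTER" if node is best else "BACKUP"
    return {node.name: node.state for node in nodes}


fw1 = Node("fw1", advbase=1, advskew=0)
fw2 = Node("fw2", advbase=1, advskew=100)

print(settle([fw1, fw2], adverts_delivered=True))
print(settle([fw1, fw2], adverts_delivered=False))
```

In this model, two masters at the same time only occur when the advertisements of one firewall never reach the other, which matches the symptom described above.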
What has been done
- I can ping FW2 from FW1 on the sync interface and vice versa
- I set a tunable "net.inet.carp.senderr_demotion_factor=0" on both machines
- Set allow-all rules on the interfaces to test whether something gets blocked (nothing is blocked in the live logs)
- Disabled Port Security on all compute instances to allow MAC spoofing
- Went through every forum post and issue report that could potentially be related to this, like:
- https://forum.netgate.com/topic/80092/resolved-carp-not-failing-back-and-other-weird-behaviour-on-pfsense-2-2/9
- https://forum.opnsense.org/index.php?topic=36421.msg177768#msg177768
- https://forum.opnsense.org/index.php?topic=5412.msg22460#msg22460
I suspect that this might simply not work with OpenStack because of how OpenStack handles multicast, but I wanted to be sure and, if possible, get a hint from somebody with experience.
Thanks a lot in advance
tcpdump output on fw1:
(10.20.0.10 is my client from which I SSH into FW1, which is 10.20.0.20)
Code:
07:46:58.248014 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 6980:7328, ack 1, win 514, options [nop,nop,TS val 4138402537 ecr 3632030865], length 348
07:46:58.248383 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 7328, win 445, options [nop,nop,TS val 3632030965 ecr 4138402537], length 0
07:46:58.347958 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 7328:7804, ack 1, win 514, options [nop,nop,TS val 4138402634 ecr 3632030965], length 476
07:46:58.348385 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 7804, win 445, options [nop,nop,TS val 3632031065 ecr 4138402634], length 0
07:46:58.412834 IP 10.20.0.20 > 10.20.0.21: PFSYNCv5 len 393
insert count 1
update compressed count 1
eof count 1
07:46:58.448070 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 7804:8280, ack 1, win 514, options [nop,nop,TS val 4138402737 ecr 3632031065], length 476
07:46:58.448488 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 8280, win 445, options [nop,nop,TS val 3632031165 ecr 4138402737], length 0
07:46:58.548048 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 8280:8628, ack 1, win 514, options [nop,nop,TS val 4138402837 ecr 3632031165], length 348
07:46:58.548529 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 8628, win 445, options [nop,nop,TS val 3632031265 ecr 4138402837], length 0
07:46:58.648166 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 8628:8976, ack 1, win 514, options [nop,nop,TS val 4138402937 ecr 3632031265], length 348
07:46:58.648739 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 8976, win 445, options [nop,nop,TS val 3632031365 ecr 4138402937], length 0
07:46:58.748067 IP 10.20.0.20 > 10.20.0.21: PFSYNCv5 len 393
insert count 1
update compressed count 1
eof count 1
07:46:58.748072 IP OPNsense > 10.20.0.21: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
07:46:58.748117 IP 10.10.0.20 > 10.20.0.21: VRRPv2, Advertisement, vrid 3, prio 0, authtype none, intvl 1s, length 36
07:46:58.748293 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 8976:9324, ack 1, win 514, options [nop,nop,TS val 4138403037 ecr 3632031365], length 348
07:46:58.748702 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 9324, win 445, options [nop,nop,TS val 3632031465 ecr 4138403037], length 0
07:46:58.848168 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 9324:9888, ack 1, win 514, options [nop,nop,TS val 4138403137 ecr 3632031465], length 564
07:46:58.848227 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 9888:10068, ack 1, win 514, options [nop,nop,TS val 4138403137 ecr 3632031465], length 180
07:46:58.848661 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 10068, win 445, options [nop,nop,TS val 3632031565 ecr 4138403137], length 0
07:46:58.948200 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 10068:10576, ack 1, win 514, options [nop,nop,TS val 4138403237 ecr 3632031565], length 508
07:46:58.948753 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 10576, win 445, options [nop,nop,TS val 3632031665 ecr 4138403237], length 0
07:46:59.048236 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 10576:10924, ack 1, win 514, options [nop,nop,TS val 4138403337 ecr 3632031665], length 348
07:46:59.048780 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 10924, win 445, options [nop,nop,TS val 3632031765 ecr 4138403337], length 0
07:46:59.148171 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 10924:11272, ack 1, win 514, options [nop,nop,TS val 4138403437 ecr 3632031765], length 348
07:46:59.148777 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 11272, win 445, options [nop,nop,TS val 3632031865 ecr 4138403437], length 0
07:46:59.248191 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 11272:11620, ack 1, win 514, options [nop,nop,TS val 4138403537 ecr 3632031865], length 348
07:46:59.248751 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 11620, win 445, options [nop,nop,TS val 3632031965 ecr 4138403537], length 0
07:46:59.348217 IP 10.20.0.20.ssh > 10.20.0.10.40524: Flags [P.], seq 11620:11968, ack 1, win 514, options [nop,nop,TS val 4138403637 ecr 3632031965], length 348
07:46:59.348759 IP 10.20.0.10.40524 > 10.20.0.20.ssh: Flags [.], ack 11968, win 445, options [nop,nop,TS val 3632032065 ecr 4138403637], length 0
07:46:59.448061 IP 10.20.0.20 > 10.20.0.21: PFSYNCv5 len 112
update compressed count 1
eof count 1
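Since most of the capture is SSH noise, the CARP advertisements can be filtered out with a small script. tcpdump prints CARP as VRRPv2 because both use IP protocol 112. This is a minimal sketch: `carp_adverts` and `SAMPLE` are hypothetical names, and the regular expression only assumes the line format seen in the capture above.

```python
import re

# Multicast group shared by VRRP and CARP advertisements.
CARP_MULTICAST = "224.0.0.18"

# Matches tcpdump lines such as:
# 07:46:58.748072 IP OPNsense > 10.20.0.21: VRRPv2, Advertisement, vrid 1, prio 0, ...
ADVERT = re.compile(
    r"^(?P<ts>\S+) IP (?P<src>\S+) > (?P<dst>[^:\s]+): "
    r"VRRPv2, Advertisement, vrid (?P<vrid>\d+), prio (?P<prio>\d+)"
)


def carp_adverts(lines):
    """Extract source, destination and vrid of each advertisement line."""
    found = []
    for line in lines:
        m = ADVERT.match(line.strip())
        if m:
            found.append({
                "src": m["src"],
                "dst": m["dst"],
                "vrid": int(m["vrid"]),
                # True only if the packet went to the CARP multicast group.
                "multicast": m["dst"] == CARP_MULTICAST,
            })
    return found


# Three lines copied from the capture above (options on the last two kept as-is).
SAMPLE = [
    "07:46:58.412834 IP 10.20.0.20 > 10.20.0.21: PFSYNCv5 len 393",
    "07:46:58.748072 IP OPNsense > 10.20.0.21: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36",
    "07:46:58.748117 IP 10.10.0.20 > 10.20.0.21: VRRPv2, Advertisement, vrid 3, prio 0, authtype none, intvl 1s, length 36",
]

for advert in carp_adverts(SAMPLE):
    print(advert)
```

In the capture above, both advertisements are addressed to 10.20.0.21 and not to 224.0.0.18, and the one for vrid 3 has the WAN address 10.10.0.20 as its source. That may be worth comparing against the "peer IP" setting.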