Hello there,
I'm still having trouble with LAGG, VLAN and CARP for high availability.
I set up a simple lab like this in VLAN 1001, with a failover lagg between my firewalls and a switch:
192.168.111.1                 192.168.111.2
-------------                 -------------
-    FW1    -------------------    FW2    -
-------------                 -------------
                     |   VIP: 192.168.111.3
                     |
                     |
                     |
              ---------------
              -     SW1     -
              ---------------
               192.168.111.4
The only test I've done is pinging the switch from FW2, and it fails only when the source IP is the VIP. In this test FW2 is the master, and I ran it with the firewall disabled to make sure this isn't a filtering problem.
I can see the multicast advertisements without any trouble on both sides (FW1 and FW2):
root@FW2:~ # tcpdump -npi lagg0_vlan1001 -T CARP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lagg0_vlan1001, link-type EN10MB (Ethernet), capture size 65535 bytes
capability mode sandbox enabled
10:16:28.417090 IP 192.168.111.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=14985732001005176251
10:16:29.839085 IP 192.168.111.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=10997560983974190372
10:16:31.238086 IP 192.168.111.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=16271606076571394589
10:16:32.645086 IP 192.168.111.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=14406240274046045491
10:16:34.046085 IP 192.168.111.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=17719711089369943554
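As a sanity check on the capture above: a CARP host sends advertisements every advbase + advskew/256 seconds, so with advbase=1 and advskew=100 the nominal interval is about 1.39 s. The minimal Python sketch below compares that nominal value with the spacing of the five timestamps copied from the capture (the helper name carp_adv_interval is just illustrative); the observed spacing comes out at roughly 1.41 s, close to the nominal value, which suggests the advertisements themselves are being sent normally.

```python
from statistics import mean


def carp_adv_interval(advbase: int, advskew: int) -> float:
    """Nominal CARP advertisement interval in seconds: advbase + advskew/256."""
    return advbase + advskew / 256


# Seconds past 10:16, copied from the tcpdump output above.
timestamps = [28.417090, 29.839085, 31.238086, 32.645086, 34.046085]
gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

nominal = carp_adv_interval(advbase=1, advskew=100)
observed = mean(gaps)
print(f"nominal {nominal:.3f} s, observed {observed:.3f} s")
# nominal 1.391 s, observed 1.407 s
```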
If I ping from .2 to .4, it works fine:
root@FW2:~ # ping -S 192.168.111.2 192.168.111.4
PING 192.168.111.4 (192.168.111.4) from 192.168.111.2: 56 data bytes
64 bytes from 192.168.111.4: icmp_seq=0 ttl=64 time=5.870 ms
64 bytes from 192.168.111.4: icmp_seq=1 ttl=64 time=1.932 ms
The packets look fine (tcpdump -ni lagg0 -s0 -w from-interface-ip.pcap):
(http://i.imgur.com/6c5IS3t.png)
If I do the same from .3, it no longer works:
root@FW2:~ # ping -S 192.168.111.3 192.168.111.4
PING 192.168.111.4 (192.168.111.4) from 192.168.111.3: 56 data bytes
^C
--- 192.168.111.4 ping statistics ---
4 packets transmitted, 0 packets received, 100.0% packet loss
The packets seem to be fine (tcpdump -ni lagg0 -s0 -w from-vip.pcap):
(http://i.imgur.com/AAbeupb.png)
If I look deeper and apply the display filter vlan.id == 1001, you can see that in the first case all the packets are still shown, while in the second case none of the packets carry the VLAN ID.
From the interface IP:
(http://i.imgur.com/WnVcBVC.png)
From the VIP :
(http://i.imgur.com/fAcj33N.png)
It seems that the VLAN ID is not added to the packet when the source IP is the VIP.
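To make the Wireshark check above concrete: an 802.1Q-tagged Ethernet frame carries the TPID 0x8100 where the EtherType would normally sit (bytes 12-13), followed by a 16-bit TCI whose low 12 bits are the VLAN ID that the vlan.id filter matches on. A minimal Python sketch of that test follows; the frames are hypothetical and built by hand, not taken from the captures.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # value in the EtherType position that marks an 802.1Q tag


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the 802.1Q VLAN ID of a raw Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # low 12 bits of the TCI; PCP/DEI bits are ignored


# Hypothetical frames: zeroed dst+src MACs, then a tag (or not), an IPv4
# EtherType and some zero payload.
macs = bytes(12)
payload = bytes(20)
tagged = macs + struct.pack("!HHH", TPID_8021Q, 1001, 0x0800) + payload
untagged = macs + struct.pack("!H", 0x0800) + payload
print(vlan_id(tagged), vlan_id(untagged))  # 1001 None
```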
The setup seems correct on my side. Moreover, if I shut down FW2, FW1 becomes master immediately. What can I do?
Whoopsie, let me take a closer look at why this happens...
In fact, may I see an ifconfig from this configured system?
Here we go:
root@FW1:~ # ifconfig
oce0: flags=8143<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> metric 0 mtu 1500
options=502bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO6,VLAN_HWFILTER,VLAN_HWTSO>
ether 00:90:fa:9d:29:00
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
oce1: flags=8143<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> metric 0 mtu 1500
options=502bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO6,VLAN_HWFILTER,VLAN_HWTSO>
ether 00:90:fa:9d:29:00
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 0c:c4:7a:32:5b:c8
inet 1.1.1.213 netmask 0xfffffff0 broadcast 1.1.1.223
inet6 fe80::ec4:7aff:fe32:5bc8%igb0 prefixlen 64 scopeid 0x3
inet 1.1.1.212 netmask 0xfffffff0 broadcast 1.1.1.223
inet 1.1.1.215 netmask 0xfffffff0 broadcast 1.1.1.223
inet 1.1.1.216 netmask 0xfffffff0 broadcast 1.1.1.223
inet 1.1.1.217 netmask 0xfffffff0 broadcast 1.1.1.223
inet 1.1.1.218 netmask 0xfffffff0 broadcast 1.1.1.223
inet 1.1.1.219 netmask 0xffffffff broadcast 1.1.1.219
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 0c:c4:7a:32:5b:c9
inet 192.168.12.1 netmask 0xfffffff8 broadcast 192.168.12.7
inet6 fe80::ec4:7aff:fe32:5bc9%igb1 prefixlen 64 scopeid 0x4
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=503bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWFILTER,VLAN_HWTSO>
ether 0c:c4:7a:32:5b:ca
inet6 fe80::ec4:7aff:fe32:5bca%igb2 prefixlen 64 scopeid 0x5
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=500bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO>
ether 0c:c4:7a:32:5b:cb
inet6 fe80::ec4:7aff:fe32:5bcb%igb3 prefixlen 64 scopeid 0x6
inet 192.168.13.101 netmask 0xffffff80 broadcast 192.168.13.127
inet 192.168.13.100 netmask 0xffffff80 broadcast 192.168.13.127
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
pflog0: flags=100<PROMISC> metric 0 mtu 33160
pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1500
pfsync: syncdev: igb1 syncpeer: 192.168.12.2 maxupd: 128 defer: off
enc0: flags=0<> metric 0 mtu 1536
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0xa
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=502bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO6,VLAN_HWFILTER,VLAN_HWTSO>
ether 00:90:fa:9d:29:00
inet6 fe80::290:faff:fe9d:2900%lagg0 prefixlen 64 scopeid 0xb
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
laggproto failover lagghash l2,l3,l4
laggport: oce1 flags=0<>
laggport: oce0 flags=5<MASTER,ACTIVE>
lagg0_vlan1001: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=203<RXCSUM,TXCSUM,TSO6>
ether 00:90:fa:9d:29:00
inet6 fe80::290:faff:fe9d:2900%lagg0_vlan1001 prefixlen 64 scopeid 0x16
inet 192.168.111.1 netmask 0xffffff00 broadcast 192.168.111.255
inet 192.168.111.3 netmask 0xffffff00 broadcast 192.168.111.255 vhid 1
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
vlan: 1001 parent interface: lagg0
carp: MASTER vhid 1 advbase 1 advskew 0
ovpns1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
options=80000<LINKSTATE>
inet6 fe80::ec4:7aff:fe32:5bc8%ovpns1 prefixlen 64 scopeid 0x17
inet 2.2.2.1 --> 2.2.2.2 netmask 0xffffffff
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Opened by PID 26324
ovpns2: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
options=80000<LINKSTATE>
inet6 fe80::ec4:7aff:fe32:5bc8%ovpns2 prefixlen 64 scopeid 0x18
inet 2.2.2.1 --> 2.2.2.2 netmask 0xffffffff
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Opened by PID 30019
root@FW2:~ # ifconfig
oce0: flags=8143<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> metric 0 mtu 1500
options=400a8<VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 00:90:fa:9d:29:d8
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
oce1: flags=8143<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> metric 0 mtu 1500
options=400a8<VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 00:90:fa:9d:29:d8
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 0c:c4:7a:32:63:f4
inet 1.1.1.214 netmask 0xfffffff0 broadcast 1.1.1.223
inet6 fe80::ec4:7aff:fe32:63f4%igb0 prefixlen 64 scopeid 0x3
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 0c:c4:7a:32:63:f5
inet 192.168.12.2 netmask 0xfffffff8 broadcast 192.168.12.7
inet6 fe80::ec4:7aff:fe32:63f5%igb1 prefixlen 64 scopeid 0x4
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
ether 0c:c4:7a:32:63:f6
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=a8<VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM>
ether 0c:c4:7a:32:63:f7
inet6 fe80::ec4:7aff:fe32:63f7%igb3 prefixlen 64 scopeid 0x6
inet 192.168.13.102 netmask 0xffffff80 broadcast 192.168.248.127
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
pflog0: flags=100<PROMISC> metric 0 mtu 33160
pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1500
pfsync: syncdev: igb1 syncpeer: 192.168.12.1 maxupd: 128 defer: off
enc0: flags=0<> metric 0 mtu 1536
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0xa
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400a8<VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 00:90:fa:9d:29:d8
inet6 fe80::290:faff:fe9d:29d8%lagg0 prefixlen 64 scopeid 0xb
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
laggproto failover lagghash l2,l3,l4
laggport: oce1 flags=0<>
laggport: oce0 flags=5<MASTER,ACTIVE>
lagg0_vlan1001: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 00:90:fa:9d:29:d8
inet6 fe80::290:faff:fe9d:29d8%lagg0_vlan1001 prefixlen 64 scopeid 0x15
inet 192.168.111.2 netmask 0xffffff00 broadcast 192.168.111.255
inet 192.168.111.3 netmask 0xffffff00 broadcast 192.168.111.255 vhid 1
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
vlan: 1001 parent interface: lagg0
carp: BACKUP vhid 1 advbase 1 advskew 100
ovpns1: flags=8010<POINTOPOINT,MULTICAST> metric 0 mtu 1500
options=80000<LINKSTATE>
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ovpns2: flags=8010<POINTOPOINT,MULTICAST> metric 0 mtu 1500
options=80000<LINKSTATE>
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
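The options= values in these listings are hexadecimal capability masks. Decoding them side by side is a quick way to compare the two boxes; one difference that stands out is that FW1's oce0, oce1 and lagg0 report VLAN_HWTAGGING while FW2's do not. Whether that matters for this problem is not established here; the minimal Python sketch below only shows how to compare the masks. The bit table is limited to the flags visible in this output, with values that reproduce the names ifconfig prints next to each mask, so treat it as an assumption for other FreeBSD versions.

```python
# Interface capability bits, limited to the flags that appear in the ifconfig
# output above (each value reproduces the names ifconfig prints for its mask).
IFCAP_BITS = {
    0x000001: "RXCSUM",
    0x000002: "TXCSUM",
    0x000008: "VLAN_MTU",
    0x000010: "VLAN_HWTAGGING",
    0x000020: "JUMBO_MTU",
    0x000080: "VLAN_HWCSUM",
    0x000100: "TSO4",
    0x000200: "TSO6",
    0x010000: "VLAN_HWFILTER",
    0x040000: "VLAN_HWTSO",
    0x080000: "LINKSTATE",
    0x200000: "RXCSUM_IPV6",
    0x400000: "TXCSUM_IPV6",
}


def decode(mask: int) -> list:
    """Expand an ifconfig 'options=' mask into flag names, lowest bit first."""
    names = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            # Bits missing from the table are shown as raw hex, not guessed.
            names.append(IFCAP_BITS.get(bit, hex(bit)))
        bit <<= 1
    return names


fw1_lagg0 = decode(0x502BB)  # FW1 lagg0: options=502bb
fw2_lagg0 = decode(0x400A8)  # FW2 lagg0: options=400a8
only_on_fw1 = [name for name in fw1_lagg0 if name not in fw2_lagg0]
print(only_on_fw1)
# ['RXCSUM', 'TXCSUM', 'VLAN_HWTAGGING', 'TSO6', 'VLAN_HWFILTER']
```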
The strange thing is that I have VIPs on other lagg interfaces and they work well.
Yeah, it looks correct. It may be a FreeBSD limitation, but I don't know the carp driver well enough to say whether this should work or not. At least it looks kernel-related to me...
Why do you need to ping from alias IP? Is it a test or a real world problem (all logical assumptions aside)?
It's just to test the setup, because when I add a machine it can't reach the gateway correctly. So I tried a ping and saw that the VLAN ID isn't being tagged.
If it doesn't affect operability, the likelihood of an actual driver issue is greater. I've forwarded this to a FreeBSD expert. :)
Thank you. I can check with Emulex too.
Do you have any clue what's going on (so that I can give them more precise details)?
Have a nice day and thank you again
It really seems to be a driver issue so I'll try to see if anyone in FreeBSD is interested in looking into this.
I'm marking this [WONTFIX]. After skimming through a CARP-related manual, I saw that an alias IP is supposed to be reachable from the outside, but the machine must use its real IP address when talking to the outside. I think that is what we're seeing here: a broken code path that is being forced into use but doesn't match the system's intended behaviour.
Thanks for your time and for helping me with this.
I don't understand what you mean by "the machine must use its real IP address when talking to the outside".
Can you go into more detail?
Thank you again
The alias is only for being reachable from the outside. The rule is: traffic connecting to the machine can use either IP address, and the machine can reply on either IP, but it cannot initiate a connection from an alias IP itself; it must always use its own address for that. The ping forced the use of the alias, which is a violation.
Hope that is clearer now? :)
Thank you for the clarification.
I will run a complete test, not only a ping test.
I did more tests today and would like your opinion on the results. I think I found the problem.
I built a complete lab based on what I understood from your previous post: I added a Windows VM behind my two firewalls, and here are the results.
If I set the IP of the firewall's network card as the gateway, the VM can browse the internet without any trouble.
If I set the CARP VIP as the gateway, nothing works anymore.
Looking deeper, I can see that ARP resolution is not working. The MAC associated with the VIP in the VM's ARP table is 00-00-5e-00-01-01.
If I run tcpdump on the firewall side, I can see the ARP request for the VIP, but the firewall (the master) does not answer.
If I manually add the real MAC address of the master firewall's network card to the VM's ARP table, everything works.
If I switch off the master's interface, the CARP master role moves to the secondary correctly, but I need to add the new MAC address manually. Once that's done, everything works perfectly.
So the whole issue seems to be related to the initial MAC address resolution. The strange thing is that with an IP alias I don't have this issue, and the ifconfig output looks okay:
lagg0_vlan1001: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 00:90:fa:9d:29:d8
inet6 fe80::290:faff:fe9d:29d8%lagg0_vlan1001 prefixlen 64 scopeid 0x15
inet 192.168.111.2 netmask 0xffffff00 broadcast 192.168.111.255
inet 192.168.111.3 netmask 0xffffff00 broadcast 192.168.111.255 vhid 1
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
vlan: 1001 parent interface: lagg0
carp: MASTER vhid 1 advbase 1 advskew 100
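One way to double-check the "ifconfig looks okay" part: ifconfig prints netmasks in hex, and Python's standard ipaddress module can confirm that the VIP and the interface address sit in the same /24 with the same broadcast address. A minimal sketch using only the addresses from the snippet above; parse_inet is a hypothetical helper name, not an existing API.

```python
import ipaddress


def parse_inet(addr: str, hexmask: str) -> ipaddress.IPv4Interface:
    """Build an interface from an ifconfig-style 'inet ADDR netmask 0x...' pair."""
    mask = ipaddress.IPv4Address(int(hexmask, 16))  # 0xffffff00 -> 255.255.255.0
    return ipaddress.IPv4Interface(f"{addr}/{mask}")


real = parse_inet("192.168.111.2", "0xffffff00")
vip = parse_inet("192.168.111.3", "0xffffff00")

print(real.network, real.network.broadcast_address)  # 192.168.111.0/24 192.168.111.255
print(vip.network == real.network)                    # True
```

If both checks hold, the addressing itself is consistent, which points the investigation toward the link layer rather than the IP configuration.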
Any ideas?
Thank you !
Hi Romain,
I haven't followed every part of the conversation, but you're testing with a setup like this now?
[vm, client] --> [master, vip] --> outside world
[slave, vip] --> outside world
And you are pinging from the client to the VIP of your CARP setup? Normally this should work; your client's ARP table should contain the CARP-reserved MAC address.
The MAC addresses are managed by CARP, and you should never need to update them yourself.
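To illustrate the CARP-reserved MAC mentioned above: for IPv4, CARP uses the VRRP-reserved prefix 00:00:5e:00:01 followed by the vhid as the virtual MAC, so vhid 1 yields exactly the 00-00-5e-00-01-01 entry reported in the Windows VM's ARP table. A minimal Python sketch (carp_mac is an illustrative name, not an existing API):

```python
def carp_mac(vhid: int, sep: str = ":") -> str:
    """Return the IPv4 CARP virtual MAC for a vhid: 00:00:5e:00:01:<vhid>."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be between 1 and 255")
    octets = [0x00, 0x00, 0x5E, 0x00, 0x01, vhid]
    return sep.join(f"{octet:02x}" for octet in octets)


print(carp_mac(1))           # 00:00:5e:00:01:01 (FreeBSD notation)
print(carp_mac(1, sep="-"))  # 00-00-5e-00-01-01 (Windows arp -a notation)
```

Because the address depends only on the vhid, it stays the same when the master role moves to the other firewall, which is why clients should not need manual ARP entries after a failover.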
Just to be sure, there are no other machines using something like VRRP in the same network?
Given the complexity of your setup (VLAN, lagg, etc.), I would really advise you to build a simple test setup first and then extend it step by step to determine which part of your solution is causing the issue.
Cheers,
Ad
This is exactly the situation I'm testing.
[vm, client] --> [master, vip] --> outside world
[slave, vip] --> outside world
I didn't test ping, but the rest works (browsing the internet, DNS resolution...).
Nope, only the firewalls are using CARP. There is no VRRP configured anywhere in my architecture.
The thing is, ARP resolution doesn't seem to work. The MAC address associated with the VIP stays 00-00-5e-00-01-01 on the VM, because it doesn't receive any answer to its first ARP request.
If I manually add the MAC address of the master's network card, everything works.
I know my setup is complicated, but I can't tear everything down to test it. I can't start from the beginning because I have some services in production. Sorry :-(
Do my switches need to be VRRP-aware, or can I use a basic switch?
I saw that there are 4 load-balancing modes in the CARP protocol.
Is there any way I can choose which mode is active? It seems to default to arp, but for some environments ip or ip-stealth may be better.
Thank you
I have news: everything is okay on the OPNsense side.
I contacted the network card's technical support, and they compiled a new driver for me that fixes this bug.
Thanks again for your help!
Wee, how cool is that. Thanks for reporting back on this. :)