OPNsense Forum

English Forums => High availability => Topic started by: bbtaeuber on May 10, 2021, 09:34:19 am

Title: Default gateway lost on CARP master after failback #4977
Post by: bbtaeuber on May 10, 2021, 09:34:19 am
Hi everybody,

this thread is meant to discuss issue #4977 (https://github.com/opnsense/core/issues/4977).

Our situation:
We have a small cluster of two identical OPNsense boxes.
Several interfaces have virtual IP addresses.
On the WAN interface we have one official virtual IPv4 address with a /30 netmask.
Because of that, we statically configured private IPv4 addresses on the interface itself (10.A.B.C/24).
It seems necessary to have IPv4 addresses configured on the WAN interfaces for CARP to work correctly. (Why?)

The problem:
When the master node (A) loses the connection, the backup node (B) takes over all the virtual IPs as expected.
When node A gets its connection back, it becomes master again but removes the default route from the WAN interface.

A solution(?):
My colleague's suggestion is that this happens because we didn't tick the "Far Gateway" checkbox for this gateway. The default route is removed because the only address fixed on the WAN interface is a private IPv4 address in a different network than the gateway.
The check whether the gateway is in a network reachable through the interface seems to be made before the virtual IP is bound to the interface. (Maybe this could be changed?)
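
To make the suspected check concrete, here is a minimal POSIX shell sketch of an "is the gateway on-link?" test. It is not OPNsense's actual code; the function names and the sample addresses (10.0.0.2/24 as the static address, 192.0.2.1/30 as the CARP VIP, 192.0.2.2 as the gateway) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OPNsense or FreeBSD.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed (exit 0) if GATEWAY lies inside the network of ADDR/PREFIX.
# usage: on_link ADDR/PREFIX GATEWAY
on_link() (
    addr=${1%/*}
    prefix=${1#*/}
    # Netmask as an integer: all ones except the host bits.
    mask=$(( 4294967295 ^ ((1 << (32 - $prefix)) - 1) ))
    [ $(( $(ip_to_int "$addr") & $mask )) -eq $(( $(ip_to_int "$2") & $mask )) ]
)
```

With these sample values, on_link 10.0.0.2/24 192.0.2.2 fails while on_link 192.0.2.1/30 192.0.2.2 succeeds. So a check that only sees the static address would consider the gateway unreachable until the VIP is present.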

We will test tomorrow whether ticking "Far Gateway" has the desired effect.

Thanks and best regards,
Lars
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: mimugmail on May 10, 2021, 09:40:30 am
Far gateway should help, or place a router/modem in front and run private IPs on the OPNsense, where the router port-forwards all ports to the HA IP internally
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: kensan on May 10, 2021, 08:25:48 pm
In our case the default gateway and the WAN IP (virtual or not) are in the same subnet.
One way I found to trigger the loss on the backup is to play with both buttons to disable CARP.
I have to do that every time it goes from MASTER->BACKUP, as it is the only way I found to make it stop using the virtual WAN IP to communicate with the outside world.
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: bbtaeuber on May 11, 2021, 07:42:37 am
Quote from: mimugmail on May 10, 2021, 09:40:30 am
Far gateway should help, or place a router/modem in front and run private IPs on the OPNsense, where the router port-forwards all ports to the HA IP internally

It didn't help.

A modem or an additional router is not an option for us, because it is a requirement that the firewall cluster itself is the router. This FW cluster serves a local network of about 1000 devices connected to an ISP via 1 Gbit Ethernet.

Another strange thing is that when I reset the "IPv4 Upstream Gateway" from a manual setting to "Auto-detect", it works once. But only once.

I'm willing to send more config to get this bug fixed. What information is needed for that?

For the next 12 days I'm on vacation.

Cheers,
Lars
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: mimugmail on May 11, 2021, 11:26:24 am
Screenshots of Interface, CARP VIP and Gateway please
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: bbtaeuber on May 11, 2021, 12:25:56 pm
first screenshots
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: bbtaeuber on May 11, 2021, 12:26:17 pm
next screenshot
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: bbtaeuber on May 11, 2021, 12:26:41 pm
last screenshots
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: bbtaeuber on May 27, 2021, 08:55:44 am
Has anyone had time to look into this?
I'm back from my vacation.

Thanks,
Lars
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: no1fuxwithtux on June 01, 2021, 08:02:29 pm
Hi guys,

I can completely reproduce the problem. It was also present in pfSense and seemingly fixed at some point, but started occurring again there as well (https://redmine.pfsense.org/issues/8465). I bet this is an upstream bug from FreeBSD.

This is really simple to reproduce. Just set an upstream gateway that is not directly reachable through the firewalls' own interface addresses, but only through the CARP IP subnet on these interfaces. As soon as you disable CARP on the master or the slave, the gateway is lost. It also does not matter whether you set far gateway or not. The funniest thing is that applying the gateway config again seems to solve the problem until a reboot.

I also don't understand why this problem is not top priority. This is a serious issue that probably a lot of people never noticed, because they did not check the routing table and/or did not test HA properly and just deployed it. :o Not everyone has 2 WAN IP addresses to waste for HA... I think I will try to reproduce the problem on a raw FreeBSD install and report it to the FreeBSD guys. I bet it will happen there as well.
Title: Re: Default gateway lost on CARP master after failback #4977
Post by: bbtaeuber on June 02, 2021, 07:34:12 am
Thanks no1fuxwithtux for testing with raw FreeBSD and the time you invest.

Admins, please give it a higher priority.
HA was one of the more important arguments to choose OPNsense.

Title: Re: Default gateway lost on CARP master after failback #4977
Post by: no1fuxwithtux on June 12, 2021, 02:39:18 pm
Hello everyone,

I tested again with raw FreeBSD and I can't reproduce the issue there. So there must be something in the gateway code of OPNsense that causes the loss of the default gateway.
I configured two FreeBSD machines with 3 networks each. I did not actually use vtnet1, but added it anyway in case I want to test state sync later.

The following is set in rc.conf:

router1:
hostname="freebsd-router1"
keymap="de.kbd"
sshd_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
#networking stuff
#Management-Interface/LAN
ifconfig_vtnet0="inet 192.168.122.100 netmask 255.255.255.0"
#PFSYNC
ifconfig_vtnet1="inet 192.168.1.1 netmask 255.255.255.0"
#WAN
ifconfig_vtnet2="inet 192.168.2.1 netmask 255.255.255.248"
ifconfig_vtnet2_alias0="inet vhid 1 pass testpass alias 192.168.3.1/29"

#routes
#defaultrouter="192.168.3.4"
static_routes="default"
route_default="-net 0.0.0.0 192.168.3.4"

router2:
hostname="freebsd-router2"
keymap="de.kbd"
sshd_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
#networking stuff
#Management-Interface/LAN
ifconfig_vtnet0="inet 192.168.122.83 netmask 255.255.255.0"
#PFSYNC
ifconfig_vtnet1="inet 192.168.1.2 netmask 255.255.255.0"
#WAN
ifconfig_vtnet2="inet 192.168.2.2 netmask 255.255.255.248"
ifconfig_vtnet2_alias0="inet vhid 1 advskew 100 pass testpass alias 192.168.3.1/29"

#routes
#defaultrouter="192.168.3.4"
static_routes="default"
route_default="-net 0.0.0.0 192.168.3.4"

No matter how hard I try to break the default route, the problem does not exist in raw FreeBSD.
Similar to the OPNsense setup, I disabled CARP on the primary by deactivating the whole interface:
ifconfig vtnet2 down
After that I made sure that CARP is indeed disabled.
Output from ifconfig vtnet2:
vtnet2: flags=8902<BROADCAST,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
   options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
   ether 52:54:00:7b:a5:0a
   inet 192.168.2.1 netmask 0xfffffff8 broadcast 192.168.2.7
   inet 192.168.3.1 netmask 0xfffffff8 broadcast 192.168.3.7 vhid 1
   carp: INIT vhid 1 advbase 1 advskew 0
   media: Ethernet 10Gbase-T <full-duplex>
   status: active
   nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
As you can see carp is in INIT state.
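
For repeated failover tests it can help to pull the CARP state out of the ifconfig output instead of reading it by eye. The following is only a sketch: carp_states is a made-up helper name, and it assumes the "carp: STATE vhid N ..." line format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `ifconfig <interface>` output on stdin and
# print one "VHID STATE" pair per CARP line, e.g. "1 INIT".
carp_states() {
    awk '$1 == "carp:" && $3 == "vhid" { print $4, $2 }'
}
```

On a FreeBSD box this could be used as: ifconfig vtnet2 | carp_states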
Looking at router2, it was indeed in master state, and the default route was still there on both systems:
Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.3.4        UGS      vtnet2
127.0.0.1          link#4             UH          lo0
192.168.1.0/24     link#2             U        vtnet1
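
The same check can be scripted, which is handy when toggling CARP many times in a row. This is a minimal sketch, assuming routing table output in the column layout shown above; default_gateway is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -rn -f inet` style output on stdin,
# print the gateway of the first default route and exit 0.
# Exits 1 if no default route is listed.
default_gateway() {
    awk '$1 == "default" { print $2; found = 1; exit }
         END { exit !found }'
}
```

On FreeBSD this could be run after each failover as: netstat -rn -f inet | default_gateway || echo "default route missing"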

I also tried spamming ifconfig vtnet2 vhid 1 state backup to somehow provoke bad behaviour, but without success. :o
Then I tried setting the default route in different ways in rc.conf, such as using the defaultrouter option, but this also works without an issue.
The OPNsense team will definitely have to look into their gateway code to solve this issue.