OPNsense Forum

Archive => 18.7 Legacy Series => Topic started by: drivera on October 28, 2018, 07:39:19 am

Title: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: drivera on October 28, 2018, 07:39:19 am
Hi!

I have a dual-WAN setup which I've successfully configured OPNsense to handle. I can pull the plug on the primary WAN and the secondary will kick in... eventually.

And therein lies the rub: for some reason, OPNsense takes its sweet time to fail over even when it detects that the primary WAN is LINK_DOWN (i.e. cable disconnected).

So my first question is: how do I make OPNSense react more quickly ("decisively"?) to a failover event?

To add insult to injury, the primary WAN is a cable (DOCSIS 3) ISP which, during bootup, hands the client two IPs in sequence: first a "private" (RFC 1918) one and then the final public IP. My problem is that OPNsense sees the first IP and assumes the circuit should be brought back online (after all, it's LINK_UP and has an IP... so who can blame it?) when in reality it should wait until the public IP has been assigned.

I don't know if/how OPNSense would be configured for that...

So.... help?
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: va176thunderbolt on October 28, 2018, 08:37:48 am
You can configure your WAN interface to ignore the DHCP lease offered by the cable modem itself. Under “Interfaces”, “WAN”, enter 192.168.100.1 (replace with your cable modem's IP if different) in the “Reject leases from” field in the DHCP client area. This will stop your interface from getting the RFC 1918 address and thinking the interface is up and ready for traffic.
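
To illustrate the distinction this relies on, here is a minimal Python sketch that tells an RFC 1918 address (like the one the modem hands out first) from a public one. It is only an illustration of the idea, not how OPNsense or its DHCP client decides anything; the function names `is_rfc1918` and `wan_ready` are made up for this example.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


def wan_ready(address: str) -> bool:
    """Hypothetical check: treat the WAN as usable only with a non-private IP."""
    return not is_rfc1918(address)
```

For example, `wan_ready("192.168.100.10")` is False for a lease in the modem's private range, while an address outside the three ranges passes the check.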
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: drivera on October 28, 2018, 08:40:14 am
Won't this interfere with connection setup?

Worth a shot I guess... Thanks!
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: drivera on October 28, 2018, 09:00:51 am
Also, how about accelerating the failover/recovery? Any suggestions on that?
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: mimugmail on October 28, 2018, 09:09:29 am
System : Gateways : Gateway : Advanced
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: drivera on October 28, 2018, 09:20:40 am
I tried that earlier but didn't see much effect - maybe I fiddled with it too much. I'll try a simpler configuration.

Thanks!
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: drivera on October 28, 2018, 11:49:59 am
And now it seems failover doesn't work at all. It appears apinger isn't working - just keeps failing on signal 15:

Code:
Oct 28 04:46:57 apinger: Starting Alarm Pinger, apinger(27275)
Oct 28 04:46:57 apinger: Exiting on signal 15.
(... ad infinitum ...)
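
For anyone reading along: signal 15 is SIGTERM, so the log above means something is asking apinger to terminate right after it starts. Below is a rough Python sketch for counting start/exit pairs in such log lines. The regexes are written against the two lines quoted above only, and the helper names are hypothetical.

```python
import re
import signal

# Patterns matched against the two log lines quoted in the post above.
START_RE = re.compile(r"apinger: Starting Alarm Pinger")
EXIT_RE = re.compile(r"apinger: Exiting on signal (\d+)\.")


def summarize(lines):
    """Return (number of starts, list of exit signal numbers)."""
    starts = 0
    exit_signals = []
    for line in lines:
        if START_RE.search(line):
            starts += 1
        match = EXIT_RE.search(line)
        if match:
            exit_signals.append(int(match.group(1)))
    return starts, exit_signals


def signal_name(number):
    """Map a signal number to its symbolic name, e.g. 15 -> SIGTERM."""
    return signal.Signals(number).name
```

If every start is paired with an exit on the same signal within the same second, the daemon is being stopped by something else rather than crashing on its own.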

Poking around I can see that the apinger.conf file is identical to the default one.  I found some other threads of other dudes having a similar issue. Their solution was to re-save (without necessarily changing anything) the gateways individually, and things would start working again.

That hasn't worked for me.

Monitoring the /usr/local/etc/apinger.conf file, I can see that no matter what I change in the UI, the file never changes. Perhaps something is broken beyond repair on this install?
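
One way to run this kind of check is to compare a hash of the file before and after saving a gateway in the UI. This is a minimal Python sketch of that idea, assuming Python is available on the box; `file_digest` and `changed` are hypothetical helper names, and only the path comes from the post above.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file's current contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed(path, before):
    """Return True if the file's contents differ from the earlier digest."""
    return file_digest(path) != before
```

Usage would be along the lines of: record `before = file_digest("/usr/local/etc/apinger.conf")`, save the gateway in the UI, then call `changed("/usr/local/etc/apinger.conf", before)`. A result of False means the file was rewritten with identical contents or not rewritten at all; comparing modification times as well would tell those two cases apart.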

This might be the reason failover/failback isn't working...right?

.....help?
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: mimugmail on October 28, 2018, 01:12:09 pm
Firewall : Settings : Advanced : Use Dpinger
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: drivera on October 28, 2018, 05:13:06 pm
And here I didn't believe in magic wands :D

Thanks!

Sidenote: why would the apinger config not be updated? This hints at another hidden problem somewhere that would cause that module to fall out of sync with the rest of the configuration. Then again, with such a simple workaround, perhaps it's not worth chasing down just yet?
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: mimugmail on October 28, 2018, 07:38:10 pm
There is a discussion about making dpinger the default in 19.1... we'll see.
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: drivera on October 28, 2018, 07:45:39 pm
I'd love to be a fly on the wall for that, so I can learn what's going on with apinger and what would end up being lost if there were a switch over to dpinger.

I'm curious what's going on with apinger, though. The behavior was strange and I'm afraid I was too lazy to try to hunt it down using the debugging tools. Perhaps I'll take the time one of these days to try to figure it out.

Cheers!
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: mimugmail on October 28, 2018, 08:11:22 pm
apinger is old, unmaintained and has problems with multiple gateways as it's not multithreaded.
Waste of time digging into it ...
Title: Re: Multi-WAN failover is SLOW to fail, and SLOW to recover...
Post by: drivera on October 28, 2018, 08:34:44 pm
And that settles it. Thanks for clearing that up!