multi-wan failover problem

Started by hescominsoon, August 11, 2022, 09:29:11 PM

In the previous release I could unplug one WAN port and the system would fail over to number 2 without issue.  When the primary was restored it would fail back to the primary.  This is now NOT happening in the newest release.  This has apparently been a bug before, but now, short of rebooting the firewall, it will not restore states back to the primary after a failover to the secondary.  Any ideas?

I actually plan setting this configuration up over the weekend.  Will post here if I run into the same issue.
OPNsense 24.7.7 running on:
Dell Optiplex 3050
Intel I5-7600 @ 3.5Ghz (4 Cores)
Intel I350-T4 Nic
8G DDR4
256G SSD

so after further testing:

This is on official hardware, the DEC3850.
If we either pull the primary WAN connection physically or disable it in the web GUI, failover to the secondary takes 5 seconds.  It used to be instant.  Another wrinkle is that when the primary is restored it refuses to switch back.  Hitting save on an interface has no effect.  Disabling the secondary causes a 5 second loss of connectivity.  Otherwise a reboot is required.

I assume the monitor IPs are set up correctly so that it knows to switch back?  And you set the thresholds?

everything worked correctly in the previous version.  only upon upgrading to 22.7 did it break.

August 12, 2022, 06:34:19 PM #5 Last Edit: August 12, 2022, 06:38:12 PM by tcpip
I also ran into this issue. Try setting static routes for the monitored IPs via the corresponding gateway. This solved it for me.

i'm not familiar with that..what parameters in the static route would i use?

Quote from: hescominsoon on August 12, 2022, 11:33:47 PM
i'm not familiar with that..what parameters in the static route would i use?
so set a static route on each gateway to the monitoring ip addresses if i am reading this correctly....

trying to figure out why this is not working..:)

I think I found the issue.
WAN1 has a route set to the monitoring IP of 1.1.1.1 AND the IP of 8.8.8.8, even though 8.8.8.8 is clearly set up for monitoring on WAN2.  Looks like a bug either in FreeBSD or the OPNsense code.


duh forgot the cidr notation..got it.
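
For anyone following the workaround: the idea discussed above is one host route per monitor IP, pinned to the gateway it is supposed to monitor, with the destination written in CIDR form (a single IPv4 host is a /32). Below is a minimal Python sketch of that mapping. The monitor IPs are the ones from this thread; the gateway names WAN1_GW and WAN2_GW are placeholders and not taken from anyone's config. It only builds the table and does not touch the firewall.

```python
import ipaddress

# Monitor IP -> gateway it should be reached through.
# The IPs come from this thread; the gateway names are hypothetical placeholders.
MONITORS = {
    "1.1.1.1": "WAN1_GW",
    "8.8.8.8": "WAN2_GW",
}


def host_routes(monitors):
    """Return (network, gateway) pairs, each monitor IP expressed as a host route."""
    routes = []
    for ip, gateway in monitors.items():
        # ip_address() raises ValueError on typos such as a trailing dot.
        addr = ipaddress.ip_address(ip)
        # max_prefixlen is 32 for IPv4 and 128 for IPv6, i.e. a single host.
        network = ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}")
        routes.append((str(network), gateway))
    return routes


if __name__ == "__main__":
    for network, gateway in host_routes(MONITORS):
        print(f"{network} via {gateway}")
```

Each printed pair corresponds to one static route entry (network plus gateway), which is roughly what gets typed into the routes page by hand.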

August 13, 2022, 01:10:20 AM #11 Last Edit: August 13, 2022, 01:15:47 AM by tcpip
I think the issue is that the route for the monitoring IP of the WAN link gets removed as soon as the link is down. Therefore the monitoring checks don't work anymore. At least this is the case when I disconnect my primary WAN link. Setting the routes manually seems to be a decent workaround. However, I agree that it looks like a bug. I haven't found time yet to dig deeper into the issue and file an issue on GitHub.

How is your multi WAN setup configured? Do you just use gateway switching or employ the gateway groups? Keep in mind that switching back from WAN2 to WAN1 does not force all existing connections to switch back. The pf states are kept.
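
To illustrate the point about states, here is a toy Python model of a failover gateway group. It is not pf or OPNsense code. It assumes the group simply prefers the lowest tier number among the gateways that are up, and that a connection keeps whatever gateway it was created on until it closes. The class and connection names are made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    name: str
    tier: int          # lower tier = preferred
    up: bool = True


@dataclass
class FailoverGroup:
    gateways: list
    states: dict = field(default_factory=dict)  # connection id -> gateway name

    def active(self):
        """Pick the preferred gateway: lowest tier among those currently up."""
        candidates = [g for g in self.gateways if g.up]
        if not candidates:
            return None
        return min(candidates, key=lambda g: g.tier).name

    def connect(self, conn_id):
        """New connections are pinned to whichever gateway is active right now."""
        self.states[conn_id] = self.active()
        return self.states[conn_id]


if __name__ == "__main__":
    wan1, wan2 = Gateway("WAN1", tier=1), Gateway("WAN2", tier=2)
    group = FailoverGroup([wan1, wan2])

    wan1.up = False
    group.connect("download")         # created during the outage
    wan1.up = True                    # primary restored
    group.connect("new-browser-tab")  # created after the primary is back

    for conn, gw in group.states.items():
        print(f"{conn}: {gw}")
```

In this model the connection opened during the outage stays on WAN2 after WAN1 returns, while the new one goes out via WAN1. That is a separate question from whether the firewall itself notices that WAN1 is back.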

Gateway groups as failover.  What's weird is that in 22.1 it would fail back to the primary IP after about a minute.  I didn't have to do anything.  In 22.7 I now have to either forcibly disable the secondary WAN or reboot the firewall for it to fail back.  If this not going back to the primary is expected behavior, this is not the solution for me and my clients and I will have to go back to another product.

Quote from: hescominsoon on August 13, 2022, 03:20:50 AM
Gateway groups as failover.  What's weird is that in 22.1 it would fail back to the primary IP after about a minute.  I didn't have to do anything.  In 22.7 I now have to either forcibly disable the secondary WAN or reboot the firewall for it to fail back.  If this not going back to the primary is expected behavior, this is not the solution for me and my clients and I will have to go back to another product.

We installed 22.1 back on the appliance and restored the config.  It now reverts back to primary within seconds, as verified by ipchicken on a desktop behind the OPNsense.  No need for a static route either.  I think 22.7 needs a ton of work at this point.

August 13, 2022, 05:06:29 PM #14 Last Edit: August 13, 2022, 05:08:09 PM by ProximusAl
Interesting. I'm new to OPNSense and started with 22.7 beta.

You can see in my post here how I deal with the failover back to primary.

https://forum.opnsense.org/index.php?topic=29749.0

I just assumed OPNSense never did it, but did think it strange.

Maybe I should have started with 22.1, but I guess if I did that, I'd have the same issue as you (Technically still do)

EDIT: I should clarify "NEW" connections do use the primary IP when it's back, but OPNSense itself is reluctant to fail back, hence why I down the interface