OPNsense Forum

Archive => 20.7 Legacy Series => Topic started by: bigops on August 31, 2020, 10:12:27 pm

Title: Gateway issue
Post by: bigops on August 31, 2020, 10:12:27 pm
Hi
Recently I have been noticing strange behavior on OPNsense. I have a configuration with two internet links, set up so that the first link has a higher priority than the secondary link. Traffic fails over to the secondary link if there is an issue with the primary link. What I have noticed recently is that once OPNsense switches to the secondary link, it never falls back to the primary link, even though the primary link has been restored and is shown as online in the GUI. What is more intriguing is that the route table lists the primary link as active, yet all traffic still takes the other link. Changing any gateway setting or rebooting OPNsense switches traffic back to the correct gateway. This is new behavior that I have only noticed recently.
Title: Re: Gateway issue
Post by: mimugmail on September 01, 2020, 06:17:11 am
Sticky sessions? Policy Routing active?
Title: Re: Gateway issue
Post by: bigops on September 03, 2020, 05:46:23 pm
Sticky sessions were the culprit. Thanks.
Title: Re: Gateway issue
Post by: bigops on September 10, 2020, 03:34:14 pm
Removing the sticky connections seems to have solved the issue. But aren't sticky connections there for a reason? There is probably an issue where the sticky connections timer does not expire, because when the option is on I do not see a failback when the primary connection is restored.
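
To illustrate how a sticky mapping could hold traffic on the lower tier, here is a minimal Python sketch. It is a toy model, not OPNsense or pf code: the names `Gateway`, `StickyRouter` and `sticky_timeout`, the 30-second value, and the rule that every new connection refreshes the mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    name: str
    tier: int            # lower number = higher priority
    online: bool = True


@dataclass
class StickyRouter:
    """Toy model of tiered gateway selection with per-source stickiness."""
    gateways: list
    sticky_timeout: float = 30.0  # hypothetical idle timeout, in seconds
    _sticky: dict = field(default_factory=dict)  # source -> (gateway name, last seen)

    def preferred(self):
        """Return the online gateway with the lowest tier, or None."""
        online = [g for g in self.gateways if g.online]
        return min(online, key=lambda g: g.tier) if online else None

    def route(self, src, now, sticky=True):
        """Pick a gateway name for a new connection from src at time now."""
        if sticky and src in self._sticky:
            name, last_seen = self._sticky[src]
            gw = next(g for g in self.gateways if g.name == name)
            if gw.online and now - last_seen < self.sticky_timeout:
                # Assumption: each new connection refreshes the mapping,
                # so a busy source never returns to the preferred tier.
                self._sticky[src] = (name, now)
                return name
            del self._sticky[src]
        gw = self.preferred()
        if gw is None:
            return None
        if sticky:
            self._sticky[src] = (gw.name, now)
        return gw.name


wan1 = Gateway("WAN1", tier=1)
wan2 = Gateway("WAN2", tier=2)
router = StickyRouter([wan1, wan2])

wan1.online = False                        # primary link degrades
print(router.route("10.0.0.5", now=0))     # WAN2 (failover)
wan1.online = True                         # primary link recovers
print(router.route("10.0.0.5", now=10))    # WAN2 (mapping still fresh)
print(router.route("10.0.0.5", now=100))   # WAN1 (mapping idle for 90 s)
```

In this model a source that keeps opening connections more often than the timeout never fails back, which would look like the behavior described above. Whether the real timer behaves this way is exactly the open question.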
Title: Re: Gateway issue (Updated)
Post by: bigops on September 16, 2020, 07:52:30 am
After trying out the various options and observing for more than a week, I am fairly certain that failback is not working as expected. When OPNsense fails over to the lower-tier gateway, the failback does not happen even when the primary connection becomes active again without any errors. Rebooting the device always seems to correct the issue. Physically removing the primary connection also seems to trigger the failback once the connection comes back. The issue seems to occur when the failover is caused by latency or packet loss.

Looking at the gateway configuration and the route table, everything seems to be fine, but the traffic just does not seem to take that route.

I am attaching a few screenshots that show the issue: the tracert from the client takes a different path than the one from the OPNsense box itself.
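
For anyone who wants to compare the two traces without screenshots, here is a small Python sketch that checks whether both leave through the same upstream hop. All addresses are made-up documentation addresses and `hop_after` is a hypothetical helper, not part of any OPNsense tooling. One possible explanation for a mismatch (an assumption, not confirmed in this thread) is that traffic originating on the firewall follows the system routing table, while forwarded client traffic can be steered by a firewall rule that has a gateway set.

```python
def hop_after(path, firewall_ip):
    """Return the hop that follows firewall_ip in a traceroute path, or None."""
    if firewall_ip in path:
        i = path.index(firewall_ip)
        return path[i + 1] if i + 1 < len(path) else None
    return None


# Hypothetical hop lists copied by hand from the two traces.
FIREWALL = "192.0.2.1"                 # firewall address as seen by the client
client_path = ["10.0.0.1", "192.0.2.1", "203.0.113.1", "203.0.113.254"]
firewall_path = ["198.51.100.1", "198.51.100.254"]

client_egress = hop_after(client_path, FIREWALL)   # first hop past the firewall
firewall_egress = firewall_path[0]                 # firewall's own first hop

if client_egress != firewall_egress:
    print(f"Paths diverge: client exits via {client_egress}, "
          f"firewall exits via {firewall_egress}")
else:
    print(f"Both traces exit via {client_egress}")
```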

This issue is causing a lot of headaches. Does anyone have any suggestions?

Thanks

B
Title: Re: Gateway issue
Post by: mimugmail on September 16, 2020, 01:47:58 pm
Maybe it doesn't fail back because the session is still active? In that case it would keep using the second tier to avoid disrupting connections again.
Title: Re: Gateway issue
Post by: bigops on September 16, 2020, 02:53:35 pm
It seems to work fine when Tier 2 is physically disconnected and a failure is simulated. Also, the behavior does not change even if all the sessions are closed (or the client is rebooted). One additional piece of information: in this setup the clients are behind a Layer 3 device, and only a routed link is available between the Layer 3 device and OPNsense.
Title: Re: Gateway issue
Post by: bigops on September 22, 2020, 05:17:21 pm
Is anyone else observing this problem? I keep running into it and nothing seems to resolve it.

Thanks