Multi-WAN Setup: Failback from Tier 2 to Tier 1 unreliable

Started by axsdenied, August 10, 2023, 04:34:38 AM

August 11, 2023, 04:38:13 PM #15 Last Edit: August 11, 2023, 04:48:23 PM by axsdenied
Ok well that didn't take long. Had a real event occur minutes after I posted my previous reply.

Still not seeing a full failback to Tier 1. See the image below, taken a few minutes after Tier 1 came back online. Light green is WAN (Tier 1), dark green is WAN2 (Tier 2).

I even tried forcing WAN2 down, and it still has traffic routed through it. See the 2nd image.

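[Img 1: WAN and WAN2 traffic a few minutes after Tier 1 came back online]

[Img 2: WAN2 still carrying traffic after being forced down]
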
OPNsense 24.7.7 running on:
Dell Optiplex 3050
Intel i5-7600 @ 3.5GHz (4 cores)
Intel I350-T4 NIC
8GB DDR4
256GB SSD

Not sure where I got it in my head that I needed to be on the development branch to apply patches, but I caught my error. Everything above was tested on, and applies to, the dev branch.

I've since reverted back to the community branch and have applied the patch to it and will continue to test.

> I've since reverted back to the community branch and have applied the patch to it and will continue to test.

So how's that test going?


Cheers,
Franco

So far so good, but I haven't had a chance to simulate it.  Will do this week!

Side question: did you guys do any memory optimization as well? With my config, overall usage now hovers around 2.5GB, whereas in the 23.1 series it would slowly ramp up to 5 to 6GB.


August 20, 2023, 07:55:39 AM #20 Last Edit: August 20, 2023, 08:00:50 AM by axsdenied
Ok, I went to simulate a failover by marking the gateway as down, but nothing shifted. I can physically unplug the primary WAN to test as well, but thought I'd share this.


How "force_down" was handled previously is a bit difficult to say, given its niche use. Monitoring-induced downtimes already work, and cable disconnects will work on 23.7.2.

I've added a commit to include force_down for testing as it would make sense to consolidate. If it works we can discuss adding it to 23.7.3.

https://github.com/opnsense/core/commit/7f1d8c66d3


Cheers,
Franco

Upgraded to 23.7.2 and tried simulating a fallback:

Everything failed over smoothly after WAN went down, but after it came back up, existing sessions stayed on WAN2 and never went back to WAN.

Should I re-apply the patch and try again?

On 23.7.2 there is nothing to reapply.

Do you have sticky connections enabled?


Cheers,
Franco

Sticky connections are not enabled. Over time, after about an hour or two, the connections did move over. Just not immediately.

Is it designed to wait for sessions to end or expire before moving?

Yep. Stateful tracking. You can try experimenting with rules that do not keep state (advanced rule settings). Traffic might move over immediately, but it depends on whether the client likes that or not.
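A toy model may help illustrate the stateful behavior described here. This is a hypothetical simplification, not OPNsense or pf code: a gateway group picks the up gateway with the lowest tier for new states, existing states stay pinned to the gateway they were created on, and (an assumption based on the behavior reported in this thread) states on a gateway that goes down are dropped so those flows re-route. The `keep_state=False` flag stands in for a rule that does not keep state; all class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Gateway:
    name: str
    tier: int          # lower number = preferred (Tier 1 beats Tier 2)
    up: bool = True


def pick_gateway(group: List[Gateway]) -> Optional[Gateway]:
    """Return the up gateway with the lowest tier, or None if all are down."""
    candidates = [gw for gw in group if gw.up]
    return min(candidates, key=lambda gw: gw.tier) if candidates else None


@dataclass
class StatefulRouter:
    group: List[Gateway]
    # Hypothetical state table: flow id -> gateway the flow is pinned to.
    states: Dict[str, str] = field(default_factory=dict)

    def set_up(self, name: str, up: bool) -> None:
        for gw in self.group:
            if gw.name == name:
                gw.up = up
        if not up:
            # Assumption: states on a failed gateway are dropped so flows re-route.
            self.states = {f: g for f, g in self.states.items() if g != name}
        # Nothing is dropped when a gateway comes back up.

    def route(self, flow: str, keep_state: bool = True) -> Optional[str]:
        if keep_state and flow in self.states:
            # Existing state wins: the flow stays on its original gateway.
            return self.states[flow]
        gw = pick_gateway(self.group)
        if gw is None:
            return None
        if keep_state:
            self.states[flow] = gw.name
        return gw.name


if __name__ == "__main__":
    router = StatefulRouter([Gateway("WAN", tier=1), Gateway("WAN2", tier=2)])
    router.set_up("WAN", False)
    print(router.route("discord"))                    # WAN2 (failover)
    router.set_up("WAN", True)
    print(router.route("discord"))                    # WAN2 (existing state kept)
    print(router.route("new-flow"))                   # WAN  (new state uses Tier 1)
    print(router.route("discord", keep_state=False))  # WAN  (no state, re-evaluated)
```

In this model, failback only happens as old states expire and new ones are created, which matches the "about an hour or two" delay reported above.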


Cheers,
Franco

August 26, 2023, 09:31:08 PM #26 Last Edit: August 26, 2023, 09:37:52 PM by axsdenied
If that's by design, which makes logical sense for greatest session stability, then I had the wrong expectations.

Is there an option to force them back, much like connections are forced over when WAN goes down via the triggers? Most of the clients and apps I use respond well to being forced over, with the exception of Discord and Hulu (when you have the TV package, they do an IP "home" check; it also seems to never release its session, or at least that's the behavior it exhibits).
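As a hypothetical sketch of what "forcing them back" would mean in terms of state (not an existing OPNsense setting), the function below drops every state pinned to a lower-priority gateway once a better-tier gateway recovers, with an optional exemption set for apps that dislike changing IP. The names `failback_flush` and `exempt` are illustrative only.

```python
from typing import Dict, FrozenSet, List, Tuple


def failback_flush(
    states: Dict[str, str],
    tiers: Dict[str, int],
    recovered: str,
    exempt: FrozenSet[str] = frozenset(),
) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical forced failback: drop states pinned to worse-tier gateways.

    states:    flow id -> gateway name the flow is pinned to
    tiers:     gateway name -> tier number (lower = preferred)
    recovered: gateway that just came back up
    exempt:    flows to leave alone (e.g. apps doing an IP "home" check)
    Returns the surviving states and the sorted list of dropped flows.
    """
    limit = tiers[recovered]
    kept = {
        flow: gw
        for flow, gw in states.items()
        if tiers[gw] <= limit or flow in exempt
    }
    dropped = sorted(flow for flow in states if flow not in kept)
    return kept, dropped


if __name__ == "__main__":
    states = {"web": "WAN2", "discord": "WAN2", "dns": "WAN"}
    tiers = {"WAN": 1, "WAN2": 2}
    kept, dropped = failback_flush(states, tiers, "WAN", exempt=frozenset({"discord"}))
    print(kept)     # {'discord': 'WAN2', 'dns': 'WAN'}
    print(dropped)  # ['web']
```

Dropped flows have to be re-established by the client, so this trades session stability for a faster return to Tier 1.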