Hi everyone,
About two months ago, our company switched from Sophos to OPNsense, and we have deployed around ten devices so far.
Overall, I'm still confident in this decision and appreciate many of the benefits OPNsense brings. However, there's one major issue that keeps coming up, and I'd love to hear how others are handling it: WAN failover and especially failback.
The core of the problem is that when our primary internet connection goes down, several services stop working for an extended period. It seems that devices or applications are still trying to use the old (and now unreachable) gateway. Things get even worse when the primary connection comes back online, as sessions sometimes remain stuck or get duplicated.
This issue is especially critical with VoIP systems. In some cases, the media stream drops entirely and only recovers once we manually reset the states.
What I'm really missing is a built-in feature that cleanly removes all states/sessions tied to the primary WAN when it goes offline – and similarly clears sessions from the backup connection when the primary is restored.
Is there really no native solution for this? I can't be the only one facing persistent issues due to this behavior.
In the meantime, I've been using the following workaround script I found online:
https://github.com/opnsense/core/issues/6803#issuecomment-2048267972
So my question is: Is there any plan to include proper failover/failback session handling in the Business Edition in the near future?
The number of issues caused by the current behavior is starting to add up, and I'd really like to avoid the risk of our management reconsidering our move to OPNsense.
Hi,
For "failover", killing states when a gateway goes down was added in 25.1.4, so it's also in 25.4.
For "failback", we are discussing options, but the requirements are difficult to agree on; see https://github.com/opnsense/core/issues/6803 for some details.
That being said, the "monitor" rc.syshook facility lets you plug scripts into the installation to handle one-off local requirements, away from the complexity of gateway and gateway group entanglements and edge cases.
https://github.com/opnsense/core/blob/master/src/etc/rc.syshook.d/monitor/20-recover
All callers receive the gateways that transitioned as arguments, so you can look up whether they are down or up and act on them accordingly.
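To make the idea concrete, here is a rough sketch of what such a hook could look like. It is not the linked 20-recover script, and several things in it are assumptions: the gateway names, the interface names, the handling of comma-separated arguments, and the way the up/down status is supplied are all hypothetical placeholders. The only external command it builds is `pfctl -i <interface> -F states`, which flushes states restricted to one interface. It is written in Python for readability; whether your installation runs a Python hook from that directory unchanged is something to verify locally.

```python
#!/usr/bin/env python3
"""Sketch of a monitor hook helper: pick an interface to flush on a gateway change."""
import subprocess
import sys

# Hypothetical names; replace with the gateways and interfaces of your setup.
PRIMARY = "WAN_PRIMARY_GW"
BACKUP = "WAN_BACKUP_GW"
GATEWAY_IFACE = {PRIMARY: "igb0", BACKUP: "igb1"}


def parse_gateways(argv):
    """Accept gateway names as separate or comma-separated arguments (assumed format)."""
    names = []
    for arg in argv:
        names.extend(name for name in arg.split(",") if name)
    return names


def plan_flush(gateway, is_up):
    """Return the interface whose states should be flushed, or None."""
    if gateway != PRIMARY:
        return None
    if is_up:
        # Failback: primary is back, so clear sessions still pinned to the backup.
        return GATEWAY_IFACE[BACKUP]
    # Failover: primary is down, so clear sessions stuck on the dead link.
    return GATEWAY_IFACE[PRIMARY]


def flush_command(iface):
    """Build the pfctl call that flushes states on a single interface."""
    return ["pfctl", "-i", iface, "-F", "states"]


def handle(gateways, status, execute=False):
    """Plan (and optionally run) flushes; status maps gateway name to True if up."""
    commands = []
    for gateway in gateways:
        if gateway not in status:
            continue  # unknown gateway: do nothing rather than guess
        iface = plan_flush(gateway, status[gateway])
        if iface is None:
            continue
        command = flush_command(iface)
        commands.append(command)
        if execute:
            subprocess.run(command, check=False)
    return commands


if __name__ == "__main__":
    # Dry run only: the status lookup is site-specific and not shown here,
    # so this placeholder pretends the primary gateway just went down.
    assumed_status = {PRIMARY: False, BACKUP: True}
    for cmd in handle(parse_gateways(sys.argv[1:]), assumed_status):
        print(" ".join(cmd))
```

Note that flushing the backup interface on failback also cuts calls that are running fine over the backup link, which is part of why the failback requirements are hard to agree on. Since 25.1.4 already kills states on gateway down, the failover branch may be redundant on current versions.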
Cheers,
Franco
PS: I'd rather leave the thread here because of the recent feature additions as mentioned.
Okay, then I have another question.
Why can't I update to 25.x? :D
I just see that at least the failover issue has been resolved there.
I'm using a business version and my mirror is Deciso from the Netherlands.
I see 25.4 in the changelog, but when I click Update, the latest version is 24.10.2_8.
Is there any way I can perform the update?
Install 24.10.2_8 first; it adds the hint that tells the system 25.4 is available.
Cheers,
Franco