I have a setup with two WAN uplinks, and I've had routing/firewall set up with two gateway groups to split traffic and support failover. It looks like this has broken again since the upgrade.
On OPT1, which is where all my LAN traffic comes in, I have two firewall rules, in this order in the GUI:
- Incoming to OPT1, destination to an IP group (set of destination IPs) => gateway group 2
- Incoming to OPT1, destination to anywhere => gateway group 1
I made sure to check the gateway groups that they're prioritized correctly.
What I'm seeing is the first rule doesn't seem to be hit, which it was pre-upgrade. Now what is perplexing is this did work on the initial upgrade, but since the hotfix to 24.7_9, it appears broken. I've tried disabling the rules and re-enabling them, moving them around, all with "Reapply" in between. I've also tried rebooting, nothing is working.
Curious if anyone else is seeing similar issues and has any ideas on how to resolve this.
Further information:
I went ahead and disabled my one "main" gateway in the settings, System => Gateways => Configuration, and applied it. I saw my secondary gateway become active, and the gateway disappeared from the active gateways view. Even though it was disabled, traffic is still being routed to it no matter what. This is really confusing, like the UI is completely ignoring the gateway state and is just routing to the one with a higher priority metric, even if it's disabled.
I ended up trying another thing by disabling the interface for the "main" gateway (disabled the port). After doing that and re-enabling the interface, it seems my multi-wan is working again for now.
This is still broken. I had one of the WAN links fail overnight (this is not uncommon) and the multi-WAN setup properly failed things over to my primary. But it refuses to fail back, and is routing 100% of traffic now out of the primary, and ignoring the firewall rules.
I'm seeing this as well after a recent upgrade to 24.7. It wasn't an issue on 24.1.
I just performed the most recent upgrade to get up to 24.7.1. This issue still remains where the multi-wan setup just doesn't work, and the higher-metric gateway is always chosen no matter what.
I'm happy to try a patch or anything to help get this fixed.
I can confirm that this is happening... I'm not sure it's always sticking to the lowest numbered gateway though... Mine failed from primary (252) to secondary (253)... After primary was back up there was nothing I could do to push traffic back to it... Even when disabling the secondary (253) gateway, traffic still flowed through secondary (253) and not primary (252).
The only way to restore traffic back to the primary (252) gateway was to reboot... This was definitely introduced in 24.7.
And yes, all changes were "Applied"... I even re-started the interface multiple times.
In addition, for whatever reason, when multiple gateways are enabled, sometimes after reboot they show down on the dashboard, and the only way to get them in an "UP" status is to edit the gateway, change nothing, and apply.
One final note: with multi-WAN on 24.1 and Starlink, you needed "Disable Host Route" checked to be able to use gateway monitoring. On 24.7, "Disable Host Route" must be UNCHECKED. It doesn't seem to matter for the Xfinity/Comcast (primary) gateway.
I've been able to replicate all the above amongst multiple sites with the same multi-wan setups.
I believe I have a similar, related issue. My VPN network is similar to the setup at https://github.com/FingerlessGlov3s/OPNsensePIAWireguard: some containers on Proxmox are assigned to the VPN-only network, but are still accessible by other hosts on the main internal network. 24.7 broke all routes to the VPN subnet, and I haven't been able to restore access.
Same here: multi-WAN load balancing has not worked since 24.1.10_9. Upgrading to 24.7 did not solve the issue. Even with a "load balancing" setup, the behavior is like a "failover" setup, meaning traffic will only route through the active WAN.
I copy that
Quote from: apunkt on August 16, 2024, 10:45:28 AM
I copy that
I just found this out this morning.
I was noticing zero traffic on a tunnel and wondered why...
And yep, it was all going out WAN.
I had a tier 1 and tier 2, and those were entirely ignored.
Is there any update on this problem? I upgraded to 24.7 about a week ago, and I learned today that my multi-wan setup with a gateway group no longer works.
I have two gateway interfaces--one is called WAN and the other WAN2. I have them in a gateway group, with WAN being the primary gateway and WAN2 the secondary gateway that is only supposed to be used if WAN fails.
Today, the secondary gateway, WAN2, went down due to an outage at the ISP, and I lost internet in my house even though the primary gateway, WAN, was still active and able to send and receive packets.
After WAN2 came back online, I duplicated the problem manually by yanking the cable on WAN2, and again, I lost internet in the house.
Interestingly, when the primary gateway, WAN, fails, failover to WAN2 happens as it should, but when WAN2 fails, there is a complete loss of internet. All of this worked in 24.1.
I then searched and found this thread. Unfortunately, I do not see any solution here. Unless there is one, I'm going to downgrade to 24.1, which I can do easily because I run OPNsense in a VM on Truenas SCALE and can roll back to a snapshot. Of course, downgrading is not my first choice since I like the new dashboard in 24.7, but it's more important to have multi-wan working.
I have no workaround so far from anyone, and I've not heard or seen any mention that it's being worked on to fix.
I'm unfortunately not brave enough to try a downgrade, even though my config is backed up in a few places. I can't afford extended downtime since I work remotely. But I rely on the ability to split my traffic between both WANs, since I push my work traffic over one uplink, and the rest of the house over the other. I can't do that now.
This was broken at one point in 24.1 as well, and then an update fixed it shortly before the 24.7 upgrade was released. I'm hoping a dev sees this and knows what needs to happen, and an update pops out really soon.
Thanks. I have downgraded to version 24.1.10_8, and multi-WAN works properly again. After the downgrade, I yanked the cable into the modem for each gateway respectively, and OPNsense properly failed over to the other gateway with uninterrupted internet.
Hopefully, this problem will get fixed at some point, and I will then upgrade again.
Trying a bump on this since even after the recent 24.7.2 updates, this is still not working. I don't know whether this has something to do with the second WAN uplink having a higher metric, but this setup worked before the major upgrade and still does not work now. I'm really hesitant to downgrade, since recreating my config if a restore doesn't work seems a bit terrifying to me.
I'm really open to trying any patches, test builds, command-line hacks, anything, to try and get this working again. Any help is greatly appreciated.
For some reason, I have found nothing about this issue except this thread. It definitely worked prior to the upgrade to 24.7, and absolutely does not in 24.7, at least when I tried it last week.
As noted, I downgraded to 24.1.10, and it's back to working, but I was able to do so by rolling back to a snapshot.
One tip: If you downgrade manually to 24.1.10, make sure you have a config file ready that was created in 24.1.10 or earlier. At least in most similar setups (and I assume OPNsense is the same way), restoring from config only works if the config file was created from the same or earlier version to which it's restored.
Of course, downgrading is only a temporary solution. It's not feasible to remain with 24.1.10 permanently, so hopefully there is some interest in a workaround or patch in 24.7 for this, because it's beyond my technical skills to fix it on my own.
This recent post may shed some light on this issue: https://forum.opnsense.org/index.php?topic=42552.0.
If WAN cannot ping remote hosts in 24.7, that could explain why gateway monitoring is broken.
For those of you who have 24.7 installed (as noted, I rolled back to 24.1.10 due to this problem), I would suggest manually attempting to ping from each public-facing interface (WAN, WAN2, etc.) to 8.8.8.8 or some other remote host to determine if that's the source of the problem.
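A rough way to script that check is sketched below in Python. It assumes a FreeBSD-style `ping` that accepts `-S` to set the source address (on Linux the equivalent flag is `-I`); the helper names `build_ping_cmd` and `check_uplinks` are made up for this example, and the interface addresses you pass in are your own.

```python
import subprocess


def build_ping_cmd(source_ip: str, target: str = "8.8.8.8", count: int = 3) -> list[str]:
    """Build a FreeBSD-style ping command bound to one source address."""
    # -c: number of echo requests, -S: source address (assumes FreeBSD ping).
    return ["ping", "-c", str(count), "-S", source_ip, target]


def check_uplinks(sources: dict[str, str], target: str = "8.8.8.8") -> dict[str, bool]:
    """Ping the target from each interface address.

    A value of True means ping exited with status 0 for that source address.
    """
    results = {}
    for name, source_ip in sources.items():
        proc = subprocess.run(build_ping_cmd(source_ip, target), capture_output=True)
        results[name] = proc.returncode == 0
    return results
```

Calling `check_uplinks({"WAN": "192.0.2.10", "WAN2": "198.51.100.10"})` on the firewall, with the placeholder addresses swapped for the real address of each WAN interface, would show which uplink cannot reach the remote host.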
Quote from: patrick3000 on August 30, 2024, 12:12:00 AM
For some reason, I have found nothing about this issue except this thread. It definitely worked prior to the upgrade to 24.7, and absolutely does not in 24.7, at least when I tried it last week.
Well, here's my thread about it:
https://forum.opnsense.org/index.php?topic=42330.msg208973#msg208973
Quote from: patrick3000 on August 30, 2024, 01:04:49 AM
This recent post may shed some light on this issue: https://forum.opnsense.org/index.php?topic=42552.0.
Interesting. That looks like a case where, once a WAN link that went down is really back up, the system can't detect that and bring things back up. My situation is a bit different, I think.
My setup is two WAN uplinks, say WAN1 and WAN2. I have two Gateway groups defined, say Group1 and Group2. Group1 has WAN1 as the Tier 1, WAN2 as Tier 2. Group2 has WAN2 as Tier 1, WAN1 as Tier 2. In my firewall rules, I have something like this:
- From anywhere internally to specific destination IP (work): use Group2
- From anywhere internally to anywhere: use Group1
Then if either WAN link fails, it should fail over correctly.
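To make the expected behavior concrete, here is a minimal Python sketch of how tiered gateway groups are supposed to pick a gateway: the lowest-numbered tier that has a gateway marked up wins. This only models the outcome I expect and is not OPNsense's actual code; the names `Gateway`, `pick_gateway`, and `build_groups` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gateway:
    name: str
    up: bool


def pick_gateway(group: dict[int, list[Gateway]]) -> Optional[str]:
    """Return the first 'up' gateway from the lowest tier that has one."""
    for tier in sorted(group):
        for gateway in group[tier]:
            if gateway.up:
                return gateway.name
    return None  # every gateway in the group is down


def build_groups(wan1_up: bool, wan2_up: bool) -> dict[str, dict[int, list[Gateway]]]:
    """Model the two groups described above for a given link state."""
    wan1, wan2 = Gateway("WAN1", wan1_up), Gateway("WAN2", wan2_up)
    return {
        "Group1": {1: [wan1], 2: [wan2]},  # WAN1 preferred, WAN2 as fallback
        "Group2": {1: [wan2], 2: [wan1]},  # WAN2 preferred, WAN1 as fallback
    }


if __name__ == "__main__":
    for wan1_up, wan2_up in [(True, True), (False, True), (True, False)]:
        groups = build_groups(wan1_up, wan2_up)
        print(wan1_up, wan2_up,
              pick_gateway(groups["Group1"]), pick_gateway(groups["Group2"]))
```

With both links up, this model sends Group1 traffic to WAN1 and Group2 traffic to WAN2, and each group falls back to the other link only when its tier 1 gateway is down.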
What is broken for me after the upgrade is that first rule refuses to push traffic over WAN2 when both WAN uplinks are running just fine, and reported as Up as well. It's almost as if the routing metric (where WAN1 is a higher priority) is being applied versus the Gateway group Tiering. The only way I can get my work traffic onto WAN2 is to disable WAN1 altogether, and then restart my work VPN tunnels to stick on WAN2. Then I bring WAN1 back online, and we're good until something bounces again.
That setup is what broke after the upgrade. This exact setup *did* break at one point on 24.1 as well, and then a subsequent update fixed it. Then 24.7 came along, and it's completely broken again.
I finally figured out what is going wrong here. I ended up looking at the firewall rules themselves via the cmdline, and saw there was a new catch-all rule on my LAN interface that matched and directed all packets to the default gateway, which in this case would be the higher-priority metric out of my two WAN links.
Looking in the GUI, I found a new hidden sshlockout rule that seems to have been added during the upgrade that I did not have on that interface prior to the upgrade. It was the !sshlockout that matched everything inbound from my LAN net, and going anywhere. It was before my rules that split the traffic between my WAN2 and WAN1 (work and everything else, respectively).
I ended up keeping the !sshlockout rule, but modified it for a destination of LAN net as well (keep local traffic inbound open). I don't need the sshlockout enabled, since I have no external login inbound from a WAN interface.
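For anyone trying to picture why one extra rule broke the split, here is a small Python sketch of first-match rule evaluation. It is a simplified model that assumes the first matching rule decides the gateway, not the firewall's real logic; the `Rule` class, the `route_for` helper, and the networks (203.0.113.0/24 for the work destinations, 192.168.1.0/24 for LAN net) are placeholders for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    label: str
    destination: str  # CIDR the rule applies to
    gateway: str      # gateway group name, or "default" for the default route


def route_for(dst: str, rules: list[Rule]) -> Optional[str]:
    """Return the gateway of the first rule whose destination contains dst."""
    for rule in rules:
        if ip_address(dst) in ip_network(rule.destination):
            return rule.gateway
    return None  # no rule matched


# Placeholder networks, not taken from my real config.
user_rules = [
    Rule("work traffic", "203.0.113.0/24", "Group2"),
    Rule("everything else", "0.0.0.0/0", "Group1"),
]
catch_all = Rule("catch-all ahead of my rules", "0.0.0.0/0", "default")
narrowed = Rule("same rule limited to LAN net", "192.168.1.0/24", "default")

if __name__ == "__main__":
    work_host = "203.0.113.5"
    print(route_for(work_host, user_rules))                # my rules alone
    print(route_for(work_host, [catch_all] + user_rules))  # shadowed by catch-all
    print(route_for(work_host, [narrowed] + user_rules))   # after narrowing it
```

In this model the catch-all rule placed first sends the work host to the default route, and narrowing its destination to LAN net lets the gateway group rules match again.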
Anyways, this is now working. I did verify I can fail over and fail back correctly between my tier1 and tier2 gateways. Apologies that I didn't find this sooner, but I hope this helps anyone else with a multi-WAN setup to get it working post-upgrade.
Good job, PJW, on solving this problem. However, as of the latest update to OPNsense, I do not believe that your solution, which involves editing hidden LAN firewall rules, is necessary.
In particular, when I first ran into this problem of multi-gateway rules being broken after upgrading to 24.7 several weeks ago, I downgraded to 24.1 as a workaround.
Today, I again upgraded to 24.7, and immediately after the upgrade, the !sshlockout rule you mentioned appeared as a hidden LAN firewall rule. However, after that, I updated to the latest version as of today, which is 24.7.3_1, and when I looked in the LAN firewall rules, I observed that the !sshlockout rule was gone. So it appears that this problem has been addressed in firmware in the latest version.
Next, I yanked the cable from, in turn, the WAN and WAN2 interface, and failover to the other interface in the gateway group occurred properly. So, it seems that this problem, while it existed in the original release of 24.7 due to the problematic !sshlockout rule, no longer exists in 24.7.3_1.