[SOLVED] Multi-gateway rules not working after 24.7 upgrade

Started by pjw, August 01, 2024, 06:59:00 PM

This recent post may shed some light on this issue: https://forum.opnsense.org/index.php?topic=42552.0.

If WAN cannot ping remote hosts in 24.7, that could explain why gateway monitoring is broken.

For those of you who have 24.7 installed (as noted, I rolled back to 24.1.10 due to this problem), I would suggest manually attempting to ping from each public-facing interface (WAN, WAN2, etc.) to 8.8.8.8 or some other remote host to determine if that's the source of the problem.
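
A minimal sketch of that per-interface check, assuming it runs on the firewall itself, where FreeBSD's ping accepts -S to set the source address. The interface names and addresses below are hypothetical placeholders, not values from this thread; substitute the address bound to each WAN interface.

```python
import subprocess
from typing import Callable, Dict, List


def build_ping_cmd(source_ip: str, target: str = "8.8.8.8", count: int = 3) -> List[str]:
    # FreeBSD ping: -c is the packet count, -S is the source address.
    # (Assumption: run on the firewall; Linux ping uses -I instead of -S.)
    return ["ping", "-c", str(count), "-S", source_ip, target]


def can_reach(source_ip: str, target: str = "8.8.8.8") -> bool:
    # Exit status 0 means at least one reply came back.
    result = subprocess.run(build_ping_cmd(source_ip, target),
                            capture_output=True, text=True)
    return result.returncode == 0


def check_all(addresses: Dict[str, str], target: str = "8.8.8.8",
              probe: Callable[[str, str], bool] = can_reach) -> Dict[str, bool]:
    # Probe the target once per public-facing interface address.
    return {name: probe(ip, target) for name, ip in addresses.items()}


# Hypothetical addresses (documentation ranges), one per WAN interface.
WAN_ADDRESSES = {"WAN": "203.0.113.10", "WAN2": "198.51.100.10"}

# Usage on the firewall: print(check_all(WAN_ADDRESSES))
```

If one interface reports False while the other reports True, that points at the ping problem described in the linked post rather than at the gateway group rules.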

Quote from: patrick3000 on August 30, 2024, 12:12:00 AM
For some reason, I have found nothing about this issue except this thread. It definitely worked prior to the upgrade to 24.7, and absolutely does not in 24.7, at least when I tried it last week.

As noted, I downgraded to 24.1.10, and it's back to working, but I was able to do so by rolling back to a snapshot.

One tip: If you downgrade manually to 24.1.10, make sure you have a config file ready that was created in 24.1.10 or earlier. At least in most similar setups (and I assume OPNsense is the same way), restoring from config only works if the config file was created on the same version as, or an earlier version than, the one it's being restored to.

Of course, downgrading is only a temporary solution. It's not feasible to remain with 24.1.10 permanently, so hopefully there is some interest in a workaround or patch in 24.7 for this, because it's beyond my technical skills to fix it on my own.
Well, here's my thread about it:

https://forum.opnsense.org/index.php?topic=42330.msg208973#msg208973


Quote from: patrick3000 on August 30, 2024, 01:04:49 AM
This recent post may shed some light on this issue: https://forum.opnsense.org/index.php?topic=42552.0.

If WAN cannot ping remote hosts in 24.7, that could explain why gateway monitoring is broken.

For those of you who have 24.7 installed (as noted, I rolled back to 24.1.10 due to this problem), I would suggest manually attempting to ping from each public-facing interface (WAN, WAN2, etc.) to 8.8.8.8 or some other remote host to determine if that's the source of the problem.

Interesting.  That looks like a case where the WAN link goes down and, once it's back up for real, it can't detect that and get things back up.  My situation is a bit different, I think.

My setup is two WAN uplinks, say WAN1 and WAN2.  I have two Gateway groups defined, say Group1 and Group2.  Group1 has WAN1 as the Tier 1, WAN2 as Tier 2.  Group2 has WAN2 as Tier 1, WAN1 as Tier 2.  In my firewall rules, I have something like this:

- From anywhere internally to specific destination IP (work): use Group2
- From anywhere internally to anywhere: use Group1

Then if either WAN link fails, it should fail over correctly.
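
To make the expected behavior concrete, here is a small model of the tiering described above. It is only a sketch of the intended selection logic, not OPNsense's actual implementation: it assumes the lowest-numbered tier with a live gateway wins, and it ignores load balancing between gateways that share a tier.

```python
from typing import Dict, Optional

# Gateway groups from the setup above: gateway name -> tier (lower is preferred).
GROUP1 = {"WAN1": 1, "WAN2": 2}
GROUP2 = {"WAN2": 1, "WAN1": 2}


def pick_gateway(group: Dict[str, int], up: Dict[str, bool]) -> Optional[str]:
    # Keep only gateways currently reported as up, then take the lowest tier.
    candidates = [(tier, name) for name, tier in group.items() if up.get(name, False)]
    if not candidates:
        return None  # every gateway in the group is down
    return min(candidates)[1]


both_up = {"WAN1": True, "WAN2": True}
print(pick_gateway(GROUP2, both_up))                         # work traffic
print(pick_gateway(GROUP1, both_up))                         # everything else
print(pick_gateway(GROUP2, {"WAN1": True, "WAN2": False}))   # WAN2 failed
```

With both links up, work traffic should leave via WAN2 and everything else via WAN1; each group should only fall back to its Tier 2 gateway when its Tier 1 gateway is down.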

What is broken for me after the upgrade is that the first rule refuses to push traffic over WAN2, even when both WAN uplinks are running just fine and reported as Up.  It's almost as if the routing metric (where WAN1 is a higher priority) is being applied instead of the Gateway group tiering.  The only way I can get my work traffic onto WAN2 is to disable WAN1 altogether, and then restart my work VPN tunnels so they stick on WAN2.  Then I bring WAN1 back online, and we're good until something bounces again.

That setup is what broke after the upgrade.  At one point, this exact setup *did* break on 24.1, and then a subsequent update fixed it.  Then 24.7 came along, and it's completely broken again.

I finally figured out what is going wrong here.  I ended up looking at the firewall rules themselves via the command line, and saw there was a new catch-all rule on my LAN interface that matched and directed all packets to the default gateway, which in this case is whichever of my two WAN links has the higher-priority metric.

Looking in the GUI, I found a new hidden sshlockout rule that seems to have been added during the upgrade; I did not have it on that interface before.  It was the !sshlockout rule that matched everything inbound from my LAN net, going anywhere.  It was evaluated before my rules that split the traffic between WAN2 and WAN1 (work and everything else, respectively).

I ended up keeping the !sshlockout rule, but modified it so its destination is LAN net as well (to keep local inbound traffic open).  I don't need the sshlockout enabled, since I have no external login inbound from a WAN interface.
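
For anyone trying to picture why rule order mattered, here is a rough first-match model of what is described above. It is a simplification and not real pf syntax: source matching is left out, and the rule labels, the work address 192.0.2.50 and the LAN net 192.168.1.0/24 are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass
class Rule:
    label: str
    destination: str        # CIDR; "0.0.0.0/0" means any destination
    gateway: Optional[str]  # None means "use the default routing table"


def first_match(rules: List[Rule], dst: str) -> Optional[Rule]:
    # Rules are evaluated top to bottom; the first one that matches wins.
    addr = ip_address(dst)
    for rule in rules:
        if addr in ip_network(rule.destination):
            return rule
    return None


POLICY_RULES = [
    Rule("work via Group2", "192.0.2.50/32", "Group2"),
    Rule("everything else via Group1", "0.0.0.0/0", "Group1"),
]

# After the upgrade: a hidden catch-all sits ahead of the policy rules.
BROKEN = [Rule("hidden catch-all", "0.0.0.0/0", None)] + POLICY_RULES

# After the change: the hidden rule only matches traffic destined for LAN net.
FIXED = [Rule("hidden rule, LAN net only", "192.168.1.0/24", None)] + POLICY_RULES

print(first_match(BROKEN, "192.0.2.50").label)
print(first_match(FIXED, "192.0.2.50").label)
```

In the broken ordering the work destination never reaches the Group2 rule, so it follows the default gateway; once the hidden rule is narrowed to LAN net, the gateway group rules are reached again.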

Anyways, this is now working.  I did verify I can fail over and fail back correctly between my tier1 and tier2 gateways.  Apologies that I didn't find this sooner, but I hope this helps anyone else with a multi-WAN setup to get it working post-upgrade.

September 05, 2024, 12:47:48 AM #19 Last Edit: September 05, 2024, 06:47:56 AM by patrick3000
Good job, PJW, on solving this problem. However, as of the latest update to OPNsense, I do not believe that your solution, which involves editing hidden LAN firewall rules, is necessary.

In particular, when I first ran into this problem of multi-gateway rules being broken after upgrading to 24.7 several weeks ago, I downgraded to 24.1 as a workaround.

Today, I upgraded to 24.7 again, and immediately after the upgrade, the !sshlockout rule you mentioned appeared as a hidden LAN firewall rule. However, after that, I updated to the latest version as of today, which is 24.7.3_1, and when I looked at the LAN firewall rules, I observed that the !sshlockout rule was gone. So it appears that this problem has been addressed in firmware in the latest version.

Next, I yanked the cable from the WAN and WAN2 interfaces in turn, and failover to the other interface in the gateway group occurred properly each time. So, it seems that this problem, while it existed in the original release of 24.7 due to the problematic !sshlockout rule, no longer exists in 24.7.3_1.