Multi-WAN router-originated traffic

Started by obrienmd, August 15, 2017, 12:37:04 AM

Multi-WAN in OpnSense works perfectly, and makes perfect logical sense, from a LAN-originated traffic perspective:

  • Create the group
  • Create a policy route
  • Traffic goes out per the group's configuration depending on gateway health
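
For readers new to gateway groups, the failover part of the steps above can be sketched roughly as follows. This is only an illustration, not OPNsense's actual code: the function name `pick_gateway` and the "name tier status" input format are made up for the example, and it ignores load balancing between gateways on the same tier.

```sh
#!/bin/sh
# Hypothetical sketch: choose the healthy gateway with the lowest tier number.
# Reads lines of the form "<name> <tier> <up|down>" on stdin (assumed format).
# Prints the chosen gateway name, or returns non-zero if none is up.
pick_gateway() {
    awk '
        # Keep the first "up" gateway seen, then any with a strictly lower tier.
        $3 == "up" && (best == "" || $2 + 0 < tier) { best = $1; tier = $2 + 0 }
        END { if (best == "") exit 1; print best }
    '
}
```

Usage, with made-up gateway names: `printf 'WAN1_GW 1 down\nWAN2_GW 2 up\n' | pick_gateway` prints `WAN2_GW`. Once WAN1_GW reports "up" again, it wins because of its lower tier.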

However, for router-originated traffic (outbound VPN tunnel connections, outbound DNS queries, updates, etc.) failover doesn't seem to work. I recall being able to target router-originated traffic in floating rules in another FreeBSD-based firewall system, but I could be mistaken.

I'm curious about the deprecated "Gateway Switching" feature in Firewall > Settings > Advanced. From my reading, it seems to emulate the behavior of lower-end edge devices with Multi-WAN, which just change their system default gateway based on gateway health. In fact, if that's what it does, it removes policy routing as a requirement, which would be nice.

If this feature works like I expect it to, is there a reason it is deprecated? I suppose one question would be how we determine the priority of the gateways, as we'd want it to swap back to the "primary" when it's healthy again.

In my dream world, the default gateway for the system as a whole could be set as a gateway group, but I could be missing something massive.



There's a separate recent report of failover not working for IPsec tunnels, which seems to confirm what you're seeing. However, that may be more complicated, since Strongswan configurations specify the IPs for both ends -- so it would take not just routing, but a configuration change or swap for IPsec to switch over. Which sort of VPN are you seeing fail to fail over?

August 15, 2017, 04:37:24 PM #2 Last Edit: August 15, 2017, 04:39:59 PM by franco
The default gateway switching is going away, because it can't force the traffic into IPsec without proper configuration. IPsec security is like that. :D

But, seriously, the default gateway lives in the routing table. Because all multi-WAN is done via policy routes, this setting is useless: the packet filter simply ignores it. It's even more useless because there are the default interface gateway policy routes:

We've tried removing these, but that had edge cases where people configure their multi-WAN to rely on static link routes, especially for said local services. We know all the areas that are affected now, but it's difficult to move this "conglomerate of local services expecting fixed routes" to the next level.

In 17.7.1 there is a new manual SPD feature for IPsec which should allow you to include Multi-WAN to span over IPsec. Some also seem to use leftsubnet configurations of 0.0.0.0 to do the same from the IPsec daemon side itself. It depends on the use case (and whether NAT must be used going into IPsec or not).

Going back to the local fixed interface routes, it may be more beneficial to make these globally switchable (on/off), so that we could redirect all traffic if need be according to the high-level (visible) multi-WAN configuration.

Does that make any sense? I'm not sure myself...


Cheers,
Franco

Leaving IPSec out of it for now - the workaround we usually use is IPSec transport between pairs of connections, with GRE tunnels inside and OSPF for failover. I'm not comfortable enough with the internals of IPSec to really get deep into that use-case.

I was thinking about OpenVPN in this case (and in the future, ZeroTier :)), both of which seem to be more forgiving than the rat's nest (a useful, performant rat's nest) that is IPSec.

One that's hitting us right now is DNS outbound from the router. When the primary link goes down, if OPNsense is the DNS server / forwarder, it can't get out to the servers it relies on. Am I missing a way to policy route this? Same for NTP, etc.

Assuming you've got the gateway group set up per the doc, and that's what's handling the LAN failover, what else might be required to have the gateway group also used for DNS/NTP/whatever traffic originating from the OPNsense box?

@obrienmd, for the DNS/NTP issue try this one [1]; it sets the fixed interface routes back for local services. It's a double-edged sword, as I said:

EDIT: ASSUMING YOU ARE ON 17.7

# opnsense-patch 0b38eff5f
# /usr/local/etc/rc.filter_configure

(rerun again to remove the patch)


[1] https://github.com/opnsense/core/commit/0b38eff5f

@franco To be honest, I'm not sure what you mean by "fixed interface routes for local services". If there is no global default gw switching, how does local outbound DNS traffic know which gateway to go out on?

@whitwye I agree. I'm curious if we could have a rules tab for "local" or be able to select lo as an interface in floating rules.

Another note: perhaps it's incredibly hard and/or impossible given how a gateway group is defined, but allowing usage of a gateway group as the system's default gateway would be awesome.

I expect that "switch default route" uses multiple route tables in bsd (setfib), but again I'm working from near-ignorance and trying to catch up here :)


There is a lot to unpack; it may be better to split off individual topics into separate threads.

Regarding your Multi-WAN failover issue, where you have connection issues to upstream DNS / NTP during failover:

(a) Was this always a problem (on 17.1 and 17.7)? If yes, it could simply be a setup quirk.

(b) Was this ok on 17.1 and not ok on 17.7? Try the patch I suggested.

if_ipsec is nice; we hope to adopt FreeBSD 11.1 for 18.1, provided that it does not give us any serious trouble. But the FreeBSD upgrades never go as smoothly as expected. We're kind of at the edge of their use case spectrum, and use some combinations of technologies that not even pfSense uses (some we share, some are different).

In any case, there will be a test version for 11.1 to play with if_ipsec under the hood in one month hopefully.

The "interface gateway route" issue is related to (b): OPNsense 17.1 pinned local traffic from the firewall using that particular interface route, and OPNsense 17.7 no longer does that. Some multi-WAN setups seem to be half-configured (incidentally, see (a) as well) and relied on that pinning. The change interferes with NTP or DNS when you set it to listen on a specific interface (which is not recommended, but sometimes needed).

So if you only listen/send DNS on interface OPT1, but have a multi-wan over OPT2 and OPT1, that may stop working, because DNS tries to send to the wrong interface, meaning it is not included in multi-wan switching.

The default route switch is a crude piece of code that simply switches to another (non-default) gateway when the default gateway is down. It does that like it would when one does it manually from the console, so it reconfigures the available routing table. In that sense, the firewall gateway rules are far more flexible (and reliable).
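
To make that description concrete, here is a rough sketch of what "switching the default gateway like you would from the console" amounts to. It is not the actual OPNsense code. The gateway addresses are documentation placeholders, the function names are invented for this example, and a single ping stands in for real gateway monitoring.

```sh
#!/bin/sh
# Hypothetical sketch of default gateway switching; NOT OPNsense's implementation.
PRIMARY_GW="${PRIMARY_GW:-192.0.2.1}"     # placeholder address (assumption)
BACKUP_GW="${BACKUP_GW:-198.51.100.1}"    # placeholder address (assumption)

# Succeeds if the gateway answers a single ping (crude health check).
gw_is_up() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Print the gateway that should carry the default route:
# the primary whenever it is healthy, otherwise the backup.
choose_default_gw() {
    if gw_is_up "$PRIMARY_GW"; then
        echo "$PRIMARY_GW"
    else
        echo "$BACKUP_GW"
    fi
}

# Repoint the default route in the routing table.
# DRY_RUN defaults to 1, so the command is only printed unless DRY_RUN=0.
# Note: "route change" expects an existing default route; if it was removed,
# "route add default <gw>" would likely be needed instead.
switch_default_gw() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "route change default $1"
    else
        route change default "$1"
    fi
}
```

Something like `switch_default_gw "$(choose_default_gw)"` run periodically would swap back to the primary as soon as it answers again. It also shows the limitation: only the routing table changes, so anything pinned by a firewall gateway rule is unaffected.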


Cheers,
Franco

Hi,
  I might have a similar problem. I set up Multi-WAN for failover as described in the How-To using 17.7. Route and interface OPT1 is the default WAN set as Tier 1 (also set as "Default Gateway" in the gateway settings); route and interface OPT2 is the backup, Tier 2. The LTE router at the backup interface is disabled, so only gateway OPT1 is online. The "Gateway Switching" feature in Firewall > Settings > Advanced is off.
  All traffic LAN->WAN works as expected, but router-originated traffic does not work (e.g. checking for updates fails).
  Inspecting System > Routes > Status, I see the default destination for IPv4 set to OPT2, the disabled (non-default) gateway. Only when I disable the interface OPT2 in the interface settings does the default destination change to OPT1, and router-originated traffic works again.

best regards, Nikolaus
 
 

Hi Nikolaus,

Please try this, we are cleaning up after a few edge cases from 17.7 with 17.7.1:

# opnsense-patch 0b38eff5f
# /usr/local/etc/rc.filter_configure

(rerun again to remove the patch if not working)


Cheers,
Franco

Hi Franco,
  Thank you for your fast reply. Unfortunately the patch did not solve the problem. It made the situation slightly worse: disabling interface OPT2 did change the default route back to OPT1, but router-originated traffic still did not work. I had to disable gateway OPT2 in addition to disabling the interface to get router-originated traffic to work again. Do you need additional information from me? Shall I open a new thread?
best regards, Nikolaus

Is it a good bet that these problems with Multi WAN (the ones here, the ones I've reported in https://forum.opnsense.org/index.php?topic=5765.0) are all of a piece?

August 18, 2017, 10:07:23 PM #14 Last Edit: August 18, 2017, 10:52:51 PM by whitwye
Just experimented with turning off the Tier1 WAN interface. When I did that, the Tier2 IPs, which had been working for NATing to a DNS server on the LAN as well as for providing admin access to OPNsense, stopped working for either use. I brought back the Tier1 WAN, and the Tier2 WAN's IPs started working again. I need to work out settings such that the Tier2 IPs stay good no matter what the state of Tier1. As is, it doesn't look like the gateway group feature works either for traffic inward via NAT to the LAN or for traffic to the firewall itself.

With the interface on WAN1 up:

Quote
root@OPNsense:~ # route get 207.136.236.70
   route to: vt.[obfuscated].com
destination: default
       mask: default
    gateway: [obfuscated].jfk01.atlas.cogentco.com
        fib: 0
  interface: igb1
      flags: <UP,GATEWAY,DONE,STATIC>
recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0

With it down:

Quote
root@OPNsense:~ # route get 207.136.236.70
route: route has not been found

There should be a route found on WAN2.

OPNsense is deleting the default route from the routing table when the interface goes down. It's not replacing it with a default route via the second WAN interface. Perhaps that's as it should be. But for the second WAN interface to stop being able to respond to traffic incoming to it, as well as for traffic originating on the firewall to find no path out ... not good. The LAN interface is still good, but that's it.
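
For anyone wanting to watch for this condition, a small helper along these lines could check the `route get` output shown above. It is only a sketch: `parse_route_get` is a made-up name, and it assumes the "gateway:" and "interface:" lines appear as they do in the quoted output.

```sh
#!/bin/sh
# Hypothetical helper: read `route get` output on stdin and print
# "<gateway> <interface>". Returns non-zero if either field is missing,
# e.g. when route reports "route has not been found".
parse_route_get() {
    awk '
        $1 == "gateway:"   { gw = $2 }
        $1 == "interface:" { ifc = $2 }
        END { if (gw == "" || ifc == "") exit 1; print gw, ifc }
    '
}
```

For example: `route -n get 207.136.236.70 2>/dev/null | parse_route_get || echo "no route out"`. Run while the Tier1 interface is down, this would show whether the firewall itself still has a path out.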