OPNsense Forum

Archive => 17.7 Legacy Series => Topic started by: obrienmd on August 15, 2017, 12:37:04 am

Title: Multi-WAN router-originated traffic
Post by: obrienmd on August 15, 2017, 12:37:04 am
Multi-WAN in OpnSense works perfectly, and makes perfect logical sense, from a LAN-originated traffic perspective:

However, for router-originated traffic (outbound VPN tunnel connections, outbound DNS queries, updates, etc.) failover doesn't seem to work. I recall being able to target router-originated traffic in floating rules in another FreeBSD-based firewall system, but I could be mistaken.

I'm curious about the deprecated "Gateway Switching" feature in Firewall > Settings > Advanced. From my reading, it seems to emulate the behavior of lower-end edge devices with Multi-WAN, which just change their system default gateway based on gateway health. In fact, if that's what it does, it removes policy routing as a requirement, which would be nice.

If this feature works like I expect it to, is there a reason it is deprecated? I suppose one question would be how we determine the priority of the gateways, as we'd want it to swap back to the "primary" when it's healthy again.

In my dream world, the default gateway for the system as a whole could be set as a gateway group, but I could be missing something massive.
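
To make the priority question above concrete, here is a minimal sketch of tier-based selection: take the healthy gateway with the lowest tier number, so the "primary" wins again as soon as it reports healthy. The "name tier status" input format and the gateway names are made up for illustration; this is not how OPNsense stores or evaluates gateway groups.

```shell
#!/bin/sh
# Sketch only: pick the healthy gateway with the lowest tier number.
# Input lines are "name tier status"; this format is hypothetical and
# is not how OPNsense stores gateway groups.
pick_gateway() {
    awk '$3 == "up" && (best == "" || $2 + 0 < tier) { best = $1; tier = $2 + 0 }
         END { if (best != "") print best; else exit 1 }'
}

# Example: WAN1_GW is down, so the tier 2 gateway is chosen.
printf '%s\n' 'WAN1_GW 1 down' 'WAN2_GW 2 up' | pick_gateway
```

When every gateway is down the function prints nothing and returns a non-zero status, so a caller could leave the existing default route alone in that case.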


Title: Re: Multi-WAN router-originated traffic
Post by: whitwye on August 15, 2017, 04:26:27 pm
There's a separate recent report of failover not working for IPsec tunnels, which seems to confirm what you're seeing. However, that may be more complicated, since Strongswan configurations specify the IPs for both ends -- so it would take not just routing, but a configuration change or swap for IPsec to switch over. Which sort of VPN are you seeing fail to fail over?
Title: Re: Multi-WAN router-originated traffic
Post by: franco on August 15, 2017, 04:37:24 pm
The default gateway switching is going away, because it can't force the traffic into IPsec without proper configuration. IPsec security is like that. :D

But, seriously, the default gateway lives in the routing table. Because all multi-WAN is done via policy routes, this setting is useless: the packet filter simply ignores it. It's even more useless because there are the default interface gateway policy routes:

We've tried removing these, but that had edge cases where people configure their multi-WAN to rely on static link routes, especially for said local services. We know all the areas that are affected now, but it's difficult to move this "conglomerate of local services expecting fixed routes" to the next level.

In 17.7.1 there is a new manual SPD feature for IPsec which should allow you to include Multi-WAN to span over IPsec. Some also seem to use leftsubnet configurations of 0.0.0.0 to do the same from the IPsec daemon side itself. It depends on the use case (and whether NAT must be used going into IPsec or not).

Going back to the local fixed interface routes, it may be more beneficial to make these switchable on and off globally, so that we could redirect all traffic if need be according to the high-level (visible) multi-WAN configuration.

Does that make any sense? I'm not sure myself...


Cheers,
Franco
Title: Re: Multi-WAN router-originated traffic
Post by: obrienmd on August 15, 2017, 06:49:12 pm
Leaving IPSec out of it for now - the workaround we usually use is IPSec transport between pairs of connections, with GRE tunnels inside and OSPF for failover. I'm not comfortable enough with the internals of IPSec to really get deep into that use-case.

I was thinking about OpenVPN in this case (and in the future, ZeroTier :)), both of which seem to be more forgiving than the rats nest (useful, performant rats nest) that is IPSec.

One that's hitting us right now is DNS outbound from the router. When the primary link goes down, if OPNSense is the DNS server / forwarder, it can't get out to the servers it relies on. Am I missing a way to policy route this? Same for NTP, etc.
Title: Re: Multi-WAN router-originated traffic
Post by: whitwye on August 15, 2017, 06:57:35 pm
Assuming you've got the gateway group set up per the doc, and that's what's handling the LAN failover, what additional configuration might be required to have the gateway group also used for DNS/NTP/whatever traffic originating from the OPNsense box?
Title: Re: Multi-WAN router-originated traffic
Post by: franco on August 15, 2017, 07:02:58 pm
@obrienmd, for the DNS/NTP issue try this one[1]. It sets the fixed interface routes back for local services... it's a double-edged sword, as I said:

EDIT: ASSUMING YOU ARE ON 17.7

# opnsense-patch 0b38eff5f
# /usr/local/etc/rc.filter_configure

(rerun again to remove the patch)


[1] https://github.com/opnsense/core/commit/0b38eff5f
Title: Re: Multi-WAN router-originated traffic
Post by: obrienmd on August 15, 2017, 07:55:33 pm
@franco To be honest, I'm not sure what you mean by "fixed interface routes for local services". If there is no global default gw switching, how does local outbound DNS traffic know which gateway to go out on?

@whitwye I agree. I'm curious if we could have a rules tab for "local" or be able to select lo as an interface in floating rules.
Title: Re: Multi-WAN router-originated traffic
Post by: obrienmd on August 15, 2017, 08:17:07 pm
Another note: perhaps it's incredibly hard and/or impossible given how a gateway group is defined, but allowing usage of a gateway group as the system's default gateway would be awesome.

I expect that "switch default route" uses multiple route tables in bsd (setfib), but again I'm working from near-ignorance and trying to catch up here :)
Title: Re: Multi-WAN router-originated traffic
Post by: obrienmd on August 15, 2017, 08:27:48 pm
Sorry to pack so much in, but if_ipsec looks really interesting: https://www.reddit.com/r/freebsd/comments/6pmdrm/freebsd_111release_announcement_i_saw_this_and_i/
Title: Re: Multi-WAN router-originated traffic
Post by: franco on August 16, 2017, 07:56:05 am
There is a lot to unpack, it may be better to split off individual topics in separate threads.

Regarding your Multi-WAN failure issue, saying that you have connection issues to upstream DNS / NTP in the failover:

(a) Was this always a problem (on 17.1 and 17.7)? If yes, it could simply be a setup quirk.

(b) Was this ok on 17.1 and not ok on 17.7? Try the patch I suggested.

if_ipsec is nice. We hope to adopt FreeBSD 11.1 for 18.1, provided it does not give us any serious trouble. But the FreeBSD upgrades never go as smoothly as expected. We're kind of at the edge of their use case spectrum, and use some combinations of technologies that not even pfSense uses (some we share, some are different).

In any case, there will be a test version for 11.1 to play with if_ipsec under the hood in one month hopefully.

The "interface gateway route" issue is related to (b) and, since some multi-WAN setups seem to be half-configured, incidentally to (a) as well. OPNsense 17.1 pinned local traffic from the firewall using that particular interface route; OPNsense 17.7 no longer does that. This interferes with NTP or DNS when you set it to listen on a specific interface (which is not recommended, but sometimes needed).

So if you only listen/send DNS on interface OPT1, but have a multi-wan over OPT2 and OPT1, that may stop working, because DNS tries to send to the wrong interface, meaning it is not included in multi-wan switching.

The default route switch is a crude piece of code that simply switches to another (non-default) gateway when the default gateway is down. It does that like it would when one does it manually from the console, so it reconfigures the available routing table. In that sense, the firewall gateway rules are far more flexible (and reliable).
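
As a rough illustration of "doing it manually from the console", the sketch below only prints the FreeBSD route(8) commands such a switch would amount to. It is not the OPNsense implementation; the function name is hypothetical and 192.0.2.254 is a documentation address standing in for the backup gateway.

```shell
#!/bin/sh
# Sketch only: print the route(8) commands a manual default-gateway
# switch would use. Nothing is executed; this is not the OPNsense code.
switch_default_cmds() {
    new_gw=$1
    echo "route delete default"
    echo "route add default $new_gw"
}

# 192.0.2.254 is a documentation address standing in for the backup gateway.
switch_default_cmds 192.0.2.254
```

Printing instead of executing keeps the sketch safe to try anywhere; the point is that the switch rewrites the one system routing table, whereas the firewall gateway rules decide per connection.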


Cheers,
Franco
Title: Re: Multi-WAN router-originated traffic
Post by: flyniki on August 17, 2017, 12:16:38 am
Hi,
I might have a similar problem. I set up Multi-WAN for failover as described in the How-To using 17.7. Route and interface OPT1 is the default WAN, set as Tier 1 (also set as "Default Gateway" in the gateway settings). Route and interface OPT2 is the backup, Tier 2. The LTE router at the backup interface is disabled, so only gateway OPT1 is online. The "Gateway Switching" feature in Firewall > Settings > Advanced is off.
All LAN->WAN traffic works as expected, but router-originated traffic does not work (e.g. checking for updates fails).
Inspecting System > Routes > Status, I see the default destination for IPv4 set to OPT2, the disabled (non-default) gateway. Only when I disable interface OPT2 in the interface settings does the default destination change to OPT1, and router-originated traffic works again.

best regards, Nikolaus
 
 
Title: Re: Multi-WAN router-originated traffic
Post by: franco on August 17, 2017, 11:26:56 am
Hi Nikolaus,

Please try this, we are cleaning up after a few edge cases from 17.7 with 17.7.1:

# opnsense-patch 0b38eff5f
# /usr/local/etc/rc.filter_configure

(rerun again to remove the patch if not working)


Cheers,
Franco
Title: Re: Multi-WAN router-originated traffic
Post by: flyniki on August 17, 2017, 08:33:09 pm
Hi Franco,
Thank you for your fast reply. Unfortunately the patch did not solve the problem. It made the situation slightly worse: disabling interface OPT2 did change the default route back to OPT1, but router-originated traffic still did not work. I had to disable gateway OPT2 in addition to disabling the interface to get router-originated traffic to work again. Do you need additional information from me? Shall I open a new thread?
best regards, Nikolaus
Title: Re: Multi-WAN router-originated traffic
Post by: whitwye on August 18, 2017, 03:28:54 pm
Is it a good bet that these problems with Multi WAN (the ones here, the ones I've reported in https://forum.opnsense.org/index.php?topic=5765.0) are all of a piece?
Title: Re: Multi-WAN router-originated traffic
Post by: whitwye on August 18, 2017, 10:07:23 pm
Just experimented with turning off the Tier1 WAN interface. When I did that, the Tier2 IPs, which had been working for NATing to a DNS server on the LAN as well as for providing admin access to OPNsense, stopped working for either use. Brought back the Tier1 WAN, and the Tier2 WAN's IPs started working again. Need to work out settings such that Tier2 IPs stay good no matter what the state of Tier1. As is, it doesn't look like the gateway group feature works either for traffic inward via NAT to the LAN or for traffic to the firewall itself.

With the interface on WAN1 up:

Quote
root@OPNsense:~ # route get 207.136.236.70
   route to: vt.[obfuscated].com
destination: default
       mask: default
    gateway: [obfuscated].jfk01.atlas.cogentco.com
        fib: 0
  interface: igb1
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0

With it down:

Quote
root@OPNsense:~ # route get 207.136.236.70
route: route has not been found

There should be a route found on WAN2.

OPNsense is deleting the default route from the routing table when the interface goes down. It's not replacing it with a default route via the second WAN interface. Perhaps that's as it should be. But for the second WAN interface to stop being able to respond to traffic incoming to it, as well as for traffic originating on the firewall to find no path out ... not good. The LAN interface is still good, but that's it.
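
The check above can be scripted. A minimal sketch, assuming the `route get` output format shown in this post: the helper (a hypothetical name) reads that output on stdin and prints the outgoing interface, or "no-route" when no interface line is present. The sample input is shortened and uses a placeholder gateway name.

```shell
#!/bin/sh
# Sketch only: read `route get <ip>` output on stdin and print the
# outgoing interface, or "no-route" when no interface line is present.
route_iface() {
    awk '$1 == "interface:" { print $2; found = 1 }
         END { if (!found) print "no-route" }'
}

# On the firewall (FreeBSD): route get 207.136.236.70 2>&1 | route_iface
printf '%s\n' '    gateway: gw.example.net' '  interface: igb1' | route_iface
```

Run once with WAN1 up and once with it down, this would give a one-word answer that is easy to log while testing failover.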
Title: Re: Multi-WAN router-originated traffic
Post by: obrienmd on August 21, 2017, 03:00:35 pm
I'll be configuring something in our test lab this week to debug further, and will report back!
Title: Re: Multi-WAN router-originated traffic
Post by: whitwye on August 21, 2017, 03:59:00 pm
Thanks. Tried one more thing: I'd had WAN2 set to take over outward routing as failover. Reconfigured it to instead work in load balancing mode. Didn't make a difference. With the first WAN interface turned off, traffic sent to WAN2 IPs gets no response (it had been working with WAN1 on in either mode), and the firewall cannot initiate any outgoing traffic.

I can see interesting changes with

Quote
diff rules.debug rules.debug.old

in /tmp, with route-to and reply-to rules changing with the configuration changes and turning WAN1 off and on. So the system's not failing to recognize the changes. It's just not responding with full adequacy.

If anyone has additional configuration steps to suggest which might work around this, I'm up for more experimentation.
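
For anyone repeating the rules.debug comparison above, a small sketch that narrows it to the policy-routing lines only. It assumes the two files sit in /tmp as described; the helper name and the output file names are made up.

```shell
#!/bin/sh
# Sketch only: print just the policy-routing lines of a pf ruleset so
# two generations of the file can be compared without the other rules.
policy_lines() {
    grep -E '(route-to|reply-to)' "$1"
}

# On the firewall one might compare the two files from the post:
#   policy_lines /tmp/rules.debug.old > /tmp/old.txt
#   policy_lines /tmp/rules.debug     > /tmp/new.txt
#   diff /tmp/old.txt /tmp/new.txt
```

Filtering first keeps the diff limited to the route-to and reply-to changes that follow WAN1 going off and on.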
Title: Re: Multi-WAN router-originated traffic
Post by: whitwye on August 21, 2017, 04:21:13 pm
Note: Tried the patch here: https://forum.opnsense.org/index.php?topic=5785.0 (and above in this thread). As I noted there, it does not fix what I'm seeing.
Title: Re: Multi-WAN router-originated traffic
Post by: whitwye on August 21, 2017, 04:59:06 pm
Another data point: Disabling WAN2 has no effect on WAN1.

So:

Disable WAN1, and WAN2 can no longer respond to outside traffic coming in, nor originate traffic. (There's nothing yet using this system for LAN devices going outwards, so I haven't tested that.)

Disable WAN2, and WAN1 continues working for both outside traffic coming in and originating traffic.

Checking with "netstat -nr" shows that disabling WAN1 removes the default route via WAN1 and does not replace it with a default route via WAN2. WAN2 does have its IPv4 Upstream Gateway set in the configuration, but that is not substituted in this case.
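
A minimal sketch for repeating that check from a script, assuming the usual FreeBSD `netstat -rn` layout with the destination in the first column and the gateway in the second. The function name is hypothetical and the sample gateway is a documentation address.

```shell
#!/bin/sh
# Sketch only: read `netstat -rn` output on stdin and print the gateway
# of each default route, or "none" when there is no default route.
default_gw() {
    awk '$1 == "default" { print $2; found = 1 }
         END { if (!found) print "none" }'
}

# On the firewall: netstat -rn -f inet | default_gw
printf '%s\n' 'Destination Gateway Flags Netif' 'default 192.0.2.1 UGS igb1' | default_gw
```

With WAN1 disabled, the behaviour described above would show up here as "none" rather than the WAN2 gateway address.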
Title: Re: Multi-WAN router-originated traffic
Post by: mimugmail on September 12, 2017, 11:50:06 am
I'm experiencing the same right now!

https://forum.opnsense.org/index.php?topic=5942.0
https://github.com/opnsense/core/issues/1811
https://github.com/opnsense/core/commit/0b38eff5f#commitcomment-24246290

Multi WAN with local services is ATM a b*tch :)

I'm on IRC during work hours and would like to troubleshoot this together with you guys ...
Title: Re: Multi-WAN router-originated traffic
Post by: obrienmd on September 17, 2017, 11:50:38 pm
Default gateway switching seems to be working OK right now. The local services (DNS, Zerotier, etc.) angle makes this the only real workable option for me ATM.

Franco's earlier notes are making much more sense to me - I know one of the big pains for people moving from Sonicwall / Fortinet / Watchguard boxes to OPNSense / pfSense / etc. is "multi-WAN is hard". Providing a guided UI to simplify multi-WAN would help quite a bit in these scenarios.

Peplink is one vendor that, while their boxes are pretty simple and don't do much, does the multi-WAN UI simplification (and heck, Multi-WAN itself) fairly well. Franco, I'd be happy to provide someone from OPNSense access if you'd like to peek and don't have any around.
Title: Re: Multi-WAN router-originated traffic
Post by: franco on September 18, 2017, 10:52:23 am
Hi Michael,

That would be great to have a look at if possible. :)

I would say the whole thing was "organically grown" adding features here and there through several pfSense / FreeBSD versions in the past. Everything makes good sense in the context of single iterations, but in the grand scheme of things where we are looking at it now there is potential for a revamp of how it is presented / configured.


Cheers,
Franco
Title: Re: Multi-WAN router-originated traffic
Post by: obrienmd on September 19, 2017, 11:41:16 pm
Ah, turns out they have an online demo. Use admin/admin at:

https://balancedemo.peplink.com/cgi-bin/MANGA/index.cgi

It's not the prettiest UI, but the multi-WAN stuff works really well. Take a look at outbound policies in particular.

Generally, I think if you could tag a gateway group as "default gateway" for a box, that covered both internal and client traffic, that would be a great user experience - given the power gateway groups already have with parameters for "health", load balancing vs. failover, etc.
Title: Re: Multi-WAN router-originated traffic
Post by: obrienmd on September 26, 2017, 02:26:31 am
@franco - Perhaps worth moving into a request for feedback GitHub issue on "making multiWAN awesome"?