Policy-based Wireguard(/Mullvad): firewall rules ignored when gateway is down

Started by ownerer, February 04, 2020, 11:50:37 AM

So I just discovered this gem.
A little context first:

  • I have Wireguard/Mullvad set up with a gateway, as described here
  • I have an alias defined for all hosts that must only access the internet through a VPN connection
  • I have a VPN gateway group set up for load balancing/failover
  • The firewall is configured to send all VPN destined traffic through the VPN gateway group
  • The firewall is also configured to reject all VPN destined traffic that somehow makes it past that routing rule
  • This screenshot shows the firewall config, these are the last 3 rules in the config for the LAN interface

This works perfectly when PIA (OpenVPN) is used in the VPN gateway group.
As you'd expect, when the client is disabled, the last but one rule is triggered and prevents VPN destined traffic from going out the WAN gateway group.
OpenVPN gateway is down:

Traffic is blocked:


Here comes the kicker though: replace the PIA gateway in the gateway group with the Mullvad gateway, and traffic merrily flows out from the WAN interface when the Wireguard gateway is down/disabled.
Even better: it's actually the VPN pass rule that still directs traffic to the gateway group, even though its members are all down!
Wireguard gateway is down:

Traffic is passed to the "VPN gateway"?!

To be crystal clear: when I access wtfismyip.com and similar in that situation, it shows my ISP IP, NOT any VPN's.

WHAT?!
What am I missing here, if anything  :-\ ?

PS:
The rule that routes traffic to the VPN gateway group also sets a NO_WAN_EGRESS tag.
I have a floating rule defined on my WAN interface to reject all traffic with that tag.
This used to work, but doesn't seem to anymore?
This is why I resorted to the explicit reject rule in the first place.
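To make the intent of the tag concrete, here is a minimal Python sketch of the idea, not of how pf or OPNsense implement it. The function names, the `VPN_GROUP` label and the `WG` interface name are hypothetical; only the `NO_WAN_EGRESS` tag comes from my config.

```python
def lan_rule(packet):
    """Toy LAN pass rule: route via the VPN group and tag the packet."""
    packet["tags"].add("NO_WAN_EGRESS")
    packet["route"] = "VPN_GROUP"  # hypothetical gateway group name
    return packet


def wan_floating_rule(packet, out_interface):
    """Toy floating rule: reject tagged packets that try to leave via WAN."""
    if out_interface == "WAN" and "NO_WAN_EGRESS" in packet["tags"]:
        return "reject"
    return "pass"


# A packet from a VPN-only host, tagged on its way in through LAN.
tagged = lan_rule({"tags": set(), "route": None})
# A packet from any other host, never tagged.
untagged = {"tags": set(), "route": None}

print(wan_floating_rule(tagged, "WAN"))    # tagged traffic must not exit WAN
print(wan_floating_rule(tagged, "WG"))     # "WG" is an assumed tunnel interface
print(wan_floating_rule(untagged, "WAN"))  # ordinary traffic is unaffected
```

In this model a tagged packet can never leave through WAN, whatever the gateway state. That is the behaviour I expected from the floating rule.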

PPS:
I also came across this gem in trying to fix this issue. That could most definitely use some clarifying as well...
I fiddled around with that setting, but it didn't make any difference in my case, and still wouldn't explain the difference in behavior between an OpenVPN and Wireguard gateway.

Can you verify if the traffic really leaves WAN? The log with accepted packets just indicates it tries to push these packets via the correct rule, no matter if gw group is down or not.

Second test would be to add a rule for the Wireguard gateway explicitly, not the whole group, and try to reproduce.

First of all: I have since updated to 20.1, it hasn't made any difference.

Quote from: mimugmail on February 04, 2020, 01:47:11 PM
Can you verify if the traffic really leaves WAN? The log with accepted packets just indicates it tries to push these packets via the correct rule, no matter if gw group is down or not.
I don't see how there is any other way than for this to be the case?
Like I said: if I visit wtfismyip.com and similar sites in that state, they all show my ISP data.

Quote from: mimugmail on February 04, 2020, 01:47:11 PM
Second test would be to add a rule for the Wireguard gateway explicitly, not the whole group, and try to reproduce.
Unfortunately this makes no difference, the exact same behaviour remains.

What is different about a Wireguard gateway from an OpenVPN gateway?
I mean, shouldn't the whole gateway logic be protocol agnostic?
If a gateway monitor says the gateway is down, then that's all that should matter -> don't pass traffic, evaluate next rule, done.
Yet there is a clear difference between how the 2 protocols' gateways behave.
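As a rough illustration of the behaviour I expect, here is a small Python sketch. It is a toy model of "skip the routing rule when its gateway is down", not the actual pf/OPNsense logic; the `Rule` class, `evaluate` function and `VPN_GROUP` name are made up for the example, and only the routing rule and the reject rule from my LAN config are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str                    # "pass" or "reject"
    gateway: Optional[str] = None  # policy-routing target; None = default route


def evaluate(rules, gateway_up):
    """Return (action, gateway) of the first rule that applies.

    Assumption of this sketch: a rule whose gateway is marked down is
    skipped, so evaluation falls through to the next rule.
    """
    for rule in rules:
        if rule.gateway is not None and not gateway_up.get(rule.gateway, False):
            continue  # gateway down: do not apply this rule
        return rule.action, rule.gateway
    return "reject", None  # nothing applied: assume default deny


lan_rules = [
    Rule("pass", gateway="VPN_GROUP"),  # send VPN hosts through the VPN group
    Rule("reject"),                     # kill switch for anything that gets past
]

print(evaluate(lan_rules, {"VPN_GROUP": True}))   # ('pass', 'VPN_GROUP')
print(evaluate(lan_rules, {"VPN_GROUP": False}))  # ('reject', None)
```

With the OpenVPN gateway I see the equivalent of the second result. With the Wireguard gateway the firewall behaves as if the gateway were still up.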

Assuming I'm not missing anything, I would argue this is a bug and/or design flaw.

No, WireGuard handles the gateway code differently; there may be situations where it doesn't work, as a general limitation. That's why I asked to test without a group, just the gateway.

If the gateways behave that differently, then that should at least be made very clear to the end-user.

Again, to summarize: the bug here is that, even though the gateway is marked down, used in a group or solo, the firewall still matches the routing rule for it, pushing traffic to it (so subsequent killswitch rules don't even have a chance), on top of which traffic somehow goes out over WAN instead of the Wireguard interface, unbeknownst to the user.

I get the implementation details might be different, but functionally it should exhibit the same behaviour. So classifying this as a "limitation" is being very mild. This is a bug/design flaw.
I mean let's be real, why do people use VPNs like this? To route traffic through them that they'd rather not have their ISP see. If you can't even trust the firewall to kill traffic when the gateway goes down, in any way, then what's the point?

Long story short: I'd argue Wireguard is currently unsuited to replace OpenVPN in this setup.
If you think I'm wrong in drawing that conclusion, then please tell me why, because I want to be (wrong, that is)!

I don't think you are wrong. Why do you think there's a big banner in WireGuard -> General saying that this software is still beta?

EDIT:

I've created a new topic to follow up on this.

TLDR: Wireguard just does not run well enough yet on OPNSense.
Between the issue I've described here and the kernel panics it seems to introduce as described here, I've decided to look for another solution:
I still want to use Wireguard, but I don't want it to mess with OPNSense functionality and break all my networking whenever it decides to act up.
-> abstract Wireguard stuff from OPNSense.