OPNsense Forum

Archive => 19.7 Legacy Series => Topic started by: Devnull on October 03, 2019, 06:12:38 am

Title: Wireguard Issues
Post by: Devnull on October 03, 2019, 06:12:38 am
Hey there,

I am having some pretty annoying issues with Wireguard while trying to route two IPv6 subnets over the tunnel. After setting everything up, OPNsense proceeds to kernel panic as soon as there's traffic on any interface with an assigned IPv6 subnet that is not the ISP one (or even at seemingly random occasions). The panics only stop once I wipe every bit of Wireguard-related IPv6 configuration.

But first let's start with the backstory and what I am trying to accomplish here:
I am a customer of the largest German ISP, which has a pretty terrible peering policy and regularly over-saturates its peering links to the big Tier 1 carriers in order to sell private peerings to the affected services. Since this is the only ISP available in my building, I am unable to just switch ISPs and move on.
Therefore I am currently using a VPS to route my whole connection over a Wireguard tunnel on said VPS - with a few exceptions that require me to use my home connection due to e.g. regional blocking.
For that I have an extra dedicated IPv4 address that is NAT-ed to the tunnel address of my home router, which establishes the tunnel and handles all the traffic. Additionally I have a /48 IPv6 subnet to use at my discretion.
We'll focus on the IPv6 part here, because the IPv4 part is working as intended without any issues whatsoever.

Please note, this setup works perfectly fine on OpenWRT (which I currently use) and works* on pfSense with the same configuration as tried in OPNsense.

(* Since pfSense refuses to implement Wireguard, I had to install Wireguard manually from the FreeBSD repository. I am unable to replicate the "Don't add routes" option from OPNsense there. This results in some different routing issues, which makes this solution undesirable for my use case. There is absolutely no crashing on pfSense though.)

Hardwarewise I am using a Supermicro X9SCM-F with an Intel Xeon E3-1230 with 8 gigs of RAM. OPNsense is running inside a virtual machine on a Proxmox 6.0-4 host with virtio network cards. To rule out any virtio shenanigans, I also tried passing the onboard NICs to the OPNsense VM, which did not result in any different behavior.
Also tried to install OPNsense bare metal on the server to rule out any possible virtualization issues. The crashes are there regardless of what I tried.

As for OPNsense itself, I initially tried it with the most recent stable release (which was 19.7 at the time of writing this post) and then proceeded to try it with a development version (20.1.a_138 at the time of writing).
I used the following versions of wireguard:
- wireguard-0.0.20190905
- wireguard-go-0.0.20190805
- os-wireguard v1.1
I also tried wireguard-0.0.20190913 and wireguard-go-0.0.20190908 directly from the FreeBSD repos.

My IPv6 subnets look as follows:
- A dedicated IPv6 subnet for the VPS itself, let's call it 2001:DB8:6666::/56
- A /56 IPv6 subnet for the Wireguard tunnel itself, which is further split into a single used /64: 2001:DB8:6666:100::/64 (Unsplit: 2001:DB8:6666:100::/56)
- A /56 IPv6 subnet to use behind the tunnel for my devices at home: 2001:DB8:6666:1000::/56
- There are currently two vlans at home that use the subnetworks 2001:DB8:6666:1000::/64 and 2001:DB8:6666:1001::/64 out of that /56
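
The prefix plan above can be sanity-checked with a short script. This is only a sketch: it assumes the /48 mentioned earlier is 2001:DB8:6666::/48 (the post never spells it out, but all listed prefixes share it), and `check_plan` is a hypothetical helper name.

```python
import ipaddress

# Assumption: the /48 allocation is the common prefix of everything listed.
ALLOCATION = ipaddress.ip_network("2001:db8:6666::/48")

# Prefixes copied from the list above.
VPS = ipaddress.ip_network("2001:db8:6666::/56")
TUNNEL_56 = ipaddress.ip_network("2001:db8:6666:100::/56")
TUNNEL_64 = ipaddress.ip_network("2001:db8:6666:100::/64")
HOME_56 = ipaddress.ip_network("2001:db8:6666:1000::/56")
VLANS = [
    ipaddress.ip_network("2001:db8:6666:1000::/64"),
    ipaddress.ip_network("2001:db8:6666:1001::/64"),
]


def check_plan():
    """Return a list of human-readable problems; an empty list means the plan is consistent."""
    problems = []
    top_level = {"vps": VPS, "tunnel": TUNNEL_56, "home": HOME_56}

    # Every /56 must sit inside the /48 allocation.
    for name, net in top_level.items():
        if not net.subnet_of(ALLOCATION):
            problems.append(f"{name} {net} is outside {ALLOCATION}")

    # The three /56 blocks must not overlap each other.
    names = list(top_level)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if top_level[first].overlaps(top_level[second]):
                problems.append(f"{first} overlaps {second}")

    # The used tunnel /64 and the VLAN /64s must come out of their parent /56.
    if not TUNNEL_64.subnet_of(TUNNEL_56):
        problems.append(f"{TUNNEL_64} is outside {TUNNEL_56}")
    for vlan in VLANS:
        if not vlan.subnet_of(HOME_56):
            problems.append(f"{vlan} is outside {HOME_56}")
    return problems


if __name__ == "__main__":
    print(check_plan() or "prefix plan is consistent")
```

An empty result only says the addressing itself is coherent; it says nothing about routing or the panics described below.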

My Wireguard configuration looks like this:
Server-side:
Code:
[Interface]
Address = 10.0.0.1/24
Address = 2001:DB8:6666:100::1/64
PrivateKey = *snip*
ListenPort = 56666
PostUp = *adding the IPv4 NAT stuff via IPTables*
PostDown = *removing the IPv4 NAT stuff*

[Peer]
PublicKey = *snip*
AllowedIPs = 10.0.0.0/24, 2001:DB8:6666:100::/64, 2001:DB8:6666:1000::/56
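
WireGuard uses AllowedIPs both as the routing table for outgoing packets and as a source filter for incoming ones, so the server only accepts packets from this peer whose source address falls inside one of the listed prefixes. A minimal sketch of that check, using the values from the config above (`accepted_from_peer` is a hypothetical helper, not part of any WireGuard tooling):

```python
import ipaddress

# AllowedIPs of the [Peer] section in the server-side config above.
SERVER_PEER_ALLOWED_IPS = [
    "10.0.0.0/24",
    "2001:db8:6666:100::/64",
    "2001:db8:6666:1000::/56",
]


def accepted_from_peer(source, allowed_ips=SERVER_PEER_ALLOWED_IPS):
    """Return True if a packet with this inner source address would pass the AllowedIPs filter."""
    addr = ipaddress.ip_address(source)
    for entry in allowed_ips:
        net = ipaddress.ip_network(entry)
        # Only compare addresses and prefixes of the same IP version.
        if net.version == addr.version and addr in net:
            return True
    return False


if __name__ == "__main__":
    for candidate in ("10.0.0.2", "2001:db8:6666:1001::42", "2001:db8:6666:2000::1"):
        print(candidate, accepted_from_peer(candidate))
```

With this list, hosts in both home VLANs (2001:DB8:6666:1000::/64 and 2001:DB8:6666:1001::/64) are covered by the /56 entry.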

The client side will get a little long, so bear with me (only including relevant parts, omitting things like private/public keys or names):
VPN -> Wireguard -> Endpoint
- Enabled: Check
- Allowed IPs: 0.0.0.0/0, ::0/0
- Endpoint Address: VPS main public IPv4
- Endpoint Port: 56666
- Keepalive 25
VPN -> Wireguard -> Local
- Name: wg0
- Tunnel Address: 10.0.0.2/32, 2001:DB8:6666:100::2/64 (I did try /128 there, but that breaks OPNsense Gateway assignments)
- Peers: Endpoint created above
- Disable Routes: Check
- Gateway: 10.0.0.1

Interfaces -> Assignments -> Assign wg0 to new interface (wgrd)
- IPv4 Configuration Type: None
- IPv6 Configuration Type: None
- MTU: 1412 (That's what Wireguard defaults to, also tried 1404 to account for PPPoE overhead)
- MSS: 1412
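
One way to read the 1412 figure, assuming the PPPoE link MTU is the usual 1492 (the post does not state it): WireGuard adds 32 bytes of its own framing plus a UDP header and an outer IP header, which comes to 80 bytes when sized for an IPv6 outer header. The sketch below only shows that arithmetic; the function names are hypothetical, and how OPNsense interprets the MSS field is not covered here.

```python
# Per-packet overhead of a WireGuard data message.
WG_FRAMING = 4 + 4 + 8 + 16  # type/reserved, receiver index, counter, auth tag
UDP_HEADER = 8
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20  # without TCP options


def tunnel_mtu(link_mtu, outer_ipv6=True):
    """Largest inner packet that fits in one outer packet on a link with this MTU."""
    outer_ip = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return link_mtu - outer_ip - UDP_HEADER - WG_FRAMING


def tcp_mss(mtu, inner_ipv6):
    """Largest TCP payload for a given tunnel MTU."""
    inner_ip = IPV6_HEADER if inner_ipv6 else IPV4_HEADER
    return mtu - inner_ip - TCP_HEADER


if __name__ == "__main__":
    pppoe_mtu = 1492  # assumption: Ethernet 1500 minus 8 bytes of PPPoE
    mtu = tunnel_mtu(pppoe_mtu)
    print("tunnel MTU:", mtu)
    print("TCP MSS over IPv4:", tcp_mss(mtu, inner_ipv6=False))
    print("TCP MSS over IPv6:", tcp_mss(mtu, inner_ipv6=True))
```

Under that assumption 1412 already includes the PPPoE overhead, and a TCP payload that fits the tunnel is smaller than 1412 for both IPv4 and IPv6.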

System -> Gateways -> Single
- Add
-- Interface: wgrd
-- Address Family: IPv4
-- IP Address: 10.0.0.1
-- Far Gateway: Check
-- Disable Gateway Monitoring: Check
-- Priority: 255

- Add
-- Interface: wgrd
-- Address Family: IPv6
-- IP Address: 2001:DB8:6666:100::1 (Wireguard far tunnel address)
-- Far Gateway: Check (Although that does nothing according to a forum post here)
-- Disable Gateway Monitoring: Check
-- Priority: 255

Interfaces -> LAN (on VLAN 100)
- IPv6 Configuration Type: Static IPv6
- IPv6 Address: 2001:DB8:6666:1000::1/64
- IPv6 Upstream Gateway: Auto-detect (can't add the gateway here, as it's outside of the subnet of LAN)
Interfaces -> WIFI (on VLAN 101)
- IPv6 Configuration Type: Static IPv6
- IPv6 Address: 2001:DB8:6666:1001::1/64
- IPv6 Upstream Gateway: Auto-detect (can't add the gateway here, as it's outside of the subnet of WIFI)

I also have firewall rules added for IPv4/IPv6 to route the desired traffic over the tunnel via the gateways created above.
In my opinion I shouldn't have to add these entries for IPv6, as my clients on LAN and WIFI do not get an IPv6 address from my ISP connection and therefore only have the Wireguard gateway as an applicable route. This however doesn't seem to work reliably. Either no traffic gets through at all (OPNsense tries to route it over the ISP IPv6 gateway instead), or it works sometimes, but without any pattern that I can see.

As soon as these subnets are added to the interfaces and distributed to the clients via radvd (assisted), the crashing either begins after a short period of time, or everything works until I initiate a reboot. After that it crashes over and over again, even shortly after booting up, with no chance to even access the web interface.
The crashing also occurs if I don't use "Disable Routes" in VPN -> Wireguard -> Local, which is then a replica of the configuration I successfully tested on pfSense.

I also noticed that shortly before the crash occurs, the CPU usage skyrockets to 80-100%.
I have no crashdumps at hand and can't produce any new ones, as I'd have to rewire/reconfigure my whole network to replicate the problem (I am not using that machine for the time being due to the crashing).
I did submit a couple of crashdumps via the reporting tool of OPNsense though.
I do remember however it being a "Fatal trap 12: page fault while in kernel mode" panic.

To debug the issue further, I tried separating the IPv4 and IPv6 parts of the wireguard tunnel into separate peers. That's when it became clear that the IPv6 part of the configuration seems to be the culprit, as everything works fine until you add the IPv6 peer and the associated subnets to the interfaces.
There are other issues with that approach though, as Wireguard doesn't let you add an IPv6 gateway in VPN -> Wireguard -> Local when using "Disable Routes". Then Wireguard itself isn't starting, because the route to the gateway can't be added. Wireguard is trying to add "route add Gateway -iface wg0" which is refused by FreeBSD with error "bad address".
Aaand it still panics when you try it with separate peers.

I hope I haven't forgotten anything, as that's quite a lot of info to collect.
If there's any more info I can provide, or someone has ideas on how to resolve this issue, don't hesitate to tell me.
As for the crashdumps, I hope those were uploaded correctly by the reporting tool and can be accessed by franco :)


~Devnull
Title: Re: Wireguard Issues
Post by: Devnull on October 14, 2019, 12:13:07 pm
So I have been able to test a little bit more on this issue. My first assumption, that it is related to IPv6, seems to be incorrect.
Even without touching anything IPv6 related in wireguard, OPNsense keeps kernel panicking. So it's rather an issue with Wireguard itself and possibly related to PBR (policy-based routing). I have altered the title of the thread to reflect that.

This time I was able to snatch some of the panics:
https://pastebin.com/u6F6yT7X
https://pastebin.com/Fccw2Nzk
https://pastebin.com/1bQ5yCxx
https://pastebin.com/76bxJrQD

(Pastes will expire in 6 months, as I hope they will become irrelevant by that time)

The majority of the crashes are "Fatal trap 12: page fault while in kernel mode". The trap 9 (#2) and the one without a trap (#4) I only saw once while trying to capture the panics.

Oddly, when I restored my VM to a state where Wireguard is not installed, installed Wireguard, configured it to my needs and kept it running without rebooting once, I was able to achieve an uptime of more than 10 hours without crashing.
I then rebooted the VM and the crashing started immediately when my PPPoE session to my ISP came up.
Even when I quickly disable wireguard, the PBR rules I've added and the wireguard gateway, it keeps crashing as soon as PPPoE comes up.

What I've tried so far:
- Trying to utilize PBR with Wireguard, but without the IPv6 part
- A fresh, unupdated 19.7 OPNsense install behaves identically to an updated 19.7.5 and a 20.1_192 install
- Tried different NICs with different chipsets
- Tried PCI(e) passthrough of the NICs to the VM
- Running OPNsense on bare metal
 
Title: Re: Wireguard Issues
Post by: mimugmail on October 14, 2019, 04:44:51 pm
Do you also have a backtrace from bare metal? This all sounds like a virtio driver problem ...
Title: Re: Wireguard Issues
Post by: Devnull on October 14, 2019, 07:41:20 pm
Quote
Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80cb9bd4
stack pointer           = 0x0:0xfffffe00002612f0
frame pointer           = 0x0:0xfffffe0000261320
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq271: em3:rx0)

This is all I can get, console output stops afterwards.
em3 is my LAN interface.
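
The panic header quoted above can be pulled apart into fields with a few lines of Python, which makes it easier to compare several dumps. This is a rough sketch, not an official tool: `parse_panic` and `looks_like_null_deref` are hypothetical helpers, and treating a fault address below one page (assumed 4096 bytes) as a probable NULL pointer dereference is only a common heuristic.

```python
# Excerpt of the panic output quoted above.
PANIC_TEXT = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80cb9bd4
current process         = 12 (irq271: em3:rx0)
"""


def parse_panic(text):
    """Split 'key = value' lines of a panic header into a dict; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            continue  # no '=' on this line, or a continuation line without a key
        fields[key.strip()] = value.strip()
    return fields


def looks_like_null_deref(fields, page_size=4096):
    """Heuristic: a fault address inside the first page suggests NULL plus a struct offset."""
    address = int(fields["fault virtual address"], 16)
    return address < page_size


if __name__ == "__main__":
    panic = parse_panic(PANIC_TEXT)
    print(panic["current process"])
    print("probable NULL dereference:", looks_like_null_deref(panic))
```

For this dump the fault address is 0x18 and the faulting context is the receive interrupt thread of em3, which fits a NULL pointer being dereferenced somewhere in the receive path, though the header alone cannot say where.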
Title: Re: Wireguard Issues
Post by: sashxp on October 14, 2019, 09:08:45 pm
Hi Devnull,

I've had a similar issue and posted it on Github, but wasn't able to report it to FreeBSD. I've put all my information and my trace here: https://github.com/opnsense/core/issues/3696 but so far no one has been able to fix the issue.

Quote
I also noticed that shortly before the crash occurs, the CPU usage skyrockets to 80-100%.
I have no crashdumps at hand and can't produce any new as I'd have to rewire/reconfigure my whole network anew to replicate (I am not using that machine due to the crashing problems for the time being).
I did submit a couple of crashdumps via the reporting tool of OPNsense though.
I do remember however it being a "Fatal trap 12: page fault while in kernel mode" panic.

To debug the issue further, I tried separating the IPv4 and IPv6 parts of the wireguard tunnel into separate peers. That's when it became clear that the IPv6 part of the configuration seems to be the culprit, as eveything does work fine until you add the IPv6 peer and the associated subnets to the interfaces.
There are other issues with that approach though, as Wireguard doesn't let you add an IPv6 gateway in VPN -> Wireguard -> Local when using "Disable Routes". Then Wireguard itself isn't starting, because the route to the gateway can't be added. Wireguard is trying to add "route add Gateway -iface wg0" which is refused by FreeBSD with error "bad address".
Aaand it still panics when you try it with separate peers.

I hope I haven't forgotten to add anything, it is rather much info collected.
If there's any more info I can provide, or someone has ideas as to how to resolve this issue, don't hesitate to tell.
As for the crashdumps, I hope those were uploaded correctly by the reporting tool and can be accessed by franco

I can totally agree with your observations and had exactly the same issue! I've also sent my crashdump via OPNsense.

@mimugmail you have seen this crash live via Teamviewer on my box.

I hope we get this bug fixed soon :-)

Title: Re: Wireguard Issues
Post by: Devnull on February 11, 2020, 02:28:29 pm
As AdSchellevis suggested on Github (https://github.com/opnsense/src/issues/46), this does indeed seem to be an issue with shared forwarding. With shared forwarding disabled, the crashing no longer seems to occur.
The issue is described here (https://github.com/opnsense/src/issues/52) in detail.

Running stable so far for the last 48 hours.
Also, this issue of course still occurs on 20.1, so this thread might as well be moved over to 20.1 Production Series.

For anyone stumbling upon this thread looking for a workaround (at least for now, until fixed), you'll have to disable "Use shared forwarding between packet filter, traffic shaper and captive portal" in Firewall => Settings => Advanced => Multi-WAN => Shared forwarding when using policy based routing with wireguard.
This is just a band-aid though, as it renders traffic shaping (and captive portal restrictions) useless for such setups.
Title: Re: Wireguard Issues
Post by: ownerer on February 13, 2020, 07:22:44 pm
Damn, came here from my own thread (https://forum.opnsense.org/index.php?topic=15732.0) of Wireguard policy-based routing problems.
And here I thought I was having hardware issues!
"Good" to see I'm not alone getting these kernel panics.
It seems Wireguard and/or its implementation in OPNSense isn't quite stable enough yet after all :(