Weird Wireguard issue

Started by jahlives, July 28, 2025, 12:12:13 PM

Hello forum

We're currently facing a weird WireGuard problem which makes zero sense to me. We have:

server side: LAN 10.20.60.0/24, WireGuard IP 10.230.0.1, OPNsense local IP 10.20.60.2
peer: jv41jYNdMIt+OGZcgBFBjNJxYeZasPHOTm6axu1lWzw=
  endpoint: redacted:60554
  allowed ips: 10.3.0.0/16, 10.230.0.254/32
  latest handshake: 53 seconds ago
  transfer: 44.95 GiB received, 20.35 GiB sent
  persistent keepalive: every 30 seconds

client side: LAN 10.3.0.0/16, WireGuard IP 10.230.0.254, OPNsense local IP 10.3.0.5
peer: 56tUzFXZ1QrweRqKCyJyG1OPeKYv0Fr9Ke/sBA/viR0=
  endpoint: redacted:51820
  allowed ips: 10.20.50.0/24, 10.11.0.0/16, 10.20.60.0/24, 10.231.0.0/16, 10.230.0.1/32
  latest handshake: 1 minute, 59 seconds ago
  transfer: 19.33 GiB received, 45.48 GiB sent
  persistent keepalive: every 30 seconds

The peer config on both sides allows the remote local net traffic (see the allowed ips above).
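
In plain wg-quick notation (OPNsense configures this in the GUI, so this is only an equivalent sketch of the server-side peer entry, built from the values above):

[Peer]
# client-side firewall
PublicKey = jv41jYNdMIt+OGZcgBFBjNJxYeZasPHOTm6axu1lWzw=
# must cover the remote LAN and the remote tunnel address
AllowedIPs = 10.3.0.0/16, 10.230.0.254/32
PersistentKeepalive = 30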

When I try to ping from 10.20.60.8 to 10.3.0.81 I can see on the firewall that the traffic comes in on the LAN interface, but it never shows up on the wg1 interface, nor outgoing on ANY other interface. The route on the firewall (server side) looks okay:
route -n get 10.3.0.81
   route to: 10.3.0.81
destination: 10.3.0.0
       mask: 255.255.0.0
        fib: 0
  interface: wg1
      flags: <UP,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1420         1         0
The firewall itself can ping the 10.3.0.0/16 subnet without issues, and there are NO firewall rules that would block the traffic. The remote network (client side) can also reach the local network (server side). When I ping from 10.20.60.8 to the WireGuard IP of the remote firewall (10.230.0.254) I get replies, but not when I ping the local IP of the remote firewall (10.3.0.5). We do not use outgoing NAT rules, as the remote systems have routes for the local net via their local OPNsense, but for testing I tried an explicit outgoing NAT rule as well, with no success either.
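
The capture check was roughly along these lines (a sketch; the LAN interface on this box is vtnet1):

# on the OPNsense shell, watch the ping from 10.20.60.8 on the LAN interface
tcpdump -ni vtnet1 icmp and host 10.3.0.81
# and in a second shell on the tunnel interface, which stays completely silent
tcpdump -ni wg1 icmp and host 10.3.0.81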

Are we hitting a bug? I have tried to debug the issue for quite a while now, but with absolutely no success. The fact that I cannot see traffic leaving on the server-side wg1 interface suggests to me that the packets are either misrouted (although I cannot see the outgoing traffic on any interface) or dropped (although it does not work even with pfctl -d).

Any ideas?

Kind regards

tobi

Also, this firewall has WG connections to many other endpoints and there is no problem reaching those remote networks. It's just this 10.3.0.0/16 destination network that causes trouble.

I let an AI summarize my tests from yesterday :-)

OPNsense WireGuard Forwarding Issue Summary

Problem Statement
We have an OPNsense firewall (116.203.251.18) with a WireGuard VPN tunnel (wg1) to a remote endpoint (217.20.196.67). The firewall itself can reach the remote network (10.3.0.0/16) via the WireGuard tunnel, but clients on the LAN (10.20.60.0/24) cannot reach this specific remote network. Interestingly, LAN clients can reach other networks behind other WireGuard peers without issues, and they can even ping the WireGuard IP of the problematic remote peer (10.230.0.254), but not any IPs in its 10.3.0.0/16 network.

Network Configuration
- OPNsense Firewall: 116.203.251.18
  - WireGuard interface (wg1): 10.230.0.1/16
  - LAN interface (vtnet1): 10.20.60.2/24
- Remote WireGuard peer: 217.20.196.67
  - WireGuard IP: 10.230.0.254
  - Local network: 10.3.0.0/16
- Remote peer AllowedIPs configuration includes 10.20.60.0/24
- Routing on OPNsense shows 10.3.0.0/16 correctly routed to wg1 interface

Diagnostic Steps and Results

1. **Packet Capture Tests**:
   - Traffic from LAN clients to 10.3.0.0/16 reaches the LAN interface (vtnet1)
   - No traffic appears on the WireGuard interface (wg1)
   - No traffic appears in pflog0 (firewall logs)

2. **Routing Verification**:
   - `netstat -rn | grep 10.3.0`: Confirms route exists (10.3.0.0/16 via wg1)
   - `route add 10.3.0.5 -interface wg1`: Added specific host route, still no traffic

3. **Firewall Rule Tests**:
   - Created floating rule with "sloppy state" tracking
   - Verified no blocking rules exist for this traffic
   - `pfctl -s all | grep "block drop"`: No relevant blocking rules

4. **NAT Configuration**:
   - Added specific outbound NAT rule for 10.20.60.0/24 to 10.3.0.0/16 via WireGuard
   - Enabled "static port" option

5. **WireGuard Configuration Check**:
   - Confirmed remote peer's AllowedIPs includes 10.20.60.0/24
   - OPNsense can ping remote network, confirming tunnel works
   - LAN clients can ping 10.230.0.254 (WireGuard IP) but not 10.3.0.0/16 networks

6. **Advanced Tests**:
   - Checked for asymmetric routing issues
   - Verified state tracking settings
   - Examined MTU settings (wg1: 1420)
   - Checked for system tunables conflicts
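
For reference, the commands behind steps 2–5 on the OPNsense shell look roughly like this (a sketch; interface names and addresses as above):

# 2. routing verification, including the explicit host route test
netstat -rn | grep 10.3.0
route -n get 10.3.0.81
route add 10.3.0.5 -interface wg1

# 3. firewall rule checks (pfctl -d disables pf entirely, pfctl -e re-enables it)
pfctl -s rules | grep 10.3.0
pfctl -s all | grep "block drop"
pfctl -d

# 4. outbound NAT rules currently loaded
pfctl -s nat

# 5. WireGuard status, including AllowedIPs and latest handshakes
wg show wg1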

Key Findings

1. **Critical Detail**: LAN clients can reach the WireGuard peer IP (10.230.0.254) but not the networks behind it (10.3.0.0/16)

2. **Mystery**: No packets from LAN to 10.3.0.0/16 ever appear on wg1 interface, despite:
   - Correct routing table entries
   - No visible blocking firewall rules
   - Functioning WireGuard tunnel (confirmed by firewall's ability to reach 10.3.0.0/16)
   - Specific host routes having no effect

3. **Most Likely Cause**: Bug in OPNsense's WireGuard implementation regarding how forwarded traffic to specific networks is handled.

What could be preventing forwarded traffic from LAN clients from reaching the WireGuard interface, when direct traffic from the firewall works fine, and when other WireGuard networks are accessible?

Quote from: jahlives on July 29, 2025, 09:26:31 AM
**Most Likely Cause**: Bug in OPNsense's WireGuard implementation regarding how forwarded traffic to specific networks is handled.

I don't think so and am suggesting to look elsewhere.

Quote
I don't think so and am suggesting to look elsewhere.

That was the AI's conclusion, not mine ;-)

But actually I have no idea what else to check or test. It's not my first WireGuard setup, and until now I never had issues like this. Usually it was a missing allowed IP that interfered with routing, but in this case everything looks fine to me, setup/configuration-wise I mean. The fact that I never see the packets on the wg1 interface first led me to assume it's a routing issue, but since I cannot see the packets leaving on ANY interface I don't think it's routing related. Then it could be the firewall rules, but since it doesn't work even with pfctl -d I think the firewall and its rules are off the table too. The fact that the firewall itself can reach the remote side also speaks against a routing issue. So the only thing I can think of is the different handling of outgoing traffic (when the firewall itself pings the remote side) compared to forwarded traffic (when a client behind the firewall pings the remote side). And as said: forwarded traffic is not an issue for other remote destinations via the same WG instance, it's just this particular remote subnet.
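
One more test along these lines that I could still try: pinging from the firewall itself but with the LAN address as source. That only exercises source selection and the return path for 10.20.60.0/24, not the pf forwarding path, but it would at least narrow things down a bit:

# FreeBSD ping: -S sets the source address of the probe
ping -S 10.20.60.2 10.3.0.81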

To my knowledge (which may be a bit limited for FreeBSD ;-)) the processing of traffic is kernel stuff, no? If it works for outbound packets but not for forwarded packets, where else could I look? I will gladly check everything that is suggested here.
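
The only kernel-level knobs I can think of checking myself would be something like (a sketch):

# forwarding must be enabled for routed/forwarded traffic
sysctl net.inet.ip.forwarding
# IP statistics include counters for forwarded and undeliverable packets
netstat -s -p ip | grep -iE "forward|no route"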

Just saw that I forgot to mention the OPNsense version here: we use OPNsense 25.1.10-amd64 on both sides of the tunnel.

Cheers

tobi

Mystery solved, thanks to Cedrik from OPNsense support :-)

We still had an active IPsec configuration from the very beginning. As it never worked with this remote side we switched to WireGuard, but forgot to remove the IPsec config. Because IPsec phase 2 was never established, there was no route or interface visible, but it seems the kernel already "stole" the packets based on phase 1 and then just dropped them. The drop did not show up in any logfile or packet capture.

So in case of such "weird" issues: make sure you check the IPsec settings in the GUI, as the console does not show routes or interfaces for IPsec if phase 2 is not established. The only trace of this on the CLI was the output of
swanctl --list-sas
which finally put us on the right track.
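
For anyone who hits a similarly "invisible" drop: swanctl --list-sas is what gave it away here; the other two commands below are additional suggestions on my side for seeing what IPsec material is loaded or installed in the kernel, even without an established tunnel:

# established IKE/child SAs
swanctl --list-sas
# loaded connection definitions, shown even when nothing is established
swanctl --list-conns
# dump the kernel IPsec security policy database (SPD)
setkey -DP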