WireGuard "Handshake did not complete after 5 seconds": Multi-WAN outgoing issue

Started by CDuv, October 03, 2022, 06:27:45 PM

Previous topic - Next topic
Hello,

I am failing to setup a WireGuard VPN tunnel on my OPNsense (v22.7.4 with "os-wireguard" plugin v1.12) and I begin to think multi-WAN configuration (load balancing, outbound rules, gateway groups, ...) is causing issues.

I have, using the guide, already setup a "Road Warrior" WireGuard VPN server on another OPNsense box, but there was only a single WAN.

I've read (on this forum?) that WireGuard is not really (yet?) capable of multi-WAN, and I get it: But could I at least, get it to work over one of the WAN I have (without load balacing nor automatic failover)?

I am using the iOS client and logs shows:
QuoteHandshake did not complete after 5 seconds, retrying (try 3)

In the "VPN: WireGuard: List Configuration", the peer part does display "endpoint", "allowed ips" and some "transfer" values, but no "latest handshake" (which the "Handshakes" tab confirms: the timestamps for the peer is at "0").

Here is what I want to have in the end:


  • Allow an user to connect to my LAN network from outside Internet
  • It does not need to access Internet via the VPN (he'll use it's ISP for that)
  • Restrict it to a fixed local IP address
  • Apply some restrictions to what local IP addresses it can access (eg. only allow access to a printer having IP 10.0.0.78/24 (I'll try to set this up once I get the previous 3 points up and running)

Here is my setup:

(My LAN is on 10.0.0.0/24)

VPN: WireGuard:
* Local:
  * Enabled: ☑
  * Name: WG_VPN
  * Public Key: Mo_d...Vb1p
  * Listen Port: 51820
  * Tunnel Port: 10.3.0.1/26
  * Peers:
    * user1
* Endpoint:
  * Enabled: ☑
  * Name: user1
  * Public Key: Fx+p...Zw3d
  * Shared Secret: (empty)
  * Allowed IPs: 10.3.0.2/32
  * Endpoint Address: (empty)
  * Endpoint Port: (empty)
  * Keepalive: (empty)
* List Configuration:
   
    interface: wg1
      public key: Mo_d...Vb1p
      private key: (hidden)
      listening port: 51820

    peer: Fx+p...Zw3d
      allowed ips: 10.3.0.2/32


Interfaces: [WG_VPN]:
* Enabled: ☑
* Lock: ☑
* Device: wg1
* IPv4 Configuration Type: None
* IPv6 Configuration Type: None

Firewall: NAT: Outbound:
* Mode: Hybrid outbound NAT rule generation
* Manual rules:




InterfaceSourceNAT AddressNAT Port
WAN_110.0.0.0/24Interface address*
WAN_210.0.0.0/24Interface address*

Firewall: Rules: LAN:




SourcePortDestinationPortGatewayDescription
10.0.0.0/24*10.3.0.0/26**Allow LAN clients to contact WG_VPN clients
10.0.0.0/24*10.3.0.0/26*GW_LoadBalancingSet the gateway for LAN clients

Firewall: Rules: WG_VPN:



SourcePortDestinationPortGatewayDescription
*****Allow all WG_VPN clients to contact LAN clients (for now)

I tried deleting all the WireGuard setup and starting fresh without much success.

Full logs from the iOS client:
Quote[APP] startActivation: Entering (tunnel: MyCompany)
2022-10-03 17:22:06.517
[APP] startActivation: Starting tunnel
2022-10-03 17:22:06.518
[APP] startActivation: Success
2022-10-03 17:22:06.542
[APP] Tunnel 'MyCompany' connection status changed to 'connecting'
2022-10-03 17:22:06.672
[NET] App version: 1.0.15 (26)
2022-10-03 17:22:06.673
[NET] Starting tunnel from the app
2022-10-03 17:22:06.756
[NET] DNS64: mapped 1.2.3.4 to itself.
2022-10-03 17:22:06.756
[NET] Attaching to interface
2022-10-03 17:22:06.757
[NET] UAPI: Updating private key
2022-10-03 17:22:06.757
[NET] Routine: decryption worker 1 - started
2022-10-03 17:22:06.757
[NET] Routine: event worker - started
2022-10-03 17:22:06.757
[NET] Routine: encryption worker 1 - started
2022-10-03 17:22:06.757
[NET] Routine: handshake worker 1 - started
2022-10-03 17:22:06.757
[NET] Routine: decryption worker 2 - started
2022-10-03 17:22:06.757
[NET] Routine: handshake worker 2 - started
2022-10-03 17:22:06.758
[NET] Routine: encryption worker 2 - started
2022-10-03 17:22:06.758
[NET] Routine: TUN reader - started
2022-10-03 17:22:06.759
[NET] UAPI: Removing all peers
2022-10-03 17:22:06.761
[NET] peer(Mo_d...Vb1p) - UAPI: Created
2022-10-03 17:22:06.763
[NET] peer(Mo_d...Vb1p) - UAPI: Updating endpoint
2022-10-03 17:22:06.765
[NET] peer(Mo_d...Vb1p) - UAPI: Updating persistent keepalive interval
2022-10-03 17:22:06.767
[NET] peer(Mo_d...Vb1p) - UAPI: Removing all allowedips
2022-10-03 17:22:06.769
[NET] peer(Mo_d...Vb1p) - UAPI: Adding allowedip
2022-10-03 17:22:06.771
[NET] UDP bind has been updated
2022-10-03 17:22:06.771
[NET] Routine: receive incoming v4 - started
2022-10-03 17:22:06.773
[NET] peer(Mo_d...Vb1p) - Starting
2022-10-03 17:22:06.773
[NET] Routine: receive incoming v6 - started
2022-10-03 17:22:06.775
[NET] peer(Mo_d...Vb1p) - Sending keepalive packet
2022-10-03 17:22:06.775
[NET] peer(Mo_d...Vb1p) - Routine: sequential sender - started
2022-10-03 17:22:06.775
[NET] peer(Mo_d...Vb1p) - Routine: sequential receiver - started
2022-10-03 17:22:06.777
[NET] peer(Mo_d...Vb1p) - Sending handshake initiation
2022-10-03 17:22:06.780
[NET] Interface state was Down, requested Up, now Up
2022-10-03 17:22:06.781
[NET] Device started
2022-10-03 17:22:06.783
[NET] Tunnel interface is utun2
2022-10-03 17:22:06.786
[APP] Tunnel 'MyCompany' connection status changed to 'connected'
2022-10-03 17:22:06.787
[NET] Network change detected with satisfied route and interface order [pdp_ip0]
2022-10-03 17:22:06.788
[NET] DNS64: mapped 1.2.3.4 to itself.
2022-10-03 17:22:06.790
[NET] peer(Mo_d...Vb1p) - UAPI: Updating endpoint
2022-10-03 17:22:06.792
[NET] Routine: receive incoming v4 - stopped
2022-10-03 17:22:06.792
[NET] Routine: receive incoming v6 - stopped
2022-10-03 17:22:06.796
[NET] UDP bind has been updated
2022-10-03 17:22:06.796
[NET] Routine: receive incoming v6 - started
2022-10-03 17:22:06.796
[NET] Routine: receive incoming v4 - started
2022-10-03 17:22:07.044
[NET] Network change detected with satisfied route and interface order [pdp_ip0, utun2]
2022-10-03 17:22:07.045
[NET] DNS64: mapped 1.2.3.4 to itself.
2022-10-03 17:22:07.072
[NET] peer(Mo_d...Vb1p) - UAPI: Updating endpoint
2022-10-03 17:22:07.074
[NET] Routine: receive incoming v4 - stopped
2022-10-03 17:22:07.074
[NET] Routine: receive incoming v6 - stopped
2022-10-03 17:22:07.076
[NET] UDP bind has been updated
2022-10-03 17:22:07.078
[NET] Routine: receive incoming v6 - started
2022-10-03 17:22:07.078
[NET] Routine: receive incoming v4 - started
2022-10-03 17:22:11.509
[APP] Status update notification timeout for tunnel 'MyCompany'. Tunnel status is now 'connected'.
2022-10-03 17:22:12.044
[NET] peer(Mo_d...Vb1p) - Handshake did not complete after 5 seconds, retrying (try 2)
2022-10-03 17:22:12.045
[NET] peer(Mo_d...Vb1p) - Sending handshake initiation
2022-10-03 17:22:17.210
[NET] peer(Mo_d...Vb1p) - Handshake did not complete after 5 seconds, retrying (try 2)
2022-10-03 17:22:17.210
[NET] peer(Mo_d...Vb1p) - Sending handshake initiation
2022-10-03 17:22:22.497
[NET] peer(Mo_d...Vb1p) - Handshake did not complete after 5 seconds, retrying (try 2)
2022-10-03 17:22:22.497
[NET] peer(Mo_d...Vb1p) - Sending handshake initiation
2022-10-03 17:22:27.681
[NET] peer(Mo_d...Vb1p) - Handshake did not complete after 5 seconds, retrying (try 3)
2022-10-03 17:22:27.682
[NET] peer(Mo_d...Vb1p) - Sending handshake initiation
2022-10-03 17:22:32.936
[NET] peer(Mo_d...Vb1p) - Handshake did not complete after 5 seconds, retrying (try 4)

Did you go to firewall settings and ensure pass is set for the WireGuard service...set for drop/block by default

Where is that setting? I don't see anything in "Firewall: Settings: Advanced"

I do have an allow rule in "Firewall: Rules: WGVPN".

You need a rule on WAN explicitly allowing UDP/51820 in.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

My bad, I forgot to paste it, here it is:

Firewall: Rules: WAN:




SourcePortDestinationPortGatewayDescription
**WAN address51820*Allow incoming WAN traffic on WG_VPN port
10.0.0.0/24*10.3.0.0/26*GW_LoadBalancingSet the gateway for LAN clients

Instead of simple "Allow incoming UDP/51820" rules on both WAN, I tried using NAT Port Forward rules (on both WAN) to redirect incoming UDP/51820 to 127.0.0.1:51820.

And looking at the live view of firewall logs (Firewall: Log Files: Live View) I can see:






ActionInterfaceDirectionTimeSourceDestinationProtoLabel
WAN_2...T18:25:33WAN_2.PUBLIC.IP.ADDRESS:51820CLI.ENT.IP.ADDRESS:55643udplet out anything from firewall host itself (force gw)
WAN_1...T18:25:33CLI.ENT.IP.ADDRESS:55643127.0.0.1:51820udpNAT to redirect UDP/51820 to 127.0.0.1:51820 (for WG_VPN)
WAN_1...T18:25:33CLI.ENT.IP.ADDRESS:55643WAN_1.PUBLIC.IP.ADDRESS:51820udprdr rule

It looks like the response is going via the other WAN (not the one the request came from).

So I tried adding a firewall rule on loopback interface to force a given interface:




DirectionProtocolSourcePortDestinationPortGatewayDescription
outUDP*51820**WAN_1Set the gateway for WireGuard traffic (test)

But the outgoing WireGuard traffic stills uses WAN_2.

If I make the client connect to WireGuard tunnel using IP from WAN_2 as termination/endpoint it works (handshake is OK on server side and the client can access LAN's IP).

Currently on the "System: Gateways: Single" page I have :





NameInterfaceProtocolPriority
GW_1WAN_1IPv4255 (upstream)
GW_2 (active)WAN_2IPv4255 (upstream)

And my "GW_LoadBalancing" gateway group is a simple group with both GW_1 and GW_2 as "Tier 1".

I guess the outgoing traffic is using GW_2 because of the "(active)" mention? which is lacking on WAN_1?

I guess I am falling into the same issue as described in this topic with:

QuoteThe only way is:

- All users use WAN1 as default
- Only if WAN1 fails they have to use WAN2
- When WAN1 is back, all users get kicked and should switch to WAN1

It wont work in a different way ...

I can live with all WireGuard users using WAN1 and then WAN2 if it fails but I does need the WAN1+WAN2 load balancing for other (non-WireGuard) traffic...


I did the same thing.. followed the guide to a T.. also have ma failover wan group.  Same issue.  I don't even see the traffic hitting my firewall anywhere in the logs tho. 

Ok, I had the exact same issue while everything looking OK and following the guides.
Mine was slightly different in that the first peer could connect but the 2nd one (Ipad) could not perform the handshake

I re-did the whole thing a few times but no dice.

I then thought if the instance is adding the peers automatically or is that not happening on the back end.
I removed the 2nd peer from the instance
Saved->apply
Added the 2nd peer again
Saved->apply
I then got the handshake to work