Stale peers in Wireguard, v2

Started by fst, April 24, 2025, 11:13:43 AM

Thank you! I will try also this way.

May 11, 2025, 05:25:16 PM #16 Last Edit: May 11, 2025, 05:26:48 PM by meyergru
It clearly depends on which side initiates the WireGuard connection: although a site-to-site connection can potentially be opened from either side, it might not actually work the way you think it does.

Say, for example, the other side is behind CG-NAT. In that case, it can initiate a connection as a client, but never act as a server.

In this situation, even if you detect stale connections on your side, you cannot "repair" the connection by restarting WireGuard. Thus, the "stale detection" via the cron job should preferably run on both sides, and at a short interval, to keep interruptions small.

I am in a position like this myself (I am not behind CG-NAT, but my peers are). Thus, the stale detection on my side, although enabled, would not help.
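To illustrate what such a "stale detection" check boils down to, here is a minimal Python sketch. It is not the script the OPNsense cron job actually runs; it only parses the text printed by `wg show <interface> latest-handshakes` (one public key and one Unix timestamp per line). The function name `stale_peers` and the 180-second threshold are assumptions chosen for illustration.

```python
import time


def stale_peers(handshake_output, max_age_s=180, now=None):
    """Return the public keys whose latest handshake is too old.

    handshake_output is the text printed by
    `wg show <interface> latest-handshakes`: one peer per line,
    public key and Unix timestamp separated by whitespace.
    A timestamp of 0 means the peer never completed a handshake.
    max_age_s is a hypothetical threshold, not an OPNsense default.
    """
    now = time.time() if now is None else now
    stale = []
    for line in handshake_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, timestamp = parts[0], int(parts[1])
        if timestamp == 0 or now - timestamp > max_age_s:
            stale.append(key)
    return stale


# Example with made-up keys and a fixed "now" so the result is reproducible.
sample = "AAA=\t1000\nBBB=\t1170\nCCC=\t0\n"
print(stale_peers(sample, max_age_s=180, now=1200))  # ['AAA=', 'CCC=']
```

Whatever acts on this list only helps on the side that is able to initiate the connection, which is why the check belongs on the peer behind CG-NAT as well.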
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

May 11, 2025, 07:41:39 PM #17 Last Edit: May 11, 2025, 08:10:41 PM by FredFresh
@meyergru the 3 connections are 3 different VPNs, all towards the Proton VPN servers. So the connection should always be initiated only from my side.

The modem is an LTE modem and the mobile operator uses a CG-NAT system (the IP I receive starts with 10.xx.xx.xx), BUT the strange thing is that the 2 backup connections become stale after very different amounts of time and in random order.

According to the OPNsense documentation I set the "Keepalive interval" to 25 seconds, but changing it to a lower or higher value doesn't change the result.
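As a side note on the address range: 10.0.0.0/8 is RFC 1918 private space, while the block reserved specifically for carrier-grade NAT is 100.64.0.0/10 (RFC 6598). A WAN address in either range means there is NAT upstream, so inbound connections cannot reach the router. A small Python sketch to classify an address follows; the function name `wan_address_kind` and its return labels are made up for illustration.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_SHARED = ipaddress.ip_network("100.64.0.0/10")
# Private IPv4 ranges (RFC 1918); some operators use these for CG-NAT too.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def wan_address_kind(addr):
    """Classify a WAN address; the labels are hypothetical, for illustration."""
    ip = ipaddress.ip_address(addr)
    if ip in CGNAT_SHARED:
        return "cgnat-shared"
    if any(ip in net for net in RFC1918):
        return "rfc1918-private"
    return "other"


print(wan_address_kind("10.20.30.40"))  # rfc1918-private
print(wan_address_kind("100.64.1.1"))   # cgnat-shared
```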


It seems that some route/firewall state changes or the system drops something: now I have the third VPN online, but the gateway monitoring shows it as offline:
the VPN instance is pingable;
the endpoint IP is pingable;
the gateway is NOT pingable.

Looking at the live view log, I see the first ping going out through the correct VPN gateway, but no reply is recorded.
Even if I restart OPNsense nothing changes. If I run a traceroute to the gateway IP, it starts pinging again and comes back online.

May 11, 2025, 07:47:45 PM #18 Last Edit: May 11, 2025, 07:53:14 PM by Bob.Dig
If the privacy-vpn-servers or your LTE-connection are overloaded, there is nothing OPNsense can do...

May 11, 2025, 07:53:53 PM #19 Last Edit: May 12, 2025, 07:42:32 AM by FredFresh
@Bob.Dig - this was also already tested: connecting to the same server from a mobile phone and a computer is possible. Also, the load on that server is not high (according to the Proton app).
Waiting a couple of days for everything to return to normal wasn't successful.

UPDATE:
I found the following situation:
1 VPN peer down (the second of three)
2 VPN gateways down because of monitoring (the 2nd and 3rd)

I disabled monitoring on the 2nd and 3rd VPN gateways, and within 5 minutes everything was online again. The strange thing is that the ping time of the primary VPN gateway was more than double what it was before.

When I restored the "gateway monitoring", both the 2nd and 3rd VPN gateways were marked as offline.

Could the gateway monitoring system have some difficulty managing more than one VPN gateway?

SECOND UPDATE:
While looking more closely at the live view of the log, I found this:


When everything works well, I see the initial ping request (but not the reply) going out through the corresponding WireGuard gateway.
Suddenly I stop seeing it, and instead I start to see the returning reply trying to enter through whichever WireGuard gateway has the highest priority at that moment.

Now the monitoring IP is the same as the gateway IP. Before that, I tried using an external IP and created a dedicated route and firewall rule to always send the initial ping request through the correct WireGuard VPN, but the same behavior occurred (even if I didn't see it as clearly in the log as this time).

The Routes status shows this (2.1.1.1 is the proper gateway for that monitoring IP):


Question: are the routing rules taken into account for these ping queries or not?
Also, this happens with or without the "Disable route" flag set on each WireGuard gateway.