Stale peers in Wireguard, v2

fst · April 24, 2025, 11:13:43 AM

I have one opnsense installation (out of 3) where Wireguard is disconnecting every 10 days or so. The peer is showing as "stale". The only way to reestablish the link is by rebooting. These options have failed:
- disabling and reenabling
- shell: /usr/local/opnsense/scripts/Wireguard/wg-service-control.php stop/start xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

On the same hardware, WireGuard worked flawlessly under pfSense. The other side of the tunnel did not change when I switched from pfSense to OPNsense.
At the times this is happening, there is no log entry in /var/log/wireguard/*
I can see my restart attempts in the log, to no avail:
<37>1 2025-04-24T01:54:58+02:00 gwtsb.tonstudiobeusch.ch wireguard 56813 - [meta sequenceId="1"] wireguard instance WGHD (wg0) stopped
<37>1 2025-04-24T01:54:58+02:00 gwtsb.tonstudiobeusch.ch wireguard 58680 - [meta sequenceId="2"] /usr/local/opnsense/scripts/Wireguard/wg-service-control.php: ROUTING: entering configure using opt1
<37>1 2025-04-24T01:54:58+02:00 gwtsb.tonstudiobeusch.ch wireguard 58680 - [meta sequenceId="3"] /usr/local/opnsense/scripts/Wireguard/wg-service-control.php: plugins_configure monitor (,[GWHD])
<37>1 2025-04-24T01:54:58+02:00 gwtsb.tonstudiobeusch.ch wireguard 58680 - [meta sequenceId="4"] /usr/local/opnsense/scripts/Wireguard/wg-service-control.php: plugins_configure monitor (execute task : dpinger_configure_do(,[GWHD]))
<37>1 2025-04-24T01:54:58+02:00 gwtsb.tonstudiobeusch.ch wireguard 58680 - [meta sequenceId="5"] wireguard instance WGHD (wg0) started

I have not found any other log entries. Is there a way to debug this?

FredFresh · May 01, 2025, 11:30:53 AM

Hi, same situation here. I have seen that resetting the modem is also enough (my modem is in bridge mode, so OPNsense sees the change of the public IP).

It seems it requires a push to re-initiate something.

Monviech (Cedrik) · May 01, 2025, 11:34:33 AM

This is expected because it's how WireGuard works.

Read about it in this discussion:

https://github.com/opnsense/docs/pull/691

Stale does not mean anything is wrong.

If you need a true connected, session-based VPN, use IPsec or OpenVPN.

FredFresh · May 01, 2025, 11:45:17 AM

Hi Cedrik, I already have a keepalive interval of 25 seconds. Should I reduce it in order to keep the peer active?

I use WireGuard only for outgoing connections. When the peer becomes stale and there is a request, should it become active again?

Monviech (Cedrik) · May 01, 2025, 01:05:52 PM

Read the "NAT and Firewall Traversal Persistence" section.

https://www.wireguard.com/quickstart/

If wireguard does not have matching traffic, it does not send anything.

It's triggered by matching traffic.

Bob.Dig · May 01, 2025, 02:32:59 PM

There is an option to "Renew DNS for WireGuard on stale connections" in System/Cron. Try it.

FredFresh · May 05, 2025, 07:31:59 PM

Hi,

@Bob.Dig - that option is already active, once each hour, but it has no effect.

@Cedrik - once the peer is in stale status and I try to force some traffic through it, the state does not change. Shouldn't it go online once there is traffic?

Monviech (Cedrik) · May 05, 2025, 07:40:58 PM

The stale status depends on when the last handshake happened. The code checks if it was less than 5 minutes ago, and if that's the case, assumes the peer is online.

If traffic happens, the peer should change state because a new handshake should happen.

Look at the WireGuard diagnostics page; it shows how many seconds ago the last handshake happened.
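
The check described here can be sketched in a few lines. This is only an illustration, not OPNsense's actual code: it assumes the input is the text printed by `wg show <interface> latest-handshakes` (one public key and one Unix timestamp per line, with 0 meaning no handshake yet), it takes the 5-minute threshold from the post above, and the name `classify_peers` is made up for this example.

```python
import time

STALE_AFTER_SECONDS = 300  # the 5-minute threshold quoted above (assumed exact)


def classify_peers(latest_handshakes, now=None):
    """Map each peer public key to (status, handshake age in seconds or None)."""
    now = time.time() if now is None else now
    result = {}
    for line in latest_handshakes.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        pubkey, stamp = parts[0], int(parts[1])
        if stamp == 0:
            # wg reports 0 when no handshake has happened yet
            result[pubkey] = ("never", None)
            continue
        age = int(now - stamp)
        status = "online" if age < STALE_AFTER_SECONDS else "stale"
        result[pubkey] = (status, age)
    return result
```

A peer whose last handshake was 1200 seconds ago would be reported as stale by this logic, regardless of whether the tunnel itself is broken.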

FredFresh · May 05, 2025, 07:44:03 PM

In this case, 1200 seconds. But shouldn't forcing traffic through the tunnel trigger a new handshake?

Additional info:
- restarting WireGuard does not trigger a new handshake;
- restarting OPNsense does not trigger a new handshake;
- restarting the modem (in bridge mode) changes the WAN IP and triggers a new handshake.

To complete the information above: after the third point, the gateway of that WireGuard connection only starts pinging the monitoring IP correctly again once I run a traceroute from my PC or from OPNsense towards the gateway IP.

FredFresh · May 11, 2025, 03:11:49 PM

@Monviech thank you for the feedback and for your time. I would like to ask for some more assistance from an expert (you).

I am actually very frustrated, as this is the last thing I need to fix so that I no longer have to check the firewall every day (and have more time for other things), and I have been struggling with it for at least a year.

The abnormal behaviors are:
•   Peers become stale even when they are in use. I have three connections, managed through 3 gateways with different priorities. Even the one actually in use became stale (with ongoing connections);
•   The two backup connections (not used because the first one is online) behave differently: one can become stale after 2-3 days while the other can last for weeks.
•   Even if I try to force traffic through the stale gateway, it never returns online. If I try to restart the peer or the WireGuard service, the peer is marked as offline. Even if I restart OPNsense, it doesn't return online.
•   The only way to get a new handshake with the WireGuard endpoint is to change the WAN address (restart the modem). After the restart, the peer is online but the gateway is not; I have to perform a traceroute towards the gateway in order to bring it online again.

Please, do you have any suggestion on how to solve this issue?

Thank you

Monviech (Cedrik) · May 11, 2025, 03:18:55 PM

Use monit and ping through your tunnels and let it send you an email if the ping fails.

That way you can see if something is actually wrong.
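
Such a check could be a small script that Monit (or cron) runs and judges by its result. The sketch below is an example under stated assumptions, not a script shipped with OPNsense: `tunnel_is_up` is a hypothetical name, the addresses in the usage note are placeholders, and `-S` is the source-address flag of FreeBSD's `ping` (Linux uses `-I` instead).

```python
import subprocess


def tunnel_is_up(target, source=None, count=3, run=subprocess.run):
    """Return True if `target` answers ping, optionally sent from `source`."""
    cmd = ["ping", "-c", str(count)]
    if source:
        cmd += ["-S", source]  # FreeBSD ping: source address of outgoing packets
    cmd.append(target)
    # ping exits with status 0 when at least one reply was received
    result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, `tunnel_is_up("192.0.2.1", source="192.0.2.2")` would ping a host that is only reachable through the tunnel; a wrapper can then exit non-zero on `False` so the monitoring tool sends its alert.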

FredFresh · May 11, 2025, 03:29:26 PM

Unfortunately, I already know that I would have to fix the situation at least once every two days. Even if it is not a solution, can you suggest a way to analyze this situation further?

Monviech (Cedrik) · May 11, 2025, 03:38:42 PM

Wireguard does not have many ways to analyze it out of the box.

If it fails, there's most likely an issue in communication between the endpoint and the peer.

This can be firewall rules, firewall states, dynamic IPs, CGNAT, Provider issues, dns issues, etc...

FredFresh · May 11, 2025, 04:52:30 PM

OK, the fun "must go on", but OPNsense is fantastic and I want to find a solution rather than a patch.

What is the best log to check to investigate the problems described so far? I am not a technician and I usually use the live view of the firewall, but I don't think it is suitable for this kind of investigation.

I created dedicated firewall rules just to create log records, but I can't see anything in the live view. Is it a good idea to download the .csv from the "plain view" page and process it with Excel?

Monviech (Cedrik) · May 11, 2025, 05:11:42 PM

The best tool to troubleshoot is packet captures. If a tunnel fails use tcpdump from the shell or packet capture in the webgui and see what happens to the wireguard packets.
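
When reading such a capture, it helps to know which kind of WireGuard message each UDP packet carries. In the WireGuard protocol, the first byte of the UDP payload is the message type, followed by three reserved zero bytes; the sketch below maps it to a label. The function name `describe_wireguard_payload` is invented here, and it expects the raw UDP payload bytes (for example extracted from a saved capture), not the full IP packet.

```python
MESSAGE_TYPES = {
    1: "handshake initiation",  # 148 bytes on the wire
    2: "handshake response",    # 92 bytes
    3: "cookie reply",          # 64 bytes
    4: "transport data",        # 32 bytes when it is an empty keepalive
}


def describe_wireguard_payload(payload):
    """Label a UDP payload by its WireGuard message type."""
    if len(payload) < 4:
        return "too short to be WireGuard"
    label = MESSAGE_TYPES.get(payload[0])
    # Bytes 1..3 are reserved and must be zero in every message type
    if label is None or payload[1:4] != b"\x00\x00\x00":
        return "not a known WireGuard message"
    if payload[0] == 4 and len(payload) == 32:
        return "keepalive (empty transport data)"
    return label
```

As a rough guide, repeated handshake initiations leaving the WAN interface with no response coming back would point at the path or the remote side, while no initiation being sent at all would point at the local side.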
