OPNsense Forum

English Forums => 26.1 Series => Topic started by: tcm1010 on March 22, 2026, 07:58:22 PM

Title: WireGuard gateway intermittently goes offline and stays offline
Post by: tcm1010 on March 22, 2026, 07:58:22 PM
I have been working on this for a couple weeks now, and it is time to ask for help.

I have a (mostly) working VPN setup configured via the excellent WireGuard documentation (instance, peer, interface, gateway, firewall rules, NAT), with a couple of additional tweaks to make it multi-WAN for redundancy.

When I first got it working, it was magic: a tunnel gateway would experience enough loss to trigger the gateway switch, the second gateway would become the default, and clients wouldn't notice anything had gone wrong.  Yea!!

But...checking in every so often, I would notice that one of the tunnel gateways had been down for hours at a time (e.g., overnight), i.e., showing 100% loss. Yet the WireGuard instance and peer for that gateway would remain green/online.  This doesn't happen every time (of course, right?!).  I can see in the logs that the gateways switch as loss occurs on the higher-priority gateway, and switch back once the loss on it drops low enough, as expected.  But every so often, one of them gets stuck in the red/offline state.

To make a long thread short, I have discovered that when this happens, I can fix the problem manually by running traceroute through the offline gateway from the OPNsense CLI. Instantly afterwards, I can ping the gateway, the gateway monitor IP stops showing loss, and the gateway goes green/online (and, if it has the higher priority, it switches back and becomes the default).
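In the meantime, I've been considering automating the "kick" with something like the sketch below (run from cron) until the root cause is found. The addresses are just examples from my setup, and the probe counts/timeouts are arbitrary; treat it as a rough workaround, not a fix.

```shell
#!/bin/sh
# Rough watchdog sketch: if a tunnel shows total loss, "kick" it with a
# short traceroute, mirroring the manual fix described above.
# Addresses and timeouts are examples; adjust per WireGuard instance.

loss_pct() {
    # Pull the packet-loss percentage out of FreeBSD ping(8) output.
    printf '%s\n' "$1" | sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

check_and_kick() {
    src="$1"     # tunnel-local source address, e.g. 10.2.3.2
    target="$2"  # monitor target, e.g. 1.1.1.1
    out=$(ping -S "$src" -c 3 -t 5 "$target" 2>&1)
    if [ "$(loss_pct "$out")" = "100.0" ]; then
        # A few traceroute probes seem to be enough to re-arm the tunnel.
        traceroute -s "$src" -m 3 -w 1 "$target" >/dev/null 2>&1
    fi
}

# Example (e.g. from a cron job every minute):
# check_and_kick 10.2.3.2 1.1.1.1
```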

I have tried this with several different WireGuard instances/peers at different locations (provider is ProtonVPN), and each one has experienced this issue.


root@OPNsense:~ # netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags         Netif Expire
default            10.2.0.1           UGS             wg0
10.2.0.1           link#12            UHS             wg0     # This is the currently active/default gateway
10.2.0.2           link#3             UH              lo0     # This is the currently active/default tunnel IP
[...]
10.2.3.1           link#15            UHS             wg3     # This is the problematic gateway
10.2.3.2           link#3             UH              lo0     # This is the problematic tunnel
[...]

root@OPNsense:~ # ping -S 10.2.3.2 -c 10 1.1.1.1              # ping something via the problematic tunnel - fail
PING 1.1.1.1 (1.1.1.1) from 10.2.3.2: 56 data bytes

--- 1.1.1.1 ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss

root@OPNsense:~ # time traceroute -s 10.2.3.2 1.1.1.1                                  # traceroute something via the problematic tunnel - works
traceroute to 1.1.1.1 (1.1.1.1) from 10.2.3.2, 64 hops max, 40 byte packets
 1  10.2.3.1 (10.2.3.1)  15.823 ms  14.536 ms  14.187 ms
 2  146.70.202.81 (146.70.202.81)  31.409 ms  29.895 ms  30.969 ms
 3  ae32-1932.agg4v.nyc1.us.m247.ro (146.70.1.249)  21.035 ms  20.754 ms  19.185 ms
[...]
^C
0.000u 0.006s 0:03.14 0.0% 0+0k 0+0io 0pf+0w                    # Only 3 seconds using traceroute

root@OPNsense:~ # ping -S 10.2.3.2  1.1.1.1                          # ping then works and the gateway then shows in the GUI as green/online
PING 1.1.1.1 (1.1.1.1) from 10.2.3.2: 56 data bytes
64 bytes from 1.1.1.1: icmp_seq=0 ttl=53 time=18.939 ms
64 bytes from 1.1.1.1: icmp_seq=1 ttl=53 time=26.877 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 18.939/22.908/26.877/3.969 ms


Any thoughts on what is happening to get the gateway into the stuck state, and why "kicking" the offline gateway with traceroute restores functionality?

OPNsense 26.1.2_5