Messages - tcm1010

#1
That looks like it will work very nicely.  Thank you, Greelan!
#2
I have been working on this for a couple weeks now, and it is time to ask for help.

I have a (mostly) working VPN setup, configured by following the excellent WireGuard documentation (instance, peer, interface, gateway, firewall rules, NAT), with a couple of additional tweaks to make it multi-WAN for redundancy:
  • gateway switching is enabled (for redundancy)
  • gateway failover and failback states are enabled (so the clients don't lose access to the Internet)
  • static routes for the WireGuard endpoints via the WAN interface (so one tunnel isn't routed via another tunnel)
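One way to confirm that an endpoint's static route really leaves via the WAN interface, rather than another tunnel, is to read the "interface:" line that FreeBSD's `route -n get` prints. A minimal sketch follows; the helper name `route_iface` and the address 203.0.113.10 are placeholders for illustration, not values from my setup.

```sh
#!/bin/sh
# Hypothetical helper: extract the outgoing interface from the output of
# `route -n get <address>` (FreeBSD), read from stdin.
route_iface() {
    # The relevant line looks like "  interface: igb0"; print the second field.
    awk '$1 == "interface:" { print $2; exit }'
}

# Example usage (203.0.113.10 is a placeholder endpoint address):
#   route -n get 203.0.113.10 | route_iface
```

If the printed interface is one of the wgN devices instead of the WAN interface, that endpoint is being routed through another tunnel.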

When I first got it working, it was magic: a tunnel gateway would experience enough loss to trigger the gateway switch, the second gateway would become the default, and clients wouldn't notice anything had gone wrong.  Yea!!

But...I checked things every so often, and I would notice one of the tunnel gateways would be permanently down, i.e., showing 100% loss, for hours at a time, e.g., overnight.  Yet, the WireGuard instance and peer for that gateway would remain green/online.  This doesn't happen every time (of course, right?!).  I can see in the logs that the gateways will switch as loss happens on the higher priority gateway, and will switch back once the loss is low enough on the higher priority one, as expected.  But every so often, one of them gets stuck in the red/offline state.

To make a long thread short: when this happens, I can easily fix the problem by hand. Running traceroute through the offline gateway from the OPNsense CLI instantly restores it: I can ping via the gateway again, the gateway monitor IP no longer shows loss, and the gateway goes green/online (and if it has the higher priority, it switches back to being the default).

I have tried this with several different WireGuard instances/peers at different locations (provider is ProtonVPN), and each one has experienced this issue.


root@OPNsense:~ # netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags         Netif Expire
default            10.2.0.1           UGS             wg0
10.2.0.1           link#12            UHS             wg0     # This is the currently active/default gateway
10.2.0.2           link#3             UH              lo0     # This is the currently active/default tunnel IP
[...]
10.2.3.1           link#15            UHS             wg3     # This is the problematic gateway
10.2.3.2           link#3             UH              lo0     # This is the problematic tunnel
[...]

root@OPNsense:~ # ping -S 10.2.3.2 -c 10 1.1.1.1              # ping something via the problematic tunnel - fail
PING 1.1.1.1 (1.1.1.1) from 10.2.3.2: 56 data bytes

--- 1.1.1.1 ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss

root@OPNsense:~ # time traceroute -s 10.2.3.2 1.1.1.1                                  # traceroute something via the problematic tunnel - works
traceroute to 1.1.1.1 (1.1.1.1) from 10.2.3.2, 64 hops max, 40 byte packets
 1  10.2.3.1 (10.2.3.1)  15.823 ms  14.536 ms  14.187 ms
 2  146.70.202.81 (146.70.202.81)  31.409 ms  29.895 ms  30.969 ms
 3  ae32-1932.agg4v.nyc1.us.m247.ro (146.70.1.249)  21.035 ms  20.754 ms  19.185 ms
[...]
^C
0.000u 0.006s 0:03.14 0.0% 0+0k 0+0io 0pf+0w                    # Only 3 seconds using traceroute

root@OPNsense:~ # ping -S 10.2.3.2  1.1.1.1                          # ping then works and the gateway then shows in the GUI as green/online
PING 1.1.1.1 (1.1.1.1) from 10.2.3.2: 56 data bytes
64 bytes from 1.1.1.1: icmp_seq=0 ttl=53 time=18.939 ms
64 bytes from 1.1.1.1: icmp_seq=1 ttl=53 time=26.877 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 18.939/22.908/26.877/3.969 ms
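As a stopgap, the manual sequence above (ping fails, traceroute "kicks" the tunnel, ping works) could be scripted. This is only a sketch of the workaround, not a fix for the underlying cause; the function name `kick_if_stuck` and the `PING_CMD`/`TRACEROUTE_CMD` override variables are my own inventions, and the flags assume the FreeBSD versions of ping and traceroute.

```sh
#!/bin/sh
# Hypothetical watchdog for the stuck-gateway symptom; not part of OPNsense.
# The commands can be overridden for dry runs (assumed variable names).
PING_CMD="${PING_CMD:-ping}"
TRACEROUTE_CMD="${TRACEROUTE_CMD:-traceroute}"

# kick_if_stuck <tunnel_source_ip> <probe_target>
# Prints "ok" if the probe succeeds, otherwise runs a short traceroute
# through the same source address and prints "kicked".
kick_if_stuck() {
    src="$1"
    target="$2"
    # FreeBSD ping: -S sets the source address, -c the packet count.
    if $PING_CMD -S "$src" -c 3 "$target" >/dev/null 2>&1; then
        echo "ok"
        return 0
    fi
    # FreeBSD traceroute: -s source address, -m max hops,
    # -q probes per hop, -w seconds to wait per probe.
    $TRACEROUTE_CMD -s "$src" -m 3 -q 1 -w 2 "$target" >/dev/null 2>&1 || true
    echo "kicked"
}

# Example usage with the addresses from the transcript above:
#   kick_if_stuck 10.2.3.2 1.1.1.1
```

Whether three hops are enough to unstick the tunnel is an assumption; in my manual test the traceroute ran for about three seconds before I interrupted it.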


Any thoughts on what is happening to get into the stuck situation, and why "kicking" the offline gateway with traceroute seems to restore functionality?

OPNsense 26.1.2_5
#3
Quote from: nero355 on March 20, 2026, 04:19:41 PM
I would like to think that there is a chance somewhere in the future ?!


At least you are not the only one with that wish :)

Thanks nero355 for pointing those out.  I did a lot of searching before posting and did not come across those threads (probably because I used "GUI" as one of the search criteria).

Shall I mark this thread [SOLVED]...or [REDUNDANT]?  :-)


#4
I have recently had to do a lot of troubleshooting, jumping between System:Gateways:Configuration, VPN:WireGuard:{Instance,Peers,Status}, etc.  Each time I jump from one category level to another, the previous category level collapses and becomes hidden, so going back to it takes three clicks.  Is there a setting (one I haven't found yet) to keep the category/function/config levels I have already entered fully expanded?

Secondary question: are there any plans for a possible "favorites" category, so one click on a favorite would get me immediately to where I want to go?

Thank you in advance.

(Current version: 26.1.2_5)
#5
Thank you, franco, that worked.  I only needed one invocation of the command; it removed both rrddata sections.

If I wanted to reverse it and put the RRD data back, would it be:

# pluginctl -c rrddata

?


More details for the curious:

Web GUI backups with "Do not backup RRD data" checked are now ~350 kB (I have done a lot of config changes since the initial ~70 kB backup).  With that option unchecked, the size is ~3 MB.  I repeated it a few times and it remains consistent, i.e., no regression between toggling the option.  The os-sftp-backup is also ~350 kB and remains unaffected by the toggle in the Download section.

The config file has noted the change in the <revision> section:

  <revision>
    [...]
    <description>Flushed rrddata via pluginctl</description>
    [...]
  </revision>


#6
My configuration backups currently seem to always contain RRD data, regardless of whether the option "Do not backup RRD data" is checked in the web GUI.

Taking a closer look at the backup XML file, an XML tree browser shows:

if "Do not backup RRD data" is checked:

[...]
<syslog/>
+<rrddata></rrddata>          # RRD data is there
+<schedules></schedules>
</opnsense>


if "Do not backup RRD data" is unchecked:

[...]
<syslog/>
+<rrddata></rrddata>          # RRD data is still there...
+<schedules></schedules>
+<rrddata></rrddata>          # ...and another copy is here!
</opnsense>
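The same check can be done without an XML tree browser by counting the opening rrddata tags in a backup file. A minimal sketch, assuming the backup is pretty-printed with each `<rrddata>` opening tag on its own line (as it appears in my files); the helper name `count_rrddata` and the file name in the usage line are placeholders.

```sh
#!/bin/sh
# Hypothetical helper: count lines containing an opening <rrddata> tag
# in a configuration backup. 0 = no RRD section, 1 = expected when RRD
# data is included, 2 or more = duplicated sections.
count_rrddata() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper from failing then.
    grep -c '<rrddata>' "$1" || true
}

# Example usage (placeholder file name):
#   count_rrddata config-backup.xml
```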


From what I can gather from the backups I have taken manually via the web GUI over the past few months: I started with OPNsense version 25.7.11-2, where backups without the RRD data were about 70 kB in size and backups with the RRD data were about 2.5 MB.  This is what I expect.

I updated to version 26.1, and then the backups with and without the RRD data were of equal size, about 2.7 MB at the time.  So regardless of the web GUI option, the backups now contained the RRD data.

At the time, I discovered in the logs I was experiencing the error in issue 9686 (https://github.com/opnsense/core/issues/9686), and so I applied the patch mentioned therein:

# opnsense-patch 6933841c6

The errors in the logs went away, as expected.  However, the backups with the RRD data were now ~6 MB in size (!) and without the RRD data ~3 MB in size.  (See the XML browser snippet mentioned above.  This is my current state.)

I then upgraded to 26.1.2_5 and also installed the plug-in os-sftp-backup to try automated backups.  The sftp backups are ~3 MB in size.  Given my current situation, I do not know if these are supposed to include the RRD data or not.  (There does not seem to be a separate option to include or not include the RRD backups in the sftp backup section of the web GUI.)


So, now the questions:

1.  How can I get back to a state where the configuration backups without the RRD data are "small" again?

2.  If the os-sftp-backup plug-in is working as designed (i.e., including the RRD data), how may I request an equivalent "Do not backup RRD data" option in that section of the web GUI?


P.S.  For completeness, the /conf/config.xml file is ~3 MB and has one copy of the rrddata.