Properly recover from WAN outages (dpinger bug)?

Started by jbakuwel, August 04, 2024, 01:15:25 AM

Hi all,

I finally made the move to OPNsense after using a plain Linux system for decades. So while not new to the field, I'm still learning the OPNsense (and BSD) way of doing things. Many thanks to all who have contributed to this over the years.

The OPNsense router/firewall has multiple WANs. All traffic via one of those WANs needs to go through an OpenVPN tunnel, so the OpenVPN tunnel takes part in my gateway groups, not the WAN itself. This all works reasonably well, except that when that WAN goes down, the VPN reroutes via another WAN despite my trying to restrict it to one specific WAN. This is not a big issue though, just not as tidy as I would like.

What is of more concern is that when the WAN comes back online, OPNsense doesn't see that. When the WAN is offline, OPNsense shows RTT=0.0ms, RTTD=0.0ms, loss=100%. This remains so when the WAN comes back online (the requested IP address, obtained via DHCP in this case, is shown and the WAN is definitely online). A restart of the dpinger process for that particular gateway resolves the issue (i.e. signals that the WAN is back online).

In this particular case, the physical ethernet link stays up (as OPNsense is connected to a switch) when the WAN is down.

It looks to me like a bug in or around dpinger. While I put that forward for your consideration, is there a way to restart the dpinger process, preferably only for a specific gateway, via the command line or cron? I'm looking for ways to have this work fully automatically and reliably, i.e. if this particular WAN disappears, the OpenVPN tunnel should go down (not reroute via another WAN). When this particular WAN reappears, the OpenVPN tunnel should be re-established. If that is difficult to implement with OPNsense, then it's fine if the OpenVPN tunnel reroutes via another WAN, as long as it goes back to the WAN it is supposed to use when that WAN reappears.
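To make the "fully automatic" idea concrete, here is a minimal sketch of the decision logic such a cron-driven watchdog might use. It is only an illustration: the class and field names are made up, and the `restart` callable stands in for whatever command actually restarts the gateway monitor on your system, which this thread does not establish. The grace period is there so that a briefly flapping link does not trigger restart after restart.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GatewaySample:
    """One observation of a gateway (all names here are hypothetical)."""
    timestamp: float     # seconds, e.g. from time.time()
    loss_percent: float  # loss as reported by the gateway monitor
    has_address: bool    # True if the WAN interface currently holds an IP


class StuckMonitorWatchdog:
    """Calls `restart` when the monitor keeps reporting 100% loss although
    the WAN has held an address for at least `grace_seconds`."""

    def __init__(self, restart: Callable[[], None], grace_seconds: float = 120.0):
        self.restart = restart              # caller-supplied restart action
        self.grace_seconds = grace_seconds
        self._suspect_since: Optional[float] = None

    def observe(self, sample: GatewaySample) -> bool:
        """Feed one sample; returns True if a restart was triggered."""
        suspicious = sample.has_address and sample.loss_percent >= 100.0
        if not suspicious:
            self._suspect_since = None      # healthy, or WAN really is down
            return False
        if self._suspect_since is None:
            self._suspect_since = sample.timestamp
            return False
        if sample.timestamp - self._suspect_since >= self.grace_seconds:
            self.restart()
            self._suspect_since = None      # fresh grace period after a restart
            return True
        return False
```

The watchdog deliberately does nothing while the interface has no address, since in that case 100% loss is the correct reading rather than a stuck monitor.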

It is very simple for me to reproduce, so I will happily test potential fixes.

Any ideas / suggestions are most welcome.

Regards,
Jan

I have a similar issue: I have a Multi WAN setup with a PPPoE DSL connection and a 5G connection.
The ISP disconnects the PPPoE connection after 24h, so I added a cron job for a periodic interface reset, which worked perfectly fine. But after adding the 5G WAN this doesn't work anymore. After the interface reset, dpinger shows 100% packet loss and the gateway is down, even though it got a new IP and would work fine. The dpinger service is still running and doesn't crash or anything. After a manual restart of the gateway monitor it works again.

Here is the debug log of the gateway. Cron job at 4:00 and manual restart of the gateway monitor at 7:50.


2024-08-20T07:50:50 Notice dpinger ALERT: WAN_GW (Addr: 8.8.8.8 Alarm: down -> none RTT: 14.5 ms RTTd: 0.1 ms Loss: 0.0 %)
2024-08-20T07:50:47 Notice dpinger Reloaded gateway watcher configuration on SIGHUP
2024-08-20T07:50:47 Warning dpinger send_interval 1000ms loss_interval 4000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 0ms loss_alarm 0% alarm_hold 10000ms dest_addr 8.8.8.8 bind_addr 2.241.65.39 identifier "WAN_GW "
2024-08-20T07:50:47 Warning dpinger exiting on signal 15
2024-08-20T04:00:07 Notice dpinger ALERT: WAN_GW (Addr: 8.8.8.8 Alarm: none -> down RTT: 0.0 ms RTTd: 0.0 ms Loss: 100.0 %)
2024-08-20T04:00:03 Notice dpinger Reloaded gateway watcher configuration on SIGHUP
2024-08-20T04:00:03 Warning dpinger send_interval 1000ms loss_interval 4000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 0ms loss_alarm 0% alarm_hold 10000ms dest_addr 8.8.8.8 bind_addr 2.241.65.39 identifier "WAN_GW "
2024-08-20T04:00:02 Notice dpinger Reloaded gateway watcher configuration on SIGHUP
2024-08-20T04:00:02 Warning dpinger exiting on signal 15
2024-08-20T04:00:02 Warning dpinger WAN_GW 8.8.8.8: sendto error: 65
2024-08-20T04:00:01 Warning dpinger WAN_GW 8.8.8.8: sendto error: 65


8.8.8.8 is only used as a monitor IP not as a DNS server.
There are many similar posts here in the forum regarding this issue. I was hoping that 24.7 might fix this but it didn't.
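For anyone comparing logs across several boxes, the alarm transitions in the excerpt above can be pulled out and turned into outage durations. This is a rough sketch that assumes the ALERT line layout is exactly as it appears in that excerpt; the function and type names are my own.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Matches lines such as:
# 2024-08-20T07:50:50 Notice dpinger ALERT: WAN_GW (Addr: 8.8.8.8 Alarm: down -> none RTT: 14.5 ms RTTd: 0.1 ms Loss: 0.0 %)
ALERT_RE = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+dpinger\s+ALERT:\s+(?P<gw>\S+)\s+"
    r"\(Addr:\s+(?P<addr>\S+)\s+Alarm:\s+(?P<old>\S+)\s+->\s+(?P<new>\S+)\s+"
    r"RTT:\s+(?P<rtt>[\d.]+)\s+ms\s+RTTd:\s+(?P<rttd>[\d.]+)\s+ms\s+"
    r"Loss:\s+(?P<loss>[\d.]+)\s+%\)"
)


class Alert(NamedTuple):
    when: datetime
    gateway: str
    old: str
    new: str
    loss_percent: float


def parse_alerts(lines: Iterable[str]) -> List[Alert]:
    """Return all ALERT entries, oldest first; other log lines are ignored."""
    alerts = []
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if m:
            alerts.append(Alert(datetime.fromisoformat(m["ts"]), m["gw"],
                                m["old"], m["new"], float(m["loss"])))
    return sorted(alerts, key=lambda a: a.when)


def outage_durations(alerts: Iterable[Alert]) -> List[Tuple[str, timedelta]]:
    """Pair each transition to 'down' with the next transition back to 'none'."""
    down_since: Dict[str, datetime] = {}
    durations = []
    for a in alerts:
        if a.new == "down":
            down_since.setdefault(a.gateway, a.when)
        elif a.new == "none" and a.gateway in down_since:
            durations.append((a.gateway, a.when - down_since.pop(a.gateway)))
    return durations


# The two ALERT lines from the log above (newest first, as in the GUI).
LOG = [
    "2024-08-20T07:50:50 Notice dpinger ALERT: WAN_GW (Addr: 8.8.8.8 Alarm: down -> none RTT: 14.5 ms RTTd: 0.1 ms Loss: 0.0 %)",
    "2024-08-20T04:00:07 Notice dpinger ALERT: WAN_GW (Addr: 8.8.8.8 Alarm: none -> down RTT: 0.0 ms RTTd: 0.0 ms Loss: 100.0 %)",
]

for gateway, duration in outage_durations(parse_alerts(LOG)):
    print(gateway, duration)
```

For the excerpt above this reports WAN_GW as down from 04:00:07 until the manual restart at 07:50:50.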

I too had a problem with a multi-gateway setup. Trying to ping a destination behind a second gateway never worked from the firewall, including dpinger. While pinging worked fine from a LAN client, traffic from the firewall was always directed to the WAN gateway, ignoring the routing table. This was caused by the setting Firewall: Settings: Advanced: "Disable force gateway". When this setting is off (default), OPNsense automatically creates a firewall rule directing all traffic from the firewall to the WAN gateway. With this setting on, pings follow routing rules.

I have asked before in this forum what this setting is good for and why it has this strange default, but haven't found an answer.

Currently the gateway monitors cannot restart themselves, because doing so causes a loop and associated intermittent disconnects due to false readings.

https://github.com/opnsense/core/issues/7027


Cheers,
Franco

Hello,

I have the same problem with 40 firewalls and Multi WAN (primary link FTTx, backup link 4G).
The ISP interrupts the PPPoE session during the night (for link maintenance on FTTx), and after the link comes back up the gateway remains down.
I have to restart the gateway manually.
Otherwise the primary gateway never goes back up.

Our version: Business 24.4.2

I'm looking forward to finding a solution because this generates a lot of manual maintenance for our department.

Have a good day, all.

Planning to bring in https://github.com/opnsense/core/commit/c42fefa67f86 but we wanted to debug this further. At the moment we're suspecting a pf-related issue with a stuck state for the current dpinger process.

Could you try to find the matching state for your monitor IP under Firewall: Diagnostics: States and try to remove this one?
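If you prefer the shell over the GUI for this, the state table can also be listed with `pfctl -ss` and filtered for the monitor IP. Below is a minimal sketch of such a filter. The helper names are hypothetical, the sample lines are illustrative rather than captured output, and it only handles IPv4 `addr:port` endpoints.

```python
import subprocess
from typing import Iterable, List


def states_for_host(state_lines: Iterable[str], host: str) -> List[str]:
    """Return the state lines in which `host` appears as an exact endpoint."""
    matches = []
    for line in state_lines:
        for token in line.split():
            # Endpoints look like "addr:port"; NAT annotations add parentheses.
            endpoint = token.strip("()")
            addr = endpoint.rsplit(":", 1)[0]
            if addr == host or endpoint == host:
                matches.append(line.strip())
                break
    return matches


def read_pf_states() -> List[str]:
    """List the pf state table; needs root and must run on the firewall."""
    out = subprocess.run(["pfctl", "-ss"], capture_output=True,
                         text=True, check=True)
    return out.stdout.splitlines()


# Illustrative lines only, in the general shape of `pfctl -ss` output.
SAMPLE = [
    "all icmp 2.241.65.39:4711 -> 8.8.8.8:4711       0:0",
    "all udp 192.168.1.10:5353 -> 8.8.8.80:53       SINGLE:NO_TRAFFIC",
    "all tcp 192.168.1.10:40000 -> 1.1.1.1:443       ESTABLISHED:ESTABLISHED",
]

for state in states_for_host(SAMPLE, "8.8.8.8"):
    print(state)
```

Matching on the exact address rather than a substring keeps a monitor IP of 8.8.8.8 from also matching 8.8.8.80. On the firewall itself, `states_for_host(read_pf_states(), "8.8.8.8")` would do the lookup against the live table.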


Cheers,
Franco

Hello Franco,

I sent you a private message for data exchange.

best regards,
Arnaud

Hi Arnaud,

Thanks. Looking forward to taking a closer look with you.


Cheers,
Franco