Multi-Gateway: Brief interruption of ALL traffic when other GWs have events

Started by motoridersd, September 06, 2021, 04:49:07 PM

I currently have 3 gateways configured, but this happened when I had two.

One of them is a 1000/35 cable connection with a data cap; the other is an LTE connection with no data cap. The third connection is a WireGuard tunnel routed out over the LTE connection. Most traffic is configured to use the Cable WAN, and a few rules send other heavy traffic out the LTE connection.

The LTE performance varies throughout the day, being very good at night, but slow during the day when it is congested.

The issue I have is that if I configure gateway monitoring to use a distant host to determine packet loss, for example, pinging 1.1.1.1 over the LTE connection, as the LTE link gets congested, the packet loss starts crossing the configured threshold and an alarm is generated in the Gateway log file. This is fine. The problem is that every time there is an event, ALL connections on the network drop out briefly. This includes the traffic that is being sent out the Cable gateway.

I created some Gateway Groups, with Cable as Tier 1 and LTE as Tier 5. At first I was using this group as the gateway for my main traffic, but even if the LTE gateway is having issues, the traffic going out the Tier 1 interface should not be interrupted.

Even when setting my default internet rule to use the Single Cable gateway, I was still seeing drops/connection issues when the LTE ping times to 1.1.1.1 went above the threshold.

The fix was to have the LTE connection ping the LTE modem instead of an external IP, but this unfortunately leaves me with no ability to switch traffic based on congestion.

When I added the WireGuard interface over LTE, its monitor pings the tunnel's default gateway, which is on the other end of the tunnel, traversing the LTE network. So when the LTE connection starts congesting, the WireGuard tunnel monitoring starts seeing loss. I then started having packet loss on my LTE+Cable group, even though the WG interface is not part of this group, the LTE gateway monitors a local IP, and there was no packet loss there.

Logs of the WG interface event

2021-09-05T07:57:11 dpinger[62885] GATEWAY ALARM: WAN_PIAWG_IPv4 (Addr: 10.9.128.1 Alarm: 0 RTT: 430812us RTTd: 291906us Loss: 8%)
2021-09-05T07:57:11 dpinger[2274] WAN_PIAWG_IPv4 10.9.128.1: Clear latency 430812us stddev 291906us loss 8%
2021-09-05T07:56:54 dpinger[46066] GATEWAY ALARM: WAN_PIAWG_IPv4 (Addr: 10.9.128.1 Alarm: 1 RTT: 510552us RTTd: 216970us Loss: 6%)
2021-09-05T07:56:54 dpinger[2274] WAN_PIAWG_IPv4 10.9.128.1: Alarm latency 510552us stddev 216970us loss 6%
2021-09-05T07:54:05 dpinger[69999] GATEWAY ALARM: WAN_PIAWG_IPv4 (Addr: 10.9.128.1 Alarm: 0 RTT: 432131us RTTd: 246902us Loss: 1%)
2021-09-05T07:54:05 dpinger[2274] WAN_PIAWG_IPv4 10.9.128.1: Clear latency 432131us stddev 246902us loss 1%
2021-09-05T07:53:53 dpinger[80288] GATEWAY ALARM: WAN_PIAWG_IPv4 (Addr: 10.9.128.1 Alarm: 1 RTT: 502068us RTTd: 187465us Loss: 1%)
2021-09-05T07:53:53 dpinger[2274] WAN_PIAWG_IPv4 10.9.128.1: Alarm latency 502068us stddev 187465us loss 1%
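
To line the brief drop-outs up against gateway events, it can help to turn the log into a list of alarm windows. The following is only a rough Python sketch: it assumes the "GATEWAY ALARM" line format is exactly as pasted above, and the names `parse_events` and `alarm_windows` are made up for illustration, not part of OPNsense or dpinger.

```python
import re
from datetime import datetime

# Matches the "GATEWAY ALARM" lines as they appear in the log excerpt above.
# The format is assumed from that excerpt only.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) dpinger\[\d+\] GATEWAY ALARM: (?P<gw>\S+) "
    r"\(Addr: (?P<addr>\S+) Alarm: (?P<alarm>[01]) "
    r"RTT: (?P<rtt>\d+)us RTTd: (?P<rttd>\d+)us Loss: (?P<loss>\d+)%\)$"
)


def parse_events(lines):
    """Return alarm/clear events sorted oldest first; other lines are skipped."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        events.append({
            "time": datetime.fromisoformat(m["ts"]),
            "gateway": m["gw"],
            "alarm": m["alarm"] == "1",
            "rtt_ms": int(m["rtt"]) / 1000,
            "loss_pct": int(m["loss"]),
        })
    events.sort(key=lambda e: e["time"])
    return events


def alarm_windows(events):
    """Pair each alarm with the next clear for the same gateway.

    Returns (gateway, start_time, duration_seconds) tuples.
    """
    open_alarms = {}
    windows = []
    for e in events:
        if e["alarm"]:
            open_alarms.setdefault(e["gateway"], e)
        elif e["gateway"] in open_alarms:
            start = open_alarms.pop(e["gateway"])
            duration = (e["time"] - start["time"]).total_seconds()
            windows.append((e["gateway"], start["time"], duration))
    return windows


SAMPLE = """\
2021-09-05T07:57:11 dpinger[62885] GATEWAY ALARM: WAN_PIAWG_IPv4 (Addr: 10.9.128.1 Alarm: 0 RTT: 430812us RTTd: 291906us Loss: 8%)
2021-09-05T07:56:54 dpinger[46066] GATEWAY ALARM: WAN_PIAWG_IPv4 (Addr: 10.9.128.1 Alarm: 1 RTT: 510552us RTTd: 216970us Loss: 6%)
2021-09-05T07:54:05 dpinger[69999] GATEWAY ALARM: WAN_PIAWG_IPv4 (Addr: 10.9.128.1 Alarm: 0 RTT: 432131us RTTd: 246902us Loss: 1%)
2021-09-05T07:53:53 dpinger[80288] GATEWAY ALARM: WAN_PIAWG_IPv4 (Addr: 10.9.128.1 Alarm: 1 RTT: 502068us RTTd: 187465us Loss: 1%)
"""

if __name__ == "__main__":
    for gw, start, seconds in alarm_windows(parse_events(SAMPLE.splitlines())):
        print(f"{gw}: alarm at {start.isoformat()} lasted {seconds:.0f}s")
```

On the four lines above this gives two short alarm windows for WAN_PIAWG_IPv4. Both alarms were raised with RTT just over 500 ms while loss was only 1% and 6%, which suggests (I have not confirmed the configured thresholds) that latency rather than loss is what tripped them.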


Screenshot of Gateway config attached. Cable does ping a remote host because lately my ISP has had a lot of issues with packet loss, and I want the system to fail over to the LTE connection when this happens.

Go to Firewall: Settings: Advanced and check "Disable State Killing on Gateway Failure".


Cheers,
Franco