Topic: Multi WAN Dpinger needs restarting after gateway outage Workaround

RedVortex

  • Jr. Member
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #15 on: May 05, 2023, 09:31:31 pm »
Actually, in 23.1.7_3 the problem seems worse. Unplugging and replugging the SL (Starlink) Ethernet cable on igb0 seems to trigger the problem every time.

I did it 3 times and the state looks like this each time now...  :-\

Code: [Select]
all icmp 100.79.101.92:7232 -> 1.1.1.1:7232       0:0
   age 00:00:36, expires in 00:00:10, 36:0 pkts, 1008:0 bytes, rule 90
   id: c4e7556400000000 creatorid: 7ac5a56d gateway: 0.0.0.0
   origif: pppoe0

Again, restarting dpinger or killing the state brings the gateway status back to UP and the state back to what it should be:

Code: [Select]
all icmp 100.79.101.92:9493 -> 1.1.1.1:9493       0:0
   age 00:00:16, expires in 00:00:09, 16:16 pkts, 448:448 bytes, rule 100
   id: 15eb556400000000 creatorid: 7ac5a56d gateway: 100.64.0.1
   origif: pppoe0

Also notice that the rule changes from 90 to 100. Rule 100 is what I usually see when it works; I believe it's the default rule that allows traffic from OPNsense itself to anywhere, and 90 is the rule associated with DHCP.
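The comparison of the two states above can also be done mechanically. What follows is only a minimal sketch, not anything shipped with OPNsense: the helper names `parse_states` and `find_stale_states` are made up for illustration, and it assumes that a second packet counter of zero (as in `36:0 pkts`) or a state gateway of `0.0.0.0` marks a monitor state that is not getting replies.

```python
import re

# Sample text copied from the two states shown in this post.
SAMPLE = """\
all icmp 100.79.101.92:7232 -> 1.1.1.1:7232       0:0
   age 00:00:36, expires in 00:00:10, 36:0 pkts, 1008:0 bytes, rule 90
   id: c4e7556400000000 creatorid: 7ac5a56d gateway: 0.0.0.0
   origif: pppoe0
all icmp 100.79.101.92:9493 -> 1.1.1.1:9493       0:0
   age 00:00:16, expires in 00:00:09, 16:16 pkts, 448:448 bytes, rule 100
   id: 15eb556400000000 creatorid: 7ac5a56d gateway: 100.64.0.1
   origif: pppoe0
"""


def parse_states(text):
    """Group `pfctl -ss -vvv` style text into one dict per state entry."""
    states = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new state entry.
            states.append({"header": line.strip()})
            continue
        if not states:
            continue  # indented detail line with no header before it
        cur = states[-1]
        m = re.search(r"(\d+):(\d+) pkts", line)
        if m:
            cur["pkts_a"], cur["pkts_b"] = int(m.group(1)), int(m.group(2))
        m = re.search(r"rule (\d+)", line)
        if m:
            cur["rule"] = int(m.group(1))
        m = re.search(r"gateway: (\S+)", line)
        if m:
            cur["gateway"] = m.group(1)
        m = re.search(r"origif: (\S+)", line)
        if m:
            cur["origif"] = m.group(1)
    return states


def find_stale_states(states):
    """Assumption: no packets on the second counter, or gateway 0.0.0.0, is suspect."""
    stale = []
    for s in states:
        if "pkts_a" not in s:
            continue  # not a state entry (for example a warning line)
        one_way = s["pkts_a"] > 0 and s["pkts_b"] == 0
        if one_way or s.get("gateway") == "0.0.0.0":
            stale.append(s)
    return stale


for s in find_stale_states(parse_states(SAMPLE)):
    print(s["header"], "rule", s["rule"], "gateway", s["gateway"])
```

In practice the text would come from `pfctl -ss -vvv` on the firewall rather than from a pasted sample.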
« Last Edit: May 05, 2023, 09:33:49 pm by RedVortex »

RedVortex

  • Jr. Member
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #16 on: May 07, 2023, 02:25:50 am »
With 23.1.7_3, the SL gateway always ends up being flagged as down if I use 1.1.1.1 as the monitor IP, whether or not I enable "Disable Host Route". I tried multiple things to keep it up, but after some time it fails because the gateway recorded in the state ends up sending the packets out the pppoe0 WAN instead of SL.

So I'm dropping the idea of using 1.1.1.1 altogether for now. It seems really problematic, possibly because the DHCP renewal on SL hands out 1.1.1.1 as a DNS server. Anyway, I'll test with 9.9.9.9 instead and see how it goes.

Did using an IP other than 1.1.1.1 fix it for you, @xaxero? Also, have you upgraded to 23.1.7_3 yet?

xaxero

  • Newbie
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #17 on: May 07, 2023, 09:29:10 am »
Good Morning
Changing to OpenDNS has resulted in a big improvement: 48 hours with no issues. However, SL has been very stable. On the second unit I simply use the SL gateway address.

Note: As I am using the dual-antenna setup, I have put a second router at the front end simply to NAT the traffic, so that I have a unique gateway for each antenna, and to tag the packets onto separate VLANs to our main router several decks down. Two WANs with the same gateway were problematic whenever we had to do a full system power cycle.
On the front-end router I am disabling gateway monitoring, and I am doing all the dpinger work on the main router. "Disable host route" may have helped as well.

Another slimy hack is to force all passenger traffic through the 4G-Starlink-Primary interface via the firewall, which bypasses dpinger completely. The more critical ship traffic goes through the gateway failover, and the worst-case scenario is that we are stuck on the VSAT until I can restart dpinger.

I have attached the gateway configuration of the front-end and core routers. So far it has been working well.

franco

  • Administrator
  • Hero Member
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #18 on: May 08, 2023, 12:01:59 pm »
You can use the following to inspect host route behaviour now:

# pluginctl -r host_routes

An overlap between facilities IS possible, and the last match wins, which may break the DNS or monitoring facility. That's why "Disable host route" was added to the monitor settings: in that case DNS is still active, and dpinger monitoring latches on to the interface IP anyway, so routing should be OK (as long as no PBR is used that breaks that as well).


Cheers,
Franco

RedVortex

  • Jr. Member
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #19 on: May 08, 2023, 08:29:42 pm »
Quote from: franco on May 08, 2023, 12:01:59 pm
You can use the following to inspect host route behaviour now:

# pluginctl -r host_routes

An overlap between facilities IS possible, and the last match wins, which may break the DNS or monitoring facility. That's why "Disable host route" was added to the monitor settings: in that case DNS is still active, and dpinger monitoring latches on to the interface IP anyway, so routing should be OK (as long as no PBR is used that breaks that as well).

Hello franco  :)

OK, so everything remained stable while I was using 9.9.9.9 (though I did not test for very long, maybe 12 hours). I've configured 1.1.1.1 again on SL, saved the gateway, and then saved the interface as well to restart it.

For now I see this (everything is normal and the gateway is marked UP):

Code: [Select]
root@xxxxx:~ # pfctl -ss -vvv | grep "1\.1\.1\.1" -A 3
No ALTQ support in kernel
ALTQ related functions disabled
all icmp 100.79.101.92:47540 -> 1.1.1.1:47540       0:0
   age 00:03:49, expires in 00:00:10, 225:225 pkts, 6300:6300 bytes, rule 100
   id: a7325d6400000000 creatorid: 7ac5a56d gateway: 100.64.0.1
   origif: igb0

Code: [Select]
root@xxxxx:~ # pluginctl -r host_routes
{
    "core": {
        "8.8.8.8": null,
        "8.8.4.4": null
    },
    "dpinger": {
        "8.8.4.4": "10.50.45.70",
        "1.1.1.1": "100.64.0.1",
        "2001:4860:4860::8844": "fe80::200:xxxx:xxxx:xxx%igb0",
        "149.112.112.112": "192.168.2.1",
        "2620:fe::9": "2001:470:xx:4x:x"
    }
}

10.50.45.70 is my default gateway that uses pppoe0 interface
100.64.0.1 is SL and is used as backup gateway on igb0
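The "overlap between facilities" that franco mentions is visible in this output: 8.8.4.4 is listed under both `core` and `dpinger`. As a rough sketch only (the function name `find_overlaps` is invented here, and the sample is a trimmed copy of the JSON above), the overlap could be picked out like this:

```python
import json

# Trimmed copy of the `pluginctl -r host_routes` output shown above (IPv4 only).
SAMPLE = """
{
    "core": {
        "8.8.8.8": null,
        "8.8.4.4": null
    },
    "dpinger": {
        "8.8.4.4": "10.50.45.70",
        "1.1.1.1": "100.64.0.1",
        "149.112.112.112": "192.168.2.1"
    }
}
"""


def find_overlaps(host_routes):
    """Return destinations that more than one facility wants a host route for."""
    claims = {}
    for facility, routes in host_routes.items():
        for dest, gateway in routes.items():
            claims.setdefault(dest, []).append((facility, gateway))
    return {dest: c for dest, c in claims.items() if len(c) > 1}


for dest, claimed_by in find_overlaps(json.loads(SAMPLE)).items():
    print(dest, "is claimed by", claimed_by)
```

Any destination this reports is a candidate for the "last match wins" behaviour described in the quoted reply.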

Code: [Select]
root@xxxxx:~ # netstat -rn | head
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            10.50.45.70        UGS      pppoe0
1.1.1.1            100.64.0.1         UGHS       igb0
8.8.4.4            10.50.45.70        UGHS     pppoe0
10.2.0.0/16        192.168.2.1        UGS         em0
10.50.45.70        link#16            UHS      pppoe0
34.120.255.244     link#4             UHS        igb0

After 2-3 minutes, I see that the routing table loses the 1.1.1.1 host route (SL DHCP renewal, I guess), but so far everything remains functional:

Code: [Select]
root@xxxxx:~ # netstat -rn | head
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            10.50.45.70        UGS      pppoe0
8.8.4.4            10.50.45.70        UGHS     pppoe0
10.2.0.0/16        192.168.2.1        UGS         em0
10.50.45.70        link#16            UHS      pppoe0
34.120.255.244     link#4             UHS        igb0
100.64.0.0/10      link#4             U          igb0

Everything else remains the same and the gateway is, for now, marked UP. When I get back home, I'll test pulling and replugging the Ethernet cable, which usually seems to trigger the issue, and I'll let you know what I get.
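The before/after comparison of the two routing tables can be automated along these lines. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the samples are shortened copies of the `netstat -rn` output above, and only IPv4 host routes (flag `H`) are considered.

```python
# Shortened copies of the two `netstat -rn` outputs from this post.
BEFORE = """\
Internet:
Destination        Gateway            Flags     Netif Expire
default            10.50.45.70        UGS      pppoe0
1.1.1.1            100.64.0.1         UGHS       igb0
8.8.4.4            10.50.45.70        UGHS     pppoe0
10.50.45.70        link#16            UHS      pppoe0
"""

AFTER = """\
Internet:
Destination        Gateway            Flags     Netif Expire
default            10.50.45.70        UGS      pppoe0
8.8.4.4            10.50.45.70        UGHS     pppoe0
10.50.45.70        link#16            UHS      pppoe0
100.64.0.0/10      link#4             U          igb0
"""


def host_routes_in_table(netstat_text):
    """Return {destination: (gateway, netif)} for rows whose flags contain H."""
    routes = {}
    for line in netstat_text.splitlines():
        parts = line.split()
        # Data rows have at least: Destination Gateway Flags Netif
        if len(parts) < 4 or parts[0] == "Destination":
            continue
        dest, gateway, flags, netif = parts[:4]
        if "H" in flags:
            routes[dest] = (gateway, netif)
    return routes


def missing_host_routes(expected, netstat_text):
    """List monitor IPs whose host route is absent or points at another gateway."""
    present = host_routes_in_table(netstat_text)
    return [dest for dest, gateway in expected.items()
            if present.get(dest, (None, None))[0] != gateway]


# Monitor IP -> gateway, taken from the "dpinger" section of host_routes above.
EXPECTED = {"1.1.1.1": "100.64.0.1", "8.8.4.4": "10.50.45.70"}

print("before:", missing_host_routes(EXPECTED, BEFORE))
print("after: ", missing_host_routes(EXPECTED, AFTER))
```

Run periodically, a check like this would show when the 1.1.1.1 host route disappears, which could then be lined up against DHCP renewals on SL.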
« Last Edit: February 04, 2024, 10:31:56 pm by RedVortex »

franco

  • Administrator
  • Hero Member
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #20 on: May 09, 2023, 11:41:30 am »
Hello RedVortex :)

Hmm, how about this one?

# grep -nr "1\.1\.1\.1" /var/db/dhclient.leases.*

If SL is pushing routes, it will perhaps scrub them on a renew.
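One caveat with a plain grep is that it matches substrings, so it is worth checking that a hit is the exact address. The following is a sketch only: the lease text is a hypothetical example in the usual dhclient lease layout, not output from this system, and `lines_mentioning` is a made-up helper name.

```python
import re

# Hypothetical lease text for illustration; real files live under /var/db/.
SAMPLE_LEASE = """\
lease {
  interface "igb0";
  fixed-address 100.79.101.92;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option dhcp-server-identifier 100.64.0.1;
}
"""


def lines_mentioning(ip, lease_text):
    """Return (line_number, line) pairs where `ip` appears as a whole address."""
    # The lookarounds stop 1.1.1.1 from matching inside e.g. 11.1.1.10.
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?![\d.])")
    return [(number, line.strip())
            for number, line in enumerate(lease_text.splitlines(), 1)
            if pattern.search(line)]


for number, line in lines_mentioning("1.1.1.1", SAMPLE_LEASE):
    print(number, line)
```

If the monitor IP shows up in a lease option like this, that would be consistent with the DHCP renewal interfering with the host route.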


Cheers,
Franco

lazyE

  • Newbie
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #21 on: May 10, 2023, 04:40:29 am »
Hi,

FWIW, see this too for Multi-WAN Gateway monitor.

Monitor IP / dpinger not reliable in simulated fail & failback scenarios

Can only "fix" it by restarting the Gateway service  :(
« Last Edit: May 11, 2023, 01:01:29 am by lazyE »

franco

  • Administrator
  • Hero Member
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #22 on: May 10, 2023, 09:03:49 am »
Keep in mind that some DNS servers have been known to rate-limit or block ping requests so it looks bad but it's not. From the OPNsense perspective the alarm has to be raised even though it's not necessary and disruptive.


Cheers,
Franco

lazyE

  • Newbie
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #23 on: May 11, 2023, 09:14:02 am »
So I've been testing Multi-WAN gateway failover for quite a few hours now.

It does not work with the Trigger Level = "Packet Loss" option on 23.latest, or even back on 22.7.latest.

Scenario: with the primary gateway's Trigger Level set to "Packet Loss", blocking ping downstream does NOT cause the gateway to be marked as down, nor the default route to be flipped to the secondary. I have to manually restart the Gateway service (then it notices).

Failback works OK.

It works OK with Trigger Level = "Member Down"; however, this is a less likely real-world scenario than one where the ISP is up but internet service is interrupted.

 

franco

  • Administrator
  • Hero Member
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #24 on: May 11, 2023, 09:26:23 am »
See https://github.com/opnsense/core/issues/6231 -- packet loss and delay triggers have been inherently broken since the switch from apinger to dpinger. The latter never supported the lower thresholds. I'm trying to avoid relying on dpinger for alarm decisions in 23.7 to bring back the desired behaviour, leaving dpinger to only monitor.


Cheers,
Franco
Logged

lazyE

  • Newbie
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #25 on: May 11, 2023, 10:24:15 am »
Quote from: franco on May 11, 2023, 09:26:23 am
See https://github.com/opnsense/core/issues/6231 -- packet loss and delay triggers have been inherently broken since the switch from apinger to dpinger. The latter never supported the lower thresholds. I'm trying to avoid relying on dpinger for alarm decisions in 23.7 to bring back the desired behaviour, leaving dpinger to only monitor.


Cheers,
Franco

Thanks, Franco. I read through the issue thread and appreciate the detail there.

What timeframe are you thinking of for the fix?


franco

  • Administrator
  • Hero Member
Re: Multi WAN Dpinger needs restarting after gateway outage
« Reply #26 on: May 11, 2023, 10:34:46 am »
It might take 1 more month for the final code to hit development, but as I said the plan is to have it in production for 23.7 in July (not sooner due to considerable changes).


Cheers,
Franco

xaxero

  • Newbie
Re: Multi WAN Dpinger needs restarting after gateway outage Workaround
« Reply #27 on: May 12, 2023, 06:03:27 am »
I am collating the data from this post and others. This applies to Starlink only but may be useful elsewhere. I applied the following fixes from everyone's suggestions and the gateways are stable. We are having frequent outages as we are in laser link territory; however, the link is stable overall.

1/. WAN definition: reject leases from 192.168.100.1 (note that the gateways are on a separate router in my case).
2/. Gateway: disable host route.
3/. Use a monitor IP that is not 1.1.1.1 (in my case OpenDNS) and bind each interface to DNS via Settings General.

The interfaces have been going up and down over the last 24 hours, and the gateways (so far) are behaving and the routes are changing dynamically.

xaxero

  • Newbie
Re: Multi WAN Dpinger needs restarting after gateway outage Workaround
« Reply #28 on: May 12, 2023, 06:46:18 am »
One last thought: perhaps we could include httping as an option in the future, alongside dpinger. HTTP has much higher priority.

franco

  • Administrator
  • Hero Member
Re: Multi WAN Dpinger needs restarting after gateway outage Workaround
« Reply #29 on: May 12, 2023, 09:13:12 am »
That leaves only the question of who will write and integrate a new solution for a problem someone thought was solved a decade ago.  ;)


Cheers,
Franco