Dpinger makes a mess on the latest release

Started by Nicolassimond, May 24, 2024, 09:10:07 AM

Hello,
For a couple of days I have had a problem with dpinger and gateway monitoring.

I have OPNsense 24.1.7_4-amd64 and I have a gateway group with 2 tiers:
• My Starlink connection is in Tier 1
• My ADSL connection is in Tier 2 for backup purposes.

Since I got back from the holidays, I've noticed that OPNsense disconnects my Starlink every 10 to 20 minutes.

Date                Severity Process Line
2024-05-24T08:59:51 Warning dpinger send_interval 1000ms loss_interval 4000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 0ms loss_alarm 0% alarm_hold 10000ms dest_addr 100.64.0.1 bind_addr 100.109.13.234 identifier "WANGW "
2024-05-24T08:59:51 Warning dpinger exiting on signal 15
2024-05-24T08:59:51 Warning dpinger exiting on signal 15
2024-05-24T08:59:51 Warning dpinger exiting on signal 15
2024-05-24T08:50:04 Warning dpinger send_interval 1000ms loss_interval 4000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 0ms loss_alarm 0% alarm_hold 10000ms dest_addr 100.109.13.234 bind_addr 100.109.13.234 identifier "WANGW "
2024-05-24T08:50:04 Warning dpinger exiting on signal 15
2024-05-24T08:50:03 Warning dpinger WANGW 100.109.13.234: sendto error: 65
2024-05-24T08:50:02 Warning dpinger WANGW 100.109.13.234: sendto error: 65
2024-05-24T08:50:01 Warning dpinger WANGW 100.109.13.234: sendto error: 65
2024-05-24T08:35:47 Warning dpinger send_interval 1000ms loss_interval 4000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 0ms loss_alarm 0% alarm_hold 10000ms dest_addr 100.109.13.234 bind_addr 100.109.13.234 identifier "WANGW "
2024-05-24T08:35:47 Warning dpinger exiting on signal 15
2024-05-24T08:35:46 Warning dpinger WANGW 100.109.13.234: sendto error: 65
2024-05-24T08:35:45 Warning dpinger WANGW 100.109.13.234: sendto error: 65
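
For anyone reading the log above, here is a minimal Python sketch that pulls out the interesting fields. It relies on two general facts: on FreeBSD, errno 65 is EHOSTUNREACH ("no route to host"), and signal 15 is SIGTERM. The function name `classify` and the note strings are made up for illustration. The sample lines are copied from the log above. Note that in two of the startup entries dest_addr equals bind_addr, i.e. dpinger appears to be probing the interface's own address; that is only an observation from the log, not a confirmed diagnosis.

```python
import re

# Captures the monitored address and the local bind address from a dpinger startup line.
STARTUP = re.compile(r"dest_addr (\S+) bind_addr (\S+)")

ERRNO_NAMES = {65: "EHOSTUNREACH (no route to host)"}  # FreeBSD errno numbering
SIGNAL_NAMES = {15: "SIGTERM"}


def classify(line):
    """Return a small tuple describing one dpinger log line (illustrative helper)."""
    m = STARTUP.search(line)
    if m:
        dest, bind = m.groups()
        note = "dest_addr equals bind_addr" if dest == bind else "ok"
        return ("start", dest, bind, note)
    m = re.search(r"sendto error: (\d+)", line)
    if m:
        return ("sendto_error", ERRNO_NAMES.get(int(m.group(1)), "unknown"))
    m = re.search(r"exiting on signal (\d+)", line)
    if m:
        return ("exit", SIGNAL_NAMES.get(int(m.group(1)), "unknown"))
    return ("other",)


# Lines copied verbatim from the log in this post.
SAMPLE = [
    '2024-05-24T08:59:51 Warning dpinger send_interval 1000ms loss_interval 4000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 0ms loss_alarm 0% alarm_hold 10000ms dest_addr 100.64.0.1 bind_addr 100.109.13.234 identifier "WANGW "',
    '2024-05-24T08:50:04 Warning dpinger send_interval 1000ms loss_interval 4000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 0ms loss_alarm 0% alarm_hold 10000ms dest_addr 100.109.13.234 bind_addr 100.109.13.234 identifier "WANGW "',
    "2024-05-24T08:50:03 Warning dpinger WANGW 100.109.13.234: sendto error: 65",
    "2024-05-24T08:50:04 Warning dpinger exiting on signal 15",
]

for entry in SAMPLE:
    print(classify(entry))
```

If the dest_addr/bind_addr match is real, it may be worth checking which monitor IP is configured for the gateway, but that is a guess to verify rather than a conclusion.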


The fact is that the Starlink isn't disconnecting at all.

I also monitor both gateways from a local LibreNMS instance, and it doesn't see these packet drops (the ones that do show up are due to a Starlink modem restart); see attachment.

And if I force all my internet traffic through Starlink, neither internal nor external monitoring (I also self-host some things behind my Starlink, with checks running from OVH) ever sees anything.

Same when I play online: I don't see any lag or get disconnected, but in the logs OPNsense says the gateway is lost.

Any idea of what may be going on?
Thank you

I can confirm this issue on StarLink WAN: https://forum.opnsense.org/index.php?topic=40613.0

As a temporary workaround I deactivated gateway monitoring for the SL WAN.

Quote from: apunkt on May 26, 2024, 10:07:51 AM
I can confirm this issue on StarLink WAN: https://forum.opnsense.org/index.php?topic=40613.0

As a temporary workaround I deactivated gateway monitoring for the SL WAN.

Thanks for the feedback.

Does the tier group still work without gateway monitoring?
I will try this when I'm home.

Not the same situation, but when I was running pfSense at home, dpinger also marked my gateway as failed if it saw too many dropped pings. This happened when I switched from Spectrum to T-Mobile internet, and the T-Mobile gateway just wouldn't answer pings reliably enough. In pfSense there was a setting to turn off "drop connection when ping fails", but I don't remember where that setting is, so I can't tell you where to look for it in OPNsense. I'm currently not running either at home; we run OPNsense at work, where we have good, stable connections and this is not an issue.

A couple of things:

1. What thresholds do you have set to trigger failover, i.e. packet loss high/low or latency high/low?
2. The graphs you showed look like RRDtool graphs. Is that SmokePing you're using? Its default is 20 pings every 300 seconds, so it wouldn't be a surprise if the drops didn't show up there. If it's not that, how often does the pinger behind those graphs run?

Quote from: axsdenied on May 28, 2024, 10:57:37 PM
A couple of things:

1. What thresholds do you have set to trigger failover, i.e. packet loss high/low or latency high/low?
2. The graphs you showed look like RRDtool graphs. Is that SmokePing you're using? Its default is 20 pings every 300 seconds, so it wouldn't be a surprise if the drops didn't show up there. If it's not that, how often does the pinger behind those graphs run?

1. Failover is using link down.
2. It's 5 pings every 60 seconds here.
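
To make the sampling question concrete, here is a small Python sketch that counts how many probes fall inside a short outage. The dpinger rate (one probe per second) comes from send_interval 1000ms in the log. Everything else is an assumption for illustration only: the 4-second outage, its start time, and the idea that the "5 pings every 60 seconds" are sent as a burst one second apart. The helper name `probes_in_outage` is hypothetical.

```python
def probes_in_outage(probe_times, outage_start, outage_len):
    """Count probes sent while the link is (hypothetically) down."""
    outage_end = outage_start + outage_len
    return sum(1 for t in probe_times if outage_start <= t < outage_end)


# dpinger per the log: send_interval 1000ms, so one probe per second over a minute.
dpinger_probes = [float(t) for t in range(60)]

# Assumed layout for "5 pings every 60 seconds": a burst of 5 probes, one second apart.
coarse_probes = [float(t) for t in range(5)]

# Hypothetical 4-second blip starting 14 seconds into the minute.
outage_start, outage_len = 14.0, 4.0

print(probes_in_outage(dpinger_probes, outage_start, outage_len))  # 4
print(probes_in_outage(coarse_probes, outage_start, outage_len))   # 0
```

Under these assumptions a one-per-second pinger sees the blip while a burst-based check misses it entirely, which is the scenario the question above is probing. It does not by itself say whether such blips actually occur on this link.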

The thing is, the problem is not the stability of the Starlink connection.

I won't go through everything I have at home; you can take a quick look here: https://wiki.abyssproject.net/en/infra/blog-infrastructure

The thing to remember is that I monitor the Starlink with checks other than OPNsense's, both inside and outside the network, and I don't see any drops.

I also have a permanent WireGuard tunnel that doesn't log any drops or errors,
plus all the checks from my internal LibreNMS that monitor my external services (about 70 checks, every 60 seconds), which include some pings and don't fail or record any latency problems either.

Either dpinger has a bug in OPNsense, or Starlink changed something that makes OPNsense think the network is down when it isn't.

Bump this post.

I am having the same issue. OPNsense fails over from Starlink to Cox even though I never lose Starlink. The logs I am seeing are the same as the OP's. At this point I have had to remove the GW monitor for Cox from the gateway group so that my devices stay on Starlink.

Just started happening this morning. I noticed it on 24.1.10_8; I upgraded to 24.7_9 and am still seeing the issue.

Has anyone found a solution for this?

Quote from: EvilAchmed on July 31, 2024, 06:33:27 PM
Bump this post.

I am having the same issue. OPNsense fails over from Starlink to Cox even though I never lose Starlink. The logs I am seeing are the same as the OP's. At this point I have had to remove the GW monitor for Cox from the gateway group so that my devices stay on Starlink.

Just started happening this morning. I noticed it on 24.1.10_8; I upgraded to 24.7_9 and am still seeing the issue.

Has anyone found a solution for this?

Hi, I have found a dumb solution for this problem.
Just put a basic switch between the Starlink router and your OPNsense box, and you won't be annoyed anymore.

I think the Starlink router has an ARP handling problem, or something related, that just doesn't work with dpinger.

Quote from: Nicolassimond on August 05, 2024, 08:14:02 AM

Hi, I have found a dumb solution for this problem.
Just put a basic switch between the Starlink router and your OPNsense box, and you won't be annoyed anymore.

I think the Starlink router has an ARP handling problem, or something related, that just doesn't work with dpinger.

I tried this but the issue is still occurring. Any other ideas?

Quote from: EvilAchmed on August 06, 2024, 05:21:04 PM
I tried this but the issue is still occurring. Any other ideas?


Did you enable blocking of bogon or private networks in the Starlink interface configuration? Try disabling both.
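
For context on why those two options are worth checking: both addresses in the OP's dpinger log (100.64.0.1 and 100.109.13.234) sit in 100.64.0.0/10, the shared address space used for carrier-grade NAT (RFC 6598). Whether a particular bogon or private-network block rule actually covers that range depends on the firewall version and its lists, so treat that part as an assumption to verify on your own box. A minimal Python check of the range membership, with a hypothetical helper name `in_cgnat`:

```python
import ipaddress

# RFC 6598 shared address space, used for carrier-grade NAT.
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def in_cgnat(addr):
    """Return True if addr falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(addr) in CGNAT


# Addresses taken from the dpinger log earlier in this thread.
for addr in ("100.64.0.1", "100.109.13.234"):
    print(addr, in_cgnat(addr))
```

Both addresses come back as inside the range, so a rule that drops this range on the WAN interface could in principle interfere with replies from the monitored address.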

Quote from: Nicolassimond on August 07, 2024, 08:33:07 AM
Did you enable blocking of bogon or private networks in the Starlink interface configuration? Try disabling both.


I have disabled both the bogon and private network blocking options and I am still seeing the issue. I am almost tempted to wait until Starlink pushes out another update to see if that fixes it. It started one week ago today, right after they pushed a firmware update to my Starlink router, which I have set up in pass-through mode.