[SOLVED] Gateway down (dpinger) but DHCP works

Started by seki, May 15, 2023, 10:53:42 PM

May 15, 2023, 10:53:42 PM Last Edit: May 31, 2023, 02:01:33 AM by seki
Hey guys!

I've been experiencing a weird problem lately. I'm running the latest version (23.1.7), and my gateway goes offline a couple of times a day, sometimes as often as 10-12 times a day.
At first I blamed my ISP, but after running some tests it doesn't seem to be their fault.

But to the point.

When the gateway goes offline, either of these two things brings it back:

  • Reboot OPNsense
  • Flap the WAN interface (ifconfig re0 down && ifconfig re0 up)

Both work; the flap is considerably faster.

So here are my two questions:

1. Is anyone else experiencing this? I am not losing the DHCP lease; it's still there. dpinger simply marks my WAN_DHCP gateway as Offline, and it stays that way until I reboot or flap the WAN interface.

2. Any ideas for a quick workaround? I was thinking about a small Bash script similar to Ooker's, but in my case I don't lose the DHCP lease. dpinger just marks my GW as down, and it stays down until I flap the interface or reboot.
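A minimal sketch of such a workaround, written in Python rather than Bash on the assumption that the OPNsense install has its usual Python 3 interpreter (check the path before relying on it). It does not depend on dpinger: it pings the target itself and flaps the interface only after several consecutive failed checks, so a single lost probe does not bounce the WAN. The interface `re0` and target `8.8.8.8` come from this thread; `FAIL_THRESHOLD`, the intervals and all function names are hypothetical choices.

```python
import subprocess
import time

WAN_IF = "re0"         # WAN interface named in the post
TARGET = "8.8.8.8"     # monitor target from the dpinger log
FAIL_THRESHOLD = 3     # hypothetical: consecutive failed checks before flapping
CHECK_INTERVAL_S = 20  # hypothetical: seconds between checks
SETTLE_S = 60          # hypothetical: pause after a flap so DHCP and dpinger can settle


def ping_ok(target: str) -> bool:
    """Send one echo request; True if ping exits 0, i.e. a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-t", "2", target],  # FreeBSD ping: -t = timeout in seconds
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def update_failures(failures: int, ok: bool) -> int:
    """Reset the consecutive-failure counter on success, otherwise increment it."""
    return 0 if ok else failures + 1


def should_flap(failures: int, threshold: int = FAIL_THRESHOLD) -> bool:
    """Flap only after `threshold` consecutive failures, not on a single lost ping."""
    return failures >= threshold


def flap_interface(ifname: str) -> None:
    """Equivalent of `ifconfig re0 down && ifconfig re0 up` from the post."""
    subprocess.run(["ifconfig", ifname, "down"], check=True)  # raises, so "up" is skipped on failure
    subprocess.run(["ifconfig", ifname, "up"], check=True)


def run_watchdog() -> None:
    """Loop forever: check reachability and flap the WAN interface when it stays down."""
    failures = 0
    while True:
        failures = update_failures(failures, ping_ok(TARGET))
        if should_flap(failures):
            flap_interface(WAN_IF)
            failures = 0
            time.sleep(SETTLE_S)
        time.sleep(CHECK_INTERVAL_S)
```

It would have to run as root (ifconfig needs it), for example by appending a call to `run_watchdog()` and starting the file in the background. Keep in mind this only treats the symptom; it says nothing about why the route disappears.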


<12>1 2023-05-15T13:16:37+02:00 fw.domain.tld dpinger 98753 - [meta sequenceId="13"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T13:27:54+02:00 fw.domain.tld dpinger 98753 - [meta sequenceId="1"] WAN_DHCP 8.8.8.8: Alarm latency 13957us stddev 14964us loss 22%
<13>1 2023-05-15T13:27:54+02:00 fw.domain.tld dpinger 40388 - [meta sequenceId="2"] GATEWAY ALARM: WAN_DHCP (Addr: 8.8.8.8 Alarm: 1 RTT: 13957us RTTd: 14964us Loss: 22%)
<12>1 2023-05-15T13:28:19+02:00 fw.domain.tld dpinger 98753 - [meta sequenceId="3"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T13:28:20+02:00 fw.domain.tld dpinger 98753 - [meta sequenceId="4"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T13:28:21+02:00 fw.domain.tld dpinger 98753 - [meta sequenceId="5"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T13:28:22+02:00 fw.domain.tld dpinger 98753 - [meta sequenceId="6"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T13:28:23+02:00 fw.domain.tld dpinger 98753 - [meta sequenceId="7"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T13:28:24+02:00 fw.domain.tld dpinger 98753 - [meta sequenceId="8"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T13:28:25+02:00 fw.domain.tld dpinger 98753 - [meta sequenceId="9"] exiting on signal 15
<12>1 2023-05-15T13:28:25+02:00 fw.domain.tld dpinger 89692 - [meta sequenceId="10"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T13:28:25+02:00 fw.domain.tld dpinger 89692 - [meta sequenceId="11"] exiting on signal 15
<12>1 2023-05-15T13:28:25+02:00 fw.domain.tld dpinger 93726 - [meta sequenceId="12"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T13:37:36+02:00 fw.domain.tld dpinger 93726 - [meta sequenceId="1"] WAN_DHCP 8.8.8.8: Alarm latency 532817us stddev 2881356us loss 0%
<13>1 2023-05-15T13:37:36+02:00 fw.domain.tld dpinger 38680 - [meta sequenceId="2"] GATEWAY ALARM: WAN_DHCP (Addr: 8.8.8.8 Alarm: 1 RTT: 532817us RTTd: 2881356us Loss: 0%)
<12>1 2023-05-15T13:37:52+02:00 fw.domain.tld dpinger 93726 - [meta sequenceId="3"] WAN_DHCP 8.8.8.8: Clear latency 342244us stddev 2499381us loss 0%
<13>1 2023-05-15T13:37:52+02:00 fw.domain.tld dpinger 52473 - [meta sequenceId="4"] GATEWAY ALARM: WAN_DHCP (Addr: 8.8.8.8 Alarm: 0 RTT: 342244us RTTd: 2499381us Loss: 0%)
<12>1 2023-05-15T17:55:16+02:00 fw.domain.tld dpinger 93726 - [meta sequenceId="1"] WAN_DHCP 8.8.8.8: Alarm latency 12785us stddev 13657us loss 22%
<13>1 2023-05-15T17:55:16+02:00 fw.domain.tld dpinger 24791 - [meta sequenceId="2"] GATEWAY ALARM: WAN_DHCP (Addr: 8.8.8.8 Alarm: 1 RTT: 12785us RTTd: 13657us Loss: 22%)
<12>1 2023-05-15T17:58:09+02:00 fw.domain.tld dpinger 93726 - [meta sequenceId="1"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:58:10+02:00 fw.domain.tld dpinger 93726 - [meta sequenceId="2"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:58:11+02:00 fw.domain.tld dpinger 93726 - [meta sequenceId="3"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:58:12+02:00 fw.domain.tld dpinger 93726 - [meta sequenceId="4"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:58:12+02:00 fw.domain.tld dpinger 93726 - [meta sequenceId="5"] exiting on signal 15
<12>1 2023-05-15T17:58:13+02:00 fw.domain.tld dpinger 12299 - [meta sequenceId="6"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T17:58:13+02:00 fw.domain.tld dpinger 12299 - [meta sequenceId="7"] exiting on signal 15
<12>1 2023-05-15T17:58:13+02:00 fw.domain.tld dpinger 16931 - [meta sequenceId="8"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T17:58:49+02:00 fw.domain.tld dpinger 16931 - [meta sequenceId="9"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:58:50+02:00 fw.domain.tld dpinger 16931 - [meta sequenceId="10"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:58:51+02:00 fw.domain.tld dpinger 16931 - [meta sequenceId="11"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:58:52+02:00 fw.domain.tld dpinger 16931 - [meta sequenceId="12"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:58:53+02:00 fw.domain.tld dpinger 16931 - [meta sequenceId="13"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:58:54+02:00 fw.domain.tld dpinger 16931 - [meta sequenceId="14"] exiting on signal 15
<12>1 2023-05-15T17:58:54+02:00 fw.domain.tld dpinger 41836 - [meta sequenceId="15"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T17:58:55+02:00 fw.domain.tld dpinger 41836 - [meta sequenceId="16"] exiting on signal 15
<12>1 2023-05-15T17:58:55+02:00 fw.domain.tld dpinger 46565 - [meta sequenceId="17"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T17:59:31+02:00 fw.domain.tld dpinger 46565 - [meta sequenceId="18"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:59:32+02:00 fw.domain.tld dpinger 46565 - [meta sequenceId="19"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:59:33+02:00 fw.domain.tld dpinger 46565 - [meta sequenceId="20"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:59:34+02:00 fw.domain.tld dpinger 46565 - [meta sequenceId="21"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:59:35+02:00 fw.domain.tld dpinger 46565 - [meta sequenceId="22"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:59:36+02:00 fw.domain.tld dpinger 46565 - [meta sequenceId="23"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T17:59:36+02:00 fw.domain.tld dpinger 46565 - [meta sequenceId="24"] exiting on signal 15
<12>1 2023-05-15T17:59:36+02:00 fw.domain.tld dpinger 89372 - [meta sequenceId="25"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T17:59:37+02:00 fw.domain.tld dpinger 89372 - [meta sequenceId="26"] exiting on signal 15
<12>1 2023-05-15T17:59:37+02:00 fw.domain.tld dpinger 93279 - [meta sequenceId="27"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T18:45:50+02:00 fw.domain.tld dpinger 93279 - [meta sequenceId="1"] WAN_DHCP 8.8.8.8: Alarm latency 15729us stddev 24430us loss 22%
<13>1 2023-05-15T18:45:50+02:00 fw.domain.tld dpinger 74095 - [meta sequenceId="2"] GATEWAY ALARM: WAN_DHCP (Addr: 8.8.8.8 Alarm: 1 RTT: 15729us RTTd: 24430us Loss: 22%)
<12>1 2023-05-15T18:46:59+02:00 fw.domain.tld dpinger 93279 - [meta sequenceId="1"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T18:47:00+02:00 fw.domain.tld dpinger 93279 - [meta sequenceId="2"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T18:47:01+02:00 fw.domain.tld dpinger 93279 - [meta sequenceId="3"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T18:47:02+02:00 fw.domain.tld dpinger 93279 - [meta sequenceId="4"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T18:47:03+02:00 fw.domain.tld dpinger 93279 - [meta sequenceId="5"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T18:47:03+02:00 fw.domain.tld dpinger 93279 - [meta sequenceId="6"] exiting on signal 15
<12>1 2023-05-15T18:47:03+02:00 fw.domain.tld dpinger 8158 - [meta sequenceId="7"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T18:47:04+02:00 fw.domain.tld dpinger 8158 - [meta sequenceId="8"] exiting on signal 15
<12>1 2023-05-15T18:47:04+02:00 fw.domain.tld dpinger 15652 - [meta sequenceId="9"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T19:01:52+02:00 fw.domain.tld dpinger 15652 - [meta sequenceId="1"] WAN_DHCP 8.8.8.8: Alarm latency 16162us stddev 15845us loss 22%
<13>1 2023-05-15T19:01:52+02:00 fw.domain.tld dpinger 62247 - [meta sequenceId="2"] GATEWAY ALARM: WAN_DHCP (Addr: 8.8.8.8 Alarm: 1 RTT: 16162us RTTd: 15845us Loss: 22%)
<12>1 2023-05-15T19:03:00+02:00 fw.domain.tld dpinger 15652 - [meta sequenceId="1"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:01+02:00 fw.domain.tld dpinger 15652 - [meta sequenceId="2"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:02+02:00 fw.domain.tld dpinger 15652 - [meta sequenceId="3"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:03+02:00 fw.domain.tld dpinger 15652 - [meta sequenceId="4"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:04+02:00 fw.domain.tld dpinger 15652 - [meta sequenceId="5"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:05+02:00 fw.domain.tld dpinger 15652 - [meta sequenceId="6"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:05+02:00 fw.domain.tld dpinger 15652 - [meta sequenceId="7"] exiting on signal 15
<12>1 2023-05-15T19:03:05+02:00 fw.domain.tld dpinger 69283 - [meta sequenceId="8"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T19:03:06+02:00 fw.domain.tld dpinger 69283 - [meta sequenceId="9"] exiting on signal 15
<12>1 2023-05-15T19:03:06+02:00 fw.domain.tld dpinger 73146 - [meta sequenceId="10"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T19:03:10+02:00 fw.domain.tld dpinger 73146 - [meta sequenceId="11"] WAN_DHCP 8.8.8.8: Alarm latency 17525us stddev 5156us loss 33%
<13>1 2023-05-15T19:03:10+02:00 fw.domain.tld dpinger 90669 - [meta sequenceId="12"] GATEWAY ALARM: WAN_DHCP (Addr: 8.8.8.8 Alarm: 1 RTT: 17525us RTTd: 5156us Loss: 33%)
<12>1 2023-05-15T19:03:50+02:00 fw.domain.tld dpinger 73146 - [meta sequenceId="13"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:51+02:00 fw.domain.tld dpinger 73146 - [meta sequenceId="14"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:52+02:00 fw.domain.tld dpinger 73146 - [meta sequenceId="15"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:53+02:00 fw.domain.tld dpinger 73146 - [meta sequenceId="16"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:03:54+02:00 fw.domain.tld dpinger 73146 - [meta sequenceId="17"] exiting on signal 15
<12>1 2023-05-15T19:03:54+02:00 fw.domain.tld dpinger 89699 - [meta sequenceId="18"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T19:03:55+02:00 fw.domain.tld dpinger 89699 - [meta sequenceId="19"] exiting on signal 15
<12>1 2023-05-15T19:03:55+02:00 fw.domain.tld dpinger 93153 - [meta sequenceId="20"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
<12>1 2023-05-15T19:21:16+02:00 fw.domain.tld dpinger 93153 - [meta sequenceId="1"] WAN_DHCP 8.8.8.8: Alarm latency 12331us stddev 4992us loss 22%
<13>1 2023-05-15T19:21:16+02:00 fw.domain.tld dpinger 13437 - [meta sequenceId="2"] GATEWAY ALARM: WAN_DHCP (Addr: 8.8.8.8 Alarm: 1 RTT: 12331us RTTd: 4992us Loss: 22%)
<12>1 2023-05-15T19:21:49+02:00 fw.domain.tld dpinger 93153 - [meta sequenceId="3"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:21:50+02:00 fw.domain.tld dpinger 93153 - [meta sequenceId="4"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:21:51+02:00 fw.domain.tld dpinger 93153 - [meta sequenceId="5"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:21:52+02:00 fw.domain.tld dpinger 93153 - [meta sequenceId="6"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:21:53+02:00 fw.domain.tld dpinger 93153 - [meta sequenceId="7"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:21:54+02:00 fw.domain.tld dpinger 93153 - [meta sequenceId="8"] WAN_DHCP 8.8.8.8: sendto error: 65
<12>1 2023-05-15T19:21:55+02:00 fw.domain.tld dpinger 93153 - [meta sequenceId="9"] exiting on signal 15


Any ideas?


I even tried downgrading OPNsense to the desired version following Shoterboyx's advice, but I get errors during the downgrade.

Hi!
I don't know much about dpinger, but did you try changing the monitoring IP to something else, like the gateway, or removing it completely? You could also try increasing the default probe interval.

dpinger is the service that does the monitoring.

Disabling it makes no sense, especially when you want to automate things.
Pinging the gateway's own IP address also makes no sense, because that doesn't test whether the gateway actually forwards traffic.

The problem is not dpinger. dpinger is just the tool that shows me my GW going down; the problem lies somewhere else.

May 16, 2023, 10:01:28 AM #3 Last Edit: May 16, 2023, 10:41:32 AM by Seimus
Hello,

Firstly, what is your setup? Do you have dual WAN?

From the logs you posted, you monitor your GW with dpinger, but you are actually probing Google DNS. My question is: why? Why not probe your ISP's gateway?

Look at dpinger's logs and stats, either in the log files or in the Health graphs:

send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr xxx.xxx.14.160  identifier "WAN_DHCP "
WAN_DHCP 8.8.8.8: Alarm latency 13957us stddev 14964us loss 22%
GATEWAY ALARM: WAN_DHCP (Addr: 8.8.8.8 Alarm: 1 RTT: 13957us RTTd: 14964us Loss: 22%)


As you said.

Quote: dpinger is a service that does monitoring

dpinger flags your GW as down because it measures 22% packet loss towards 8.8.8.8, which is above the configured loss_alarm of 20% (the RTT of about 14 ms is far below the 500 ms latency_alarm). Either you have packet loss somewhere on the path to Google DNS or on your WAN interface, or you are losing the route towards Google somewhere.
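To see which threshold actually fired, each alarm line can be checked against the `latency_alarm 500ms` and `loss_alarm 20%` values from the dpinger startup line. This is a simplified Python sketch that assumes an alarm means "value strictly above its threshold"; dpinger's own averaging window and `alarm_hold` logic are not modelled, and the function name is hypothetical.

```python
import re

LATENCY_ALARM_US = 500 * 1000  # latency_alarm 500ms from the log, in microseconds
LOSS_ALARM_PCT = 20            # loss_alarm 20% from the log

# Matches the status part of lines such as
# "WAN_DHCP 8.8.8.8: Alarm latency 13957us stddev 14964us loss 22%"
STATUS_RE = re.compile(r"latency (\d+)us stddev (\d+)us loss (\d+)%")


def which_thresholds(line: str) -> dict:
    """Report which configured thresholds a dpinger Alarm/Clear line exceeds."""
    m = STATUS_RE.search(line)
    if m is None:
        raise ValueError("not a dpinger latency/loss line")
    latency_us, _stddev_us, loss_pct = (int(g) for g in m.groups())
    return {
        "latency_ms": latency_us / 1000,
        "loss_pct": loss_pct,
        "latency_alarm": latency_us > LATENCY_ALARM_US,
        "loss_alarm": loss_pct > LOSS_ALARM_PCT,
    }
```

Fed the 13:27:54 line, it reports 13.957 ms and 22% loss, so only the loss threshold is exceeded; the 13:37:36 line (532.817 ms, 0% loss) trips only the latency threshold. In other words, most of the alarms in this log are loss alarms, not latency alarms.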

You need to do some troubleshooting here:
1. Check your WAN interface for errors or problems.
2. Check the cable from your WAN port to the telco device.
3. When dpinger flags the destination (Google DNS) as down, run manual pings and compare the results with dpinger's.
4. Try changing the monitoring target, as there could be packet loss on that particular path.
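For step 3, one way to compare is to send a burst of pings from the WAN address while dpinger is alarming and compute loss as lost / sent. A sketch assuming FreeBSD's `ping` (where `-S` sets the source address) and its summary line format `N packets transmitted, M packets received, ...`; the function names are hypothetical, and the 60-probe default is only chosen to mirror dpinger's `time_period 60000ms` at `send_interval 1000ms`.

```python
import re
import subprocess

SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


def loss_from_summary(output: str) -> float:
    """Compute packet loss in percent from ping's statistics line."""
    m = SUMMARY_RE.search(output)
    if m is None:
        raise ValueError("no ping statistics found")
    sent, received = int(m.group(1)), int(m.group(2))
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent


def manual_loss(target: str, source: str, count: int = 60) -> float:
    """Ping `target` from `source` once per second and return the loss percentage."""
    proc = subprocess.run(
        ["ping", "-c", str(count), "-S", source, target],  # FreeBSD ping flags
        capture_output=True,
        text=True,
    )
    return loss_from_summary(proc.stdout)
```

Pass the WAN address shown as `bind_addr` in the dpinger log as `source`. If the manual loss stays near 0% while dpinger reports 22%, the probe path or dpinger itself becomes suspect; if both agree, the loss is real.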

Also, the error you are getting is important:


sendto error: 65

Quote: 65 EHOSTUNREACH
No route to host.
A socket operation was attempted to an unreachable host.

Either there is no possible route to the target locally, or status information was received from an upstream router that indicated the same condition elsewhere along the path to the target.

This can happen due to a lack of default route, missing interface link route, or similar conditions.
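Since error 65 (EHOSTUNREACH) points at routing, a quick check at the moment dpinger alarms is whether the kernel still has a route to the target at all. On FreeBSD, `route -n get 8.8.8.8` prints `key: value` fields such as `gateway:` and `interface:` when a route exists. The parser below assumes that output layout (worth verifying on the box itself), and the function names are hypothetical.

```python
import subprocess
from typing import Optional


def parse_route_get(output: str) -> dict:
    """Collect the 'key: value' fields printed by FreeBSD `route -n get`."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (e.g. the metrics table) are ignored
            fields[key.strip()] = value.strip()
    return fields


def route_info(output: str) -> Optional[dict]:
    """Return gateway and interface from `route -n get` output, or None if absent."""
    fields = parse_route_get(output)
    if "interface" not in fields:
        return None
    return {"gateway": fields.get("gateway"), "interface": fields["interface"]}


def route_to(target: str) -> Optional[dict]:
    """Look up the kernel route to `target`; None means no usable route."""
    proc = subprocess.run(["route", "-n", "get", target], capture_output=True, text=True)
    if proc.returncode != 0:  # assumed: non-zero exit when the route is not found
        return None
    return route_info(proc.stdout)
```

If `route_to("8.8.8.8")` returns None while the DHCP lease is still valid, that would fit the "lack of default route" case quoted above rather than packet loss upstream.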

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

Quote from: seki on May 16, 2023, 12:22:25 AM
The problem is not dpinger. dpinger is a tool that helps you see that your GW goes down. The problem lies somewhere else.
That is why I asked you to try playing with the address and the interval.
Because for me (I still have no idea what dpinger is, since I never dug into it), the first problem is a system design where some important automation relies on pinging a single foreign host somewhere on the internet.

This is why I monitor the gateway IP and not some public IP. By monitoring Google DNS you're adding all kinds of additional variables into the connection check.

Change dpinger to use the gateway and that should resolve your issue.

Quote from: Seimus on May 16, 2023, 10:01:28 AM
Hello,

First, what is your setup? Do you have dual WAN?
Single.

Quote from: Seimus on May 16, 2023, 10:01:28 AM
From the logs you posted, you monitor your GW via dpinger, but you are actually probing Google DNS. My question here is: why? Why not probe your ISP's GW?
Why not DNS? What if my ISP GW sucks and drops me every few minutes and I need to prove this? I need to reach beyond the ISP's GW. I literally need to see why I get these Internet cutouts.

Quote from: Seimus on May 16, 2023, 10:01:28 AM
1. Check your WAN interface for errors or problems
2. Check the cable from your WAN port to the telco device
Tested multiple times. I can't think of any other test.

Quote from: Seimus on May 16, 2023, 10:01:28 AM

sendto error: 65

Quote
65 EHOSTUNREACH
No route to host.
A socket operation was attempted to an unreachable host.

Either there is no possible route to the target locally, or status information was received from an upstream router that indicated the same condition elsewhere along the path to the target.

This can happen due to a lack of default route, missing interface link route, or similar conditions.

And here lies the problem. After I flap my WAN interface, the route works again. Why is the route suddenly being lost?

I literally have no idea why this is happening. How can I prove to my ISP that they eF-ed something up? OPNsense boots up, gets the IP from their DHCP (the ISP modem is in bridge mode) and it works. Until... it stops. Then I need to flap/reboot to get it working again.
This is exactly why I rely on Google DNS instead of monitoring their GW.

Thank you nonetheless for your extensive input and reply, Seimus. You just gave me a few other ideas to test.

Quote from: CJRoss on May 16, 2023, 03:01:09 PM
This is why I monitor the gateway IP and not some public IP. By monitoring Google DNS you're adding all kinds of additional variables into the connection check.

Change dpinger to use the gateway and that should resolve your issue.

I'll run in this mode for a few hours/days.


One follow-up question, though:
Once dpinger marks the GW as down, how can I mark it as online again, other than by flapping the WAN interface? I know that restoring connectivity is the proper way. I'm just asking about a forced/ad-hoc restart that "tells" dpinger to stop treating my GW as down and run the tests again. And I'm looking for a solution from the CLI, not from the OPNsense GUI.

Quote from: seki on May 17, 2023, 11:12:16 PM
Why not DNS? What if my ISP GW sucks and drops me every few minutes and I need to prove this? I need to reach beyond the ISP's GW. I literally need to see why I get these Internet cutouts.

I just spent almost a year fighting with my ISP to troubleshoot the last-mile issues that were happening to me. I find that there are rarely connection issues past the gateway; the majority of issues occur in the last mile, up to the gateway. Additionally, you'll find that other issues, such as traffic congestion, can produce false positives that result in the gateway being marked as down.

Quote from: seki on May 17, 2023, 11:12:16 PM
Quote from: Seimus on May 16, 2023, 10:01:28 AM
1. Check your WAN interface for errors or problems
2. Check the cable from your WAN port to the telco device
Tested multiple times. I can't think of any other test.

What kind of connection are you on?  Cable is a shared line and equipment problems in other buildings can cause your connection to drop due to noise on the line.


Quote from: seki on May 17, 2023, 11:18:14 PM
I literally have no idea why this is happening. How can I prove to my ISP that they eF-ed something up? OPNsense boots up, gets the IP from their DHCP (the ISP modem is in bridge mode) and it works. Until... it stops. Then I need to flap/reboot to get it working again.
This is exactly why I rely on Google DNS instead of monitoring their GW.

By testing Google DNS you're testing a whole lot more than just your ISP. You should start with the shortest and simplest route and confirm that there are no problems with each step before moving on to the next. What do your modem logs show?

Quote from: seki on May 17, 2023, 11:18:14 PM
One follow-up question, though:
Once dpinger marks the GW as down, how can I mark it as online again, other than by flapping the WAN interface? I know that restoring connectivity is the proper way. I'm just asking about a forced/ad-hoc restart that "tells" dpinger to stop treating my GW as down and run the tests again. And I'm looking for a solution from the CLI, not from the OPNsense GUI.

No idea. I've never had to try anything like that. Keep in mind that the connection loss is a rolling average, which means that when your connection comes back up, the gateway won't be marked as online for a little while. If you watch the dashboard widget, you'll see the percentage slowly go down and the status change from offline to packet loss and finally to online.
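A simplified Python model of why recovery is gradual, assuming one probe per second (send_interval 1000ms), loss averaged over the last 60 probes (time_period 60000ms) and the loss_alarm of 20% from the log earlier in the thread. It ignores other dpinger settings such as loss_interval and alarm_hold, so real recovery times will differ; it only illustrates the rolling-window effect, and seconds_until_clear is a hypothetical name.

```python
from collections import deque


def seconds_until_clear(window: int = 60, loss_alarm_pct: int = 20) -> int:
    """Start from a window where every probe was lost, then feed in answered
    probes (one per second) until the loss is no longer above the threshold."""
    probes = deque([False] * window, maxlen=window)  # False = lost probe
    seconds = 0
    while True:
        lost = probes.count(False)
        # Integer form of lost / window <= loss_alarm_pct / 100 (no float rounding).
        if lost * 100 <= loss_alarm_pct * window:
            return seconds
        probes.append(True)  # link is back: the new probe is answered,
        seconds += 1         # and the oldest lost probe drops out of the window


print(seconds_until_clear())  # 48
```

Under these assumptions, after a full outage it takes 48 consecutive good probes, about 48 seconds, before the loss falls back to the 20% threshold.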

May 18, 2023, 11:00:15 AM #9 Last Edit: May 18, 2023, 12:18:40 PM by Seimus
Quote from: CJRoss on May 18, 2023, 01:26:07 AM
Quote from: seki on May 17, 2023, 11:12:16 PM
Why not DNS? What if my ISP GW sucks and drops me every few minutes and I need to prove this? I need to reach beyond the ISP's GW. I literally need to see why I get these Internet cutouts.

I just spent almost a year fighting with my ISP to troubleshoot the last-mile issues that were happening to me. I find that there are rarely connection issues past the gateway; the majority of issues occur in the last mile, up to the gateway. Additionally, you'll find that other issues, such as traffic congestion, can produce false positives that result in the gateway being marked as down.

Exactly as CJ said: by pinging a remote destination that is outside your control, you add complexity to the path. Don't get me wrong, there are cases where probing a remote target makes sense, but I would say in your case it does not.

Your ISP's GW is the first hop in the path; it should be (and must be) the most stable and accurate.

At home I have a monitoring system that tracks every hop through my provider to several remote destinations. In total I track 7 of the provider's nodes and 2 remote destinations, Google and DuckDuckGo.

If Google goes down but the ISP nodes are OK > Google has an issue.
If hop 1 (the ISP GW) goes down > I see every hop behind it go down as well > the ISP's last mile has a problem.

Basically, troubleshooting issues like this is like dominoes. If the piece closest to you falls, it takes down all the pieces behind it. Connectivity will not be restored until you put that piece back or replace it with the next best piece > we call this re-convergence.
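The domino logic can be sketched in a few lines of Python, assuming you already collect a reachable/unreachable result per hop, ordered from the closest hop outwards. The hop names and the helper first_failure are placeholders, not my actual monitoring setup.

```python
from typing import Optional


def first_failure(path: list[tuple[str, bool]]) -> Optional[str]:
    """Walk the hops from closest to farthest and return the first one that
    does not answer; everything behind it is expected to fail as well.
    Returns None if every hop answers."""
    for name, reachable in path:
        if not reachable:
            return name
    return None


# Hypothetical probe results: ISP hops fine, remote destination down.
path = [
    ("ISP GW (hop 1)", True),
    ("ISP node (hop 2)", True),
    ("Google DNS", False),
]
print(first_failure(path))  # Google DNS
```

If the result is the ISP GW itself, the fault is in the last mile; if it is only the remote destination, the problem is most likely on that destination's side.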

Quote from: CJRoss on May 18, 2023, 01:26:07 AM
Quote from: seki on May 17, 2023, 11:12:16 PM
Quote from: Seimus on May 16, 2023, 10:01:28 AM
1. Check your WAN interface for errors or problems
2. Check the cable from your WAN port to the telco device
Tested multiple times. I can't think of any other test.

What kind of connection are you on?  Cable is a shared line and equipment problems in other buildings can cause your connection to drop due to noise on the line.

It would be nice to also see the results for points 3 and 4 I mentioned, if at all possible.

Quote from: CJRoss on May 18, 2023, 01:26:07 AM
Quote from: seki on May 17, 2023, 11:18:14 PM
I literally have no idea why this is happening. How can I prove to my ISP that they eF-ed something up? OPNsense boots up, gets the IP from their DHCP (the ISP modem is in bridge mode) and it works. Until... it stops. Then I need to flap/reboot to get it working again.
This is exactly why I rely on Google DNS instead of monitoring their GW.

By testing Google DNS you're testing a whole lot more than just your ISP. You should start with the shortest and simplest route and confirm that there are no problems with each step before moving on to the next. What do your modem logs show?

Again, as mentioned above, tracking a remote destination across the internet, outside your control, adds complexity. To troubleshoot what is the last mile for you, you also need to review the ISP modem. Also, don't forget that by rebooting OPNsense or bouncing its interface, you effectively also bounce the port on the next connected device, which is most probably your ISP modem. So there is a real possibility the issue is on your ISP's side.

Quote from: CJRoss on May 18, 2023, 01:26:07 AM
Quote from: seki on May 17, 2023, 11:18:14 PM
One follow-up question, though:
Once dpinger marks the GW as down, how can I mark it as online again, other than by flapping the WAN interface? I know that restoring connectivity is the proper way. I'm just asking about a forced/ad-hoc restart that "tells" dpinger to stop treating my GW as down and run the tests again. And I'm looking for a solution from the CLI, not from the OPNsense GUI.

No idea. I've never had to try anything like that. Keep in mind that the connection loss is a rolling average, which means that when your connection comes back up, the gateway won't be marked as online for a little while. If you watch the dashboard widget, you'll see the percentage slowly go down and the status change from offline to packet loss and finally to online.


Quote from: seki on May 17, 2023, 11:12:16 PM
Quote from: Seimus on May 16, 2023, 10:01:28 AM

sendto error: 65

Quote
65 EHOSTUNREACH
No route to host.
A socket operation was attempted to an unreachable host.

Either there is no possible route to the target locally, or status information was received from an upstream router that indicated the same condition elsewhere along the path to the target.

This can happen due to a lack of default route, missing interface link route, or similar conditions.

And here lies the problem. After I flap my WAN interface, the route works again. Why is the route suddenly being lost?

I literally have no idea why this is happening. How can I prove to my ISP that they eF-ed something up? OPNsense boots up, gets the IP from their DHCP (the ISP modem is in bridge mode) and it works. Until... it stops. Then I need to flap/reboot to get it working again.
This is exactly why I rely on Google DNS instead of monitoring their GW.

Thank you nonetheless for your extensive input and reply, Seimus. You just gave me a few other ideas to test.

dpinger, as the name suggests, sends ICMP probes, by default with the smallest possible packet size (data_len 0 in your log). Once the packet loss/RTT/RTTd return to within the configured thresholds, i.e. the probes come back under the set thresholds, it flags the GW as up again.

The route gets lost because you are learning it automatically. When you learn your IP via DHCP, you also learn the GW. By default, OPNsense then adds a default route 0.0.0.0/0 pointing to the internet via the next hop, your ISP GW. When dpinger sees your destination as down, it declares the GW down, effectively removing the routes through it from the RIB. Hence error 65.

Here I have a suggestion. Because you have a single-WAN setup, the moment your GW is flagged as down you lose internet connectivity. You can add a static route 0.0.0.0/0 pointing to the ISP GW with a higher AD (administrative distance) than the learned one. In this case, the moment dpinger flags the destination down, takes your GW down and the learned route is removed, the static route with the higher AD takes over and you can still forward traffic. This would effectively let you test point 3 I mentioned.
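The floating static route idea can be modelled in Python like this. It is a generic sketch of route selection by administrative distance, as routers that use AD do it; the distances 10 and 20 and the address 192.0.2.1 (standing in for the ISP GW) are made up, and it does not show how OPNsense or FreeBSD would actually install such a route.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    prefix: str
    next_hop: str
    distance: int  # administrative distance: lower is preferred
    source: str


def best_route(routes: list[Route], prefix: str = "0.0.0.0/0") -> Optional[Route]:
    """Among the installed routes for `prefix`, pick the one with the lowest distance."""
    candidates = [r for r in routes if r.prefix == prefix]
    return min(candidates, key=lambda r: r.distance, default=None)


# Hypothetical values: both routes point at the same ISP GW.
learned = Route("0.0.0.0/0", "192.0.2.1", 10, "dhcp")
floating = Route("0.0.0.0/0", "192.0.2.1", 20, "static")

print(best_route([learned, floating]).source)  # dhcp
# dpinger marks the GW down and the learned route is withdrawn:
print(best_route([floating]).source)           # static
```

The static route sits unused while the learned route exists and only becomes the active default once the learned one disappears.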

Also, before you try the above (and if you do try it), check your routing table once dpinger flags your probed destination as down.
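On the OPNsense shell, netstat -rn lists the routing table. A minimal Python parser for that output could look like this; it assumes the recent FreeBSD column layout (Destination, Gateway, Flags, Netif, Expire) under an "Internet:" section header, so compare it with your own output first. default_gateway is a hypothetical helper name.

```python
from typing import Optional

SAMPLE = """Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.0.2.1          UGS         re0
127.0.0.1          link#4             UH          lo0
"""


def default_gateway(netstat_output: str) -> Optional[tuple[str, str]]:
    """Return (gateway, interface) of the IPv4 default route, or None if the
    IPv4 table has no default route (the situation behind error 65)."""
    in_ipv4 = False
    for line in netstat_output.splitlines():
        stripped = line.strip()
        if stripped == "Internet:":
            in_ipv4 = True
            continue
        if stripped == "Internet6:":
            in_ipv4 = False
            continue
        fields = stripped.split()
        # Columns: Destination Gateway Flags Netif [Expire]
        if in_ipv4 and len(fields) >= 4 and fields[0] == "default":
            return fields[1], fields[3]
    return None


print(default_gateway(SAMPLE))  # ('192.0.2.1', 're0')
```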


Another point: you say DHCP is still working. How did you confirm that? Did you see OPNsense send a request to renew the lease on the IP it currently holds? Also, what is the DHCP lease time from your ISP? Because if, for example, dpinger goes down but you have a 12h lease, then still holding the lease doesn't mean DHCP is working.
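For reference, RFC 2131 (section 4.4.5) sets the default client timers at T1 = 0.5 * lease time (renew with the same server) and T2 = 0.875 * lease time (rebind with any server), unless the server sends explicit values. A quick Python calculation for the 12h example shows how long a client can hold a lease without talking to the DHCP server at all; renewal_times is a hypothetical name.

```python
def renewal_times(lease_seconds: int) -> tuple[float, float]:
    """Default DHCP client timers from RFC 2131, section 4.4.5.
    A server may override them with explicit T1/T2 options."""
    t1 = 0.5 * lease_seconds    # start renewing with the original server
    t2 = 0.875 * lease_seconds  # start rebinding with any server
    return t1, t2


t1, t2 = renewal_times(12 * 3600)
print(t1 / 3600, t2 / 3600)  # 6.0 10.5
```

So with a 12h lease, no DHCP traffic is expected for the first 6 hours, and an intact lease says little about whether the ISP's DHCP server is reachable.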

Also, if DHCP is working but dpinger cannot reach Google DNS, that by itself could confirm that the issue causing these drops is somewhere beyond your LAN.

Regards,
S.

Not sure if it is helpful or stating the obvious:
I've had good experience using SmokePing with different targets and probes to analyze network connection issues. You could add a SmokePing instance targeting your ISP, Google, ... to double-check whether something changes before the issue occurs and which target is still reachable.
Simon

Thank you, guys, for the extensive knowledge I gained here.

I will definitely test it out. As a matter of fact, I've removed the 8.8.8.8 monitor address from dpinger, so now by default it pings my WAN_GW.

When the problem occurred again, I noticed something interesting.

<12>1 2023-05-19T02:28:33+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="119"] WAN_DHCP xxx.xxx.12.1: sendto error: 64
<12>1 2023-05-19T02:28:34+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="120"] WAN_DHCP xxx.xxx.12.1: sendto error: 64
<12>1 2023-05-19T02:28:35+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="121"] WAN_DHCP xxx.xxx.12.1: sendto error: 64
<12>1 2023-05-19T02:28:36+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="122"] WAN_DHCP xxx.xxx.12.1: sendto error: 64
<12>1 2023-05-19T02:28:37+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="123"] WAN_DHCP xxx.xxx.12.1: sendto error: 65
<12>1 2023-05-19T02:28:38+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="124"] WAN_DHCP xxx.xxx.12.1: sendto error: 65
<12>1 2023-05-19T02:28:39+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="125"] WAN_DHCP xxx.xxx.12.1: sendto error: 65


When my GW goes down, at first I get error 64, which means:
Quote
64 EHOSTDOWN
Host is down.
A socket operation failed because the destination host was down.

And after a while E65 kicks in.

By my simple logic: the ISP's CMTS (my bridged modem is a DOCSIS 3 one) probably goes down, and when it comes back up it tries to re-converge the network protocols but is not responding yet. Hence it's "EHOSTDOWN" first because it's literally down, then it comes up but isn't ready yet, and then it's "EHOSTUNREACH". Am I connecting the dots the right way?
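To line these transitions up with the modem's own log, a small Python sketch can pull out the moments where the sendto errno changes. It assumes the syslog line format shown above; error_transitions is a hypothetical helper name.

```python
import re

# Timestamp like 2023-05-19T02:28:33+02:00, then the dpinger sendto errno.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:]+[+-]\d{2}:\d{2}) .*sendto error: (?P<code>\d+)"
)

SAMPLE_LOG = '''<12>1 2023-05-19T02:28:33+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="119"] WAN_DHCP xxx.xxx.12.1: sendto error: 64
<12>1 2023-05-19T02:28:34+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="120"] WAN_DHCP xxx.xxx.12.1: sendto error: 64
<12>1 2023-05-19T02:28:37+02:00 fw.domain.it dpinger 28711 - [meta sequenceId="123"] WAN_DHCP xxx.xxx.12.1: sendto error: 65
'''


def error_transitions(log: str) -> list[tuple[str, int]]:
    """Return (timestamp, errno) for every line where the errno differs from the previous one."""
    transitions = []
    last = None
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a sendto error line
        code = int(m["code"])
        if code != last:
            transitions.append((m["ts"], code))
            last = code
    return transitions


for ts, code in error_transitions(SAMPLE_LOG):
    print(ts, code)
# 2023-05-19T02:28:33+02:00 64
# 2023-05-19T02:28:37+02:00 65
```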

Quote from: seki on May 19, 2023, 12:43:06 PM
By my simple logic: the ISP's CMTS (my bridged modem is a DOCSIS 3 one) probably goes down, and when it comes back up it tries to re-converge the network protocols but is not responding yet. Hence it's "EHOSTDOWN" first because it's literally down, then it comes up but isn't ready yet, and then it's "EHOSTUNREACH". Am I connecting the dots the right way?

Interesting.  I've not looked that closely at the dpinger results.  Did you manually set your GW or did you let dpinger automatically get it?

One thing I ran into is that when my connection went down, my modem would assign me a local IP and GW, so dpinger would never realize that my connection was down. I had to add my modem's IP to the DHCP rejection field for the interface.
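A quick sanity check for that situation, sketched in Python with the standard ipaddress module: if the WAN lease is a private address, it most likely came from the modem rather than from the ISP. This is a heuristic assumption, since some ISPs may legitimately hand out non-public addresses; looks_like_modem_fallback is a hypothetical name, and 192.168.100.x is used below only because many DOCSIS modems use 192.168.100.1 as their own management address.

```python
import ipaddress


def looks_like_modem_fallback(wan_ip: str) -> bool:
    """Heuristic: a private WAN address on a bridged modem usually means the
    lease was handed out by the modem itself, not by the ISP."""
    return ipaddress.ip_address(wan_ip).is_private


print(looks_like_modem_fallback("192.168.100.10"))  # True
print(looks_like_modem_fallback("8.8.8.8"))         # False
```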

Whenever you run into connection issues, make sure to take a look at your modem's status info and logs.  Those can help you figure out what's going on with the connection.

Quote from: CJRoss on May 19, 2023, 03:03:18 PM

Interesting.  I've not looked that closely at the dpinger results.  Did you manually set your GW or did you let dpinger automatically get it?

GW is set to DHCP and that's it. Nothing else:



Other than that, here's what I had configured before Seimus and the other guys suggested removing it (if no monitor address is set, the GW's IP is pinged by default):




Quote from: seki on May 19, 2023, 12:43:06 PM
By my simple logic: the ISP's CMTS (my bridged modem is a DOCSIS 3 one) probably goes down, and when it comes back up it tries to re-converge the network protocols but is not responding yet. Hence it's "EHOSTDOWN" first because it's literally down, then it comes up but isn't ready yet, and then it's "EHOSTUNREACH". Am I connecting the dots the right way?

I think errors 64 and 65 come together, in that order. Error 64 declares the targeted host down. Error 65 then appears because, with the host down, your GW was taken down and the default route was removed.

There is also the question of whether the modem really recovers. It can get into a stuck state, and this can apply even to just the single interface that connects towards the OPN.

When this happens, can you check your ARP table?
Do you see the ISP GW in an ARP entry?
Does OPN learn the ARP entry for the ISP GW?
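To answer those questions quickly, arp -an on the OPNsense shell shows whether the ISP GW's MAC address is resolved. A minimal Python sketch for reading that output, assuming FreeBSD's "? (IP) at MAC on IFACE ..." format where an unresolved entry shows "(incomplete)"; arp_mac is a hypothetical helper and the addresses are placeholders.

```python
import re
from typing import Optional

ARP_RE = re.compile(r"\((?P<ip>[0-9.]+)\) at (?P<mac>\S+)")

ARP_SAMPLE = (
    "? (192.0.2.1) at 00:11:22:33:44:55 on re0 expires in 1183 seconds [ethernet]\n"
    "? (192.0.2.50) at (incomplete) on re0 expired [ethernet]\n"
)


def arp_mac(arp_output: str, ip: str) -> Optional[str]:
    """Return the MAC learned for `ip`, or None if it is missing or unresolved."""
    for line in arp_output.splitlines():
        m = ARP_RE.search(line)
        if m and m["ip"] == ip:
            mac = m["mac"]
            return None if mac == "(incomplete)" else mac
    return None


print(arp_mac(ARP_SAMPLE, "192.0.2.1"))   # 00:11:22:33:44:55
print(arp_mac(ARP_SAMPLE, "192.0.2.50"))  # None
```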


Regards,
S.