OPNsense Forum

Archive => 22.1 Legacy Series => Topic started by: supercm on February 08, 2022, 01:24:02 AM

Title: Gateway Monitoring
Post by: supercm on February 08, 2022, 01:24:02 AM
When I have gateway monitoring on, I get random alerts that an interface is down. However, it is not actually down (from what I can tell). What can I investigate to get to the bottom of this?
Title: Re: Gateway Monitoring
Post by: mimugmail on February 08, 2022, 06:24:04 AM
Can you post the logs of gateways?
Title: Re: Gateway Monitoring
Post by: supercm on February 08, 2022, 06:54:01 PM
Sure, here you go.
Title: Re: Gateway Monitoring
Post by: supercm on February 10, 2022, 04:02:36 PM
Did those logs help? Any thoughts here?
Title: Re: Gateway Monitoring
Post by: RedVortex on February 10, 2022, 05:15:16 PM
I wonder if you may be affected by the same issue I had, where the DHCP client added some static routes when I was getting my WAN IPs, overriding the routes and gateway monitoring I had set up.

You may want to try this; it fixed it for me (and it will make its way into the next 22.1.x release anyway).

https://forum.opnsense.org/index.php?topic=26765.msg129675#msg129675

You will have to reboot after the patch so that your routing table clears up properly, or re-save your WANPUBLIC and WIRELESS interfaces so they update themselves again. A reboot makes sure everything starts clean after the patch.

Also, send us your routing table (while the problem is there) just for the fun of it (Just the IPv4 (top) part, not everything).
Title: Re: Gateway Monitoring
Post by: supercm on February 10, 2022, 05:56:20 PM
Thank you, I will dig through that post shortly.

Re: sending the routing table, is that just the Status page under Routes in the GUI, or something else?
Title: Re: Gateway Monitoring
Post by: RedVortex on February 10, 2022, 06:13:47 PM
Yes, only the first 5-6 lines should be good.

My problem showed up at lines 2 and 3, where the DNS servers that I use for gateway monitoring (like you seem to be doing) appeared routed to the wrong place instead of being assigned to the right gateways as they were before 22.1.

Franco created a quick fix that will make its way into the next 22.1 release. It fixes this and brings back the pre-22.1 behaviour (which was the right way).

You can run this from the command line on your OPNsense box to apply the patch if you want to have it before the next release.

# opnsense-patch 02dc1ebd93

And then reboot. My problem was gone and I could then see the right routes to the DNS servers and gateway monitoring IPs.
Title: Re: Gateway Monitoring
Post by: supercm on February 10, 2022, 06:22:15 PM
Thank you.

I have applied the patch to one of my nodes and I was going to compare the data once I see the alert come through. Right now they look the same except for one item (which I expect because I have one interface configured as a pass through but can only do that on a single unit at a time).
Title: Re: Gateway Monitoring
Post by: RedVortex on February 10, 2022, 07:02:34 PM
Make sure to reboot or re-save any interface that is configured for DHCP (usually the WAN interfaces) after the patch, or else nothing will change. The affected script is only called when an interface comes up or gets a DHCP IP.
Title: Re: Gateway Monitoring
Post by: supercm on February 10, 2022, 08:04:46 PM
I rebooted after patching.
Title: Re: Gateway Monitoring
Post by: supercm on February 10, 2022, 08:11:41 PM
I just got an alert but the tables look the same. Wan12 is the one I was alerted on.

No Patch

ipv4   default   192.168.12.1      UGS    NaN   1500   hn4   Wireless2
ipv4   1.0.0.1   192.168.252.254   UGHS   NaN   1492   hn2   Wan12
ipv4   1.1.1.1   192.168.250.1     UGHS   NaN   1500   hn3   Wireless1
ipv4   8.8.4.4   192.168.12.1      UGHS   NaN   1500   hn4   Wireless2
ipv4   8.8.8.8   192.168.254.254   UGHS   NaN   1492   hn1   Wan18

Patch

ipv4   default   192.168.12.1      UGS    NaN   1500   hn4   Wireless2
ipv4   1.0.0.1   192.168.252.254   UGHS   NaN   1492   hn2   Wan12
ipv4   1.1.1.1   192.168.250.1     UGHS   NaN   1500   hn3   Wireless1
ipv4   8.8.4.4   192.168.12.1      UGHS   NaN   1500   hn4   Wireless2
ipv4   8.8.8.8   192.168.254.254   UGHS   NaN   1492   hn1   Wan18
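
For anyone wanting to check a dump like this without eyeballing it, here is a rough sketch that looks up which gateway and interface a monitoring IP is routed through. It assumes the whitespace-separated column order of the rows above (proto, destination, gateway, flags, use, MTU, netif, description). `gateway_for` is a made-up helper name, not an OPNsense command.

```shell
#!/bin/sh
# Hypothetical helper: print "gateway netif" for one destination, reading a
# route dump on stdin in the column order pasted above:
#   proto  destination  gateway  flags  use  mtu  netif  description
gateway_for() {
    awk -v dst="$1" '$2 == dst { print $3, $7 }'
}

# Two sample rows copied from the dump above.
routes='ipv4 1.0.0.1 192.168.252.254 UGHS NaN 1492 hn2 Wan12
ipv4 1.1.1.1 192.168.250.1 UGHS NaN 1500 hn3 Wireless1'

# Look up the monitoring IP used for Wan12.
printf '%s\n' "$routes" | gateway_for 1.0.0.1
```

If the gateway or interface printed for a monitoring IP is not the one that gateway is supposed to use, that would point at the route problem described earlier in the thread. An empty result means there is no host route for that IP in the dump.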

Title: Re: Gateway Monitoring
Post by: supercm on February 15, 2022, 04:12:05 PM
The patched node is continuing to experience this issue. Any other suggestions?
Title: Re: Gateway Monitoring
Post by: RedVortex on February 15, 2022, 11:15:48 PM
If you ping the gateway's monitoring IP from the router itself (while OPNsense says it is down), does it respond?

For example, let's say gateway Wan12 uses 1.0.0.1 as its monitoring IP and OPNsense says it is currently down. Can you ping 1.0.0.1 from OPNsense itself if you run ping 1.0.0.1 on the command line? Also ping the gateway itself to check whether it is down as well.

Also, when it shows as "down", is that because it sees high packet loss, 100% failure, or the interface being down (in the OPNsense UI when you check the status)?
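
To put a number on the packet loss while the alert is active, something like the following could help. It is only a sketch: `ping_loss` is a made-up helper that pulls the percentage out of ping's summary line, and the sample line fed to it here is illustrative, not taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: extract the packet-loss percentage from ping's
# summary line (read from stdin). It relies on the "N% packet loss"
# wording that both FreeBSD and Linux ping print.
ping_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# On the firewall, assuming 1.0.0.1 is the monitoring IP of the gateway:
#   ping -c 10 1.0.0.1 | ping_loss
# Illustrative summary line (made up for this example):
echo '10 packets transmitted, 9 packets received, 10.0% packet loss' | ping_loss
```

Running it a few times during an alert and a few times while the gateway shows as up gives a simple comparison against what the gateway status page reports.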

Last thing: were the routes you sent above captured during a gateway down event or while it was considered up? If not during a down event, can you try to capture the routes when it says it is down?

We're investigating here; I have no idea why this would be happening. I also have many public WANs and LANs, and many of them have external monitoring IPs as well, and I'm not affected by this, so I'm very curious as to what would trigger this behaviour.
Title: Re: Gateway Monitoring
Post by: mimugmail on February 16, 2022, 06:21:44 AM
I had a similar thing last year: the provider was rate limiting ping, so I opened the advanced section in the gateway settings and set it to ping only every 5 seconds.
Title: Re: Gateway Monitoring
Post by: supercm on February 16, 2022, 07:05:33 PM
I just turned it back on and immediately got an email that it failed. I pinged the gateway and the IP that I have set for monitoring from the interface that is reported down, and the ping succeeds.

The alert that I get is that the gateway is down. The UI shows that everything is fine at this time.

The routes are the same regardless of whether it is down or not.
Title: Re: Gateway Monitoring
Post by: supercm on February 16, 2022, 07:06:52 PM
I will try changing the time to 5 seconds to see if it makes a difference.
Title: Re: Gateway Monitoring
Post by: supercm on February 16, 2022, 07:12:28 PM
I just got another failure from the second node. It shows in the UI that the interface is offline.

Ping from the interface is successful and no changes to the routes.

I disabled monitoring and re-enabled it and it shows as online.
Title: Re: Gateway Monitoring
Post by: mimugmail on February 16, 2022, 07:21:34 PM
Firewall : Settings : Advanced : Disable State Kill on Gateway failure .. please tick this
Title: Re: Gateway Monitoring
Post by: supercm on February 16, 2022, 07:36:34 PM
That is not a valid option under Advanced.
Title: Re: Gateway Monitoring
Post by: mimugmail on February 16, 2022, 07:37:09 PM
Really?
Title: Re: Gateway Monitoring
Post by: supercm on February 16, 2022, 07:51:31 PM
These are my options


IPv6 Options   full help
Allow IPv6    Allow IPv6
Network Address Translation   
Reflection for port forwards   
Reflection for 1:1   
Automatic outbound NAT for Reflection   
Bogon Networks   
Update Frequency   
Monthly
Gateway Monitoring   
Skip rules    Skip rules when gateway is down
Multi-WAN   
Sticky connections    Use sticky connections
Source tracking timeout
Shared forwarding    Use shared forwarding between packet filter, traffic shaper and captive portal
Disable force gateway    Disable automatic rules which force local services to use the assigned interface gateway.
Schedules   
Schedule States   
Miscellaneous   
Firewall Optimization   
normal
Firewall Rules Optimization   
basic
Bind states to interface   
Disable Firewall    Disable all packet filtering.
Firewall Adaptive Timeouts   
start   end
Firewall Maximum States   
Firewall Maximum Fragments   
Firewall Maximum Table Entries   
Static route filtering    Bypass firewall rules for traffic on the same interface
Disable reply-to    Disable reply-to on WAN rules
Disable anti-lockout    Disable administration anti-lockout rule
Aliases Resolve Interval   
Check certificate of aliases URLs    Verify HTTPS certificates when downloading alias URLs
Dynamic state reset    Reset all states when a dynamic IP address changes.
Title: Re: Gateway Monitoring
Post by: mimugmail on February 16, 2022, 09:44:36 PM
Ah, it was removed in 22.1, so that seems OK then.
Title: Re: Gateway Monitoring
Post by: jclendineng on February 17, 2022, 02:59:44 PM
Same issue here, ended up disabling monitoring till I can roll back to 21.7...
Title: Re: Gateway Monitoring
Post by: supercm on February 25, 2022, 08:10:34 PM
Any other recommendations on how to solve this issue? It seems to be constant with most of my interfaces.
Title: Re: Gateway Monitoring
Post by: mimugmail on February 25, 2022, 08:14:20 PM
Can you post the logs please?
Title: Re: Gateway Monitoring
Post by: supercm on March 09, 2022, 07:35:19 PM
Here you go
Title: Re: Gateway Monitoring
Post by: darp12345 on March 11, 2022, 12:57:54 AM
I seem to have a similar issue: https://forum.opnsense.org/index.php?topic=27433.0
I have two WANs, both doing gateway monitoring. One is having the problem. The other seems fine. When it happens, the interface that is affected shows double ICMP replies of the dpinger pings.
tcpdump -n -i cxl1 icmp
...
23:51:43.962528 IP 98.51.182.15 > 4.2.2.3: ICMP echo request, id 33608, seq 0, length 8
23:51:43.981125 IP 4.2.2.3 > 98.51.182.15: ICMP echo reply, id 33608, seq 0, length 8
23:51:43.981169 IP 4.2.2.3 > 98.51.182.15: ICMP echo reply, id 33608, seq 0, length 8
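
A rough way to spot these doubled replies without reading the capture by hand is sketched below. `dup_replies` is a made-up helper name, and it assumes the tcpdump text format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read tcpdump text output on stdin and report any
# ICMP echo reply seen more than once for the same source, id and seq.
dup_replies() {
    awk '$7 == "echo" && $8 == "reply," {
             id = $10; seq = $12
             gsub(/,/, "", id); gsub(/,/, "", seq)   # strip trailing commas
             key = $3 " id " id " seq " seq
             # Report once, at the moment the second copy shows up.
             if (++count[key] == 2) print "duplicate reply from " key
         }'
}

# The three capture lines from above.
dup_replies <<'EOF'
23:51:43.962528 IP 98.51.182.15 > 4.2.2.3: ICMP echo request, id 33608, seq 0, length 8
23:51:43.981125 IP 4.2.2.3 > 98.51.182.15: ICMP echo reply, id 33608, seq 0, length 8
23:51:43.981169 IP 4.2.2.3 > 98.51.182.15: ICMP echo reply, id 33608, seq 0, length 8
EOF
```

For a live capture the same function could be fed with tcpdump -n -l -i cxl1 icmp | dup_replies, where -l keeps tcpdump's output line buffered so results appear as packets arrive.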

Title: Re: Gateway Monitoring
Post by: mimugmail on March 11, 2022, 06:34:18 AM
Quote from: darp12345 on March 11, 2022, 12:57:54 AM
I seem to have a similar issue: https://forum.opnsense.org/index.php?topic=27433.0
I have two WANs, both doing gateway monitoring. One is having the problem. The other seems fine. When it happens, the interface that is affected shows double ICMP replies of the dpinger pings.
tcpdump -n -i cxl1 icmp
...
23:51:43.962528 IP 98.51.182.15 > 4.2.2.3: ICMP echo request, id 33608, seq 0, length 8
23:51:43.981125 IP 4.2.2.3 > 98.51.182.15: ICMP echo reply, id 33608, seq 0, length 8
23:51:43.981169 IP 4.2.2.3 > 98.51.182.15: ICMP echo reply, id 33608, seq 0, length 8


Is this a virtual machine? Usually it's fine to monitor only the primary line, as there is no action to take when both are down.