When I have gateway monitoring on, I get random alerts that an interface is down. However, as far as I can tell, it is not actually down. What can I investigate to get to the bottom of this?
Can you post the gateway logs?
Sure, here you go.
Did those logs help? Any thoughts here?
I wonder if you are affected by the same issue I had: while obtaining my WAN IPs, the DHCP client added static routes that overrode the routes and gateway monitoring I had configured.
You may want to try this. It fixed it for me (and will make its way into the next 22.1.x release anyway).
https://forum.opnsense.org/index.php?topic=26765.msg129675#msg129675
You will have to reboot after applying the patch so that your routing table clears up properly, or re-save your WANPUBLIC and WIRELESS interfaces so they update themselves. A reboot makes sure everything starts clean after the patch.
Also, send us your routing table (captured while the problem is present), just for the fun of it. Only the IPv4 (top) part, not everything.
Thank you, I will dig through that post shortly.
Re: sending the routing table, is that just Status under Routes in the GUI, or something else?
Yes, only the first 5-6 lines should be enough.
My problem showed up at lines 2 and 3, where my DNS servers, which I use for gateway monitoring (as you seem to be doing), appeared routed to the wrong place instead of being assigned to the right gateways as they were before 22.1.
Franco created a quick fix that will make its way into the next 22.1 release. It fixes this and brings back the pre-22.1 behaviour (which was the right one).
You can run this on the command line of your OPNsense box to apply the patch if you want it before the next release.
# opnsense-patch 02dc1ebd93
And then reboot. My problem was gone and I could then see the right routes to the DNS servers and gateway monitoring IPs.
Thank you.
I have applied the patch to one of my nodes and I was going to compare the data once I see the alert come through. Right now they look the same except for one item (which I expect because I have one interface configured as a pass through but can only do that on a single unit at a time).
Make sure to reboot or re-save any interface that is configured for DHCP (usually WAN interfaces) after the patch, or else nothing will change. The affected script is only called when an interface comes up or gets a DHCP IP.
I rebooted after patching.
I just got an alert but the tables look the same. Wan12 is the one I was alerted on.
No Patch
ipv4 default 192.168.12.1 UGS NaN 1500 hn4 Wireless2
ipv4 1.0.0.1 192.168.252.254 UGHS NaN 1492 hn2 Wan12
ipv4 1.1.1.1 192.168.250.1 UGHS NaN 1500 hn3 Wireless1
ipv4 8.8.4.4 192.168.12.1 UGHS NaN 1500 hn4 Wireless2
ipv4 8.8.8.8 192.168.254.254 UGHS NaN 1492 hn1 Wan18
Patch
ipv4 default 192.168.12.1 UGS NaN 1500 hn4 Wireless2
ipv4 1.0.0.1 192.168.252.254 UGHS NaN 1492 hn2 Wan12
ipv4 1.1.1.1 192.168.250.1 UGHS NaN 1500 hn3 Wireless1
ipv4 8.8.4.4 192.168.12.1 UGHS NaN 1500 hn4 Wireless2
ipv4 8.8.8.8 192.168.254.254 UGHS NaN 1492 hn1 Wan18
The patched node is continuing to experience this issue. Any other suggestions?
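The comparison above (is each monitoring IP still routed through the gateway it belongs to?) can be sketched as a small shell helper. This is only a rough sketch: the function names and the saved route file are hypothetical, and it assumes the route lines are saved in the same column layout as pasted above (protocol, destination, gateway, flags, use, MTU, interface, description).

```shell
#!/bin/sh
# Hypothetical helper: print the gateway a destination is routed through,
# reading saved route lines in the column layout pasted in this thread.
route_gateway() {
    # $1 = file with saved route lines, $2 = destination (e.g. 1.0.0.1)
    awk -v dest="$2" '$1 == "ipv4" && $2 == dest { print $3; exit }' "$1"
}

# Hypothetical helper: report whether a monitor IP uses the expected gateway.
check_monitor_route() {
    # $1 = route file, $2 = monitor IP, $3 = expected gateway
    actual=$(route_gateway "$1" "$2")
    if [ -z "$actual" ]; then
        echo "$2: no host route found"
        return 1
    elif [ "$actual" = "$3" ]; then
        echo "$2: ok via $actual"
    else
        echo "$2: MISMATCH, routed via $actual (expected $3)"
        return 1
    fi
}
```

For example, with the routes saved to a file, check_monitor_route routes.txt 1.0.0.1 192.168.252.254 would confirm that the Wan12 monitor IP still goes out through the Wan12 gateway. Running it once while the gateway is reported up and once while it is reported down makes any difference easy to spot.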
If you ping the gateway's monitoring IP from the router itself (while OPNsense says it is down), does it respond?
For example, say gateway Wan12 uses 1.0.0.1 as its monitoring IP and OPNsense says it is currently down: can you ping 1.0.0.1 from OPNsense itself by running ping 1.0.0.1 on the command line? Also ping the gateway itself to check whether it is up or down.
Also, when it shows as "down", is that because of high packet loss, 100% failure, or the interface being down (as shown in the OPNsense UI when you check the status)?
Last thing: were the routes you sent above captured while a gateway was down or while it was considered up? If not while down, can you try to capture the routes when it says it is down?
We're investigating here; I have no idea why this would be happening. I also have many public WANs and LANs, many with external monitoring IPs as well, and I'm not affected by this, so I'm very curious about what would trigger this behaviour.
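The manual ping check described above can be sketched roughly like this. It assumes the FreeBSD ping that OPNsense ships, where -S sets the source address and -c the packet count (Linux ping uses different options for the source). The function names are made up for illustration, and the source address you pass in is whatever local address the affected WAN interface holds.

```shell
#!/bin/sh
# Extract the packet-loss percentage from a ping summary read on stdin.
ping_loss() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical wrapper: ping a target using a specific local source address,
# so the probe leaves through the same WAN as the gateway monitor would.
# Assumes FreeBSD ping options: -c count, -S source address.
probe_from() {
    # $1 = local address of the WAN interface, $2 = gateway or monitor IP
    ping -c 5 -S "$1" "$2" | ping_loss
}
```

Running probe_from once against the gateway and once against the monitoring IP while the alert is active gives two loss figures to compare with what the UI reports.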
I had a similar thing last year: the provider was rate limiting ping, so I opened the advanced section of the gateway settings and set it to ping only every 5 seconds.
I just turned it back on and immediately got an email that it failed. I pinged the gateway and the IP that I have in monitoring from the interface that is down, and the ping succeeds.
The alert that I get is that the gateway is down. The UI shows that everything is fine at this time.
The routes are the same regardless of whether it is down or not.
I will try changing the time to 5 seconds to see if it makes a difference.
I just got another failure from the second node. It shows in the UI that the interface is offline.
Ping from the interface is successful and no changes to the routes.
I disabled monitoring and re-enabled it and it shows as online.
Firewall : Settings : Advanced : Disable State Kill on Gateway failure .. please tick this
That is not a valid option under Advanced.
Really?
These are my options
IPv6 Options full help
Allow IPv6 Allow IPv6
Network Address Translation
Reflection for port forwards
Reflection for 1:1
Automatic outbound NAT for Reflection
Bogon Networks
Update Frequency
Monthly
Gateway Monitoring
Skip rules Skip rules when gateway is down
Multi-WAN
Sticky connections Use sticky connections
Source tracking timeout
Shared forwarding Use shared forwarding between packet filter, traffic shaper and captive portal
Disable force gateway Disable automatic rules which force local services to use the assigned interface gateway.
Schedules
Schedule States
Miscellaneous
Firewall Optimization
normal
Firewall Rules Optimization
basic
Bind states to interface
Disable Firewall Disable all packet filtering.
Firewall Adaptive Timeouts
start end
Firewall Maximum States
Firewall Maximum Fragments
Firewall Maximum Table Entries
Static route filtering Bypass firewall rules for traffic on the same interface
Disable reply-to Disable reply-to on WAN rules
Disable anti-lockout Disable administration anti-lockout rule
Aliases Resolve Interval
Check certificate of aliases URLs Verify HTTPS certificates when downloading alias URLs
Dynamic state reset Reset all states when a dynamic IP address changes.
Ah, it was removed in 22.1, so that seems OK then.
Same issue here, ended up disabling monitoring till I can roll back to 21.7...
Any other recommendations on how to solve this issue? It seems to be constant with most of my interfaces.
Can you post the logs please?
Here you go
I seem to have a similar issue: https://forum.opnsense.org/index.php?topic=27433.0
I have two WANs, both doing gateway monitoring. One is having the problem. The other seems fine. When it happens, the interface that is affected shows double ICMP replies of the dpinger pings.
tcpdump -n -i cxl1 icmp
...
23:51:43.962528 IP 98.51.182.15 > 4.2.2.3: ICMP echo request, id 33608, seq 0, length 8
23:51:43.981125 IP 4.2.2.3 > 98.51.182.15: ICMP echo reply, id 33608, seq 0, length 8
23:51:43.981169 IP 4.2.2.3 > 98.51.182.15: ICMP echo reply, id 33608, seq 0, length 8
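Spotting doubled replies like the ones above by eye gets tedious in a long capture. A rough sketch of counting them with awk follows; the function name is hypothetical, and it assumes tcpdump text output in the same layout as the lines above.

```shell
#!/bin/sh
# Read tcpdump text output on stdin and report ICMP echo replies that were
# seen more than once for the same source address, id and sequence number.
find_duplicate_replies() {
    awk '
        /ICMP echo reply/ {
            gsub(",", "")                  # drop the commas after the id and seq values
            key = $3 " id " $10 " seq " $12
            count[key]++
        }
        END {
            for (k in count)
                if (count[k] > 1)
                    print "duplicate reply from " k " (" count[k] " copies)"
        }
    '
}
```

One way to use it is to capture a bounded number of packets so tcpdump exits and the summary is printed, for example: tcpdump -n -i cxl1 -c 200 icmp | find_duplicate_replies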
Is this a virtual machine? Usually it's fine to only monitor the primary line, as there is no action when both are down.