WAN interface flapping with 22.1.2

Started by foxmanb, March 03, 2022, 01:45:18 PM

Been messing with this on and off over the last week. I stabilized my dual-WAN HA setup by removing the MAC spoofing and hostname from the WAN interfaces on the primary firewall. Once I did that, the flapping stopped completely on the primary.

Whenever I failed over to my backup, though, the backup's WAN interfaces would flap and my WANs would take turns going up and down.

Tried a bunch of different things along the lines of what you all have tried, then I thought about it and added a simple script to the following directory:

/usr/local/etc/rc.syshook.d/start/08-setwanmac

08-setwanmac contains this:

#!/bin/sh

# Change WAN MAC addresses
ifconfig igb4 ether yy:yy:yy:yy:yy:yx
ifconfig igb5 ether xx:xx:xx:xx:xx:xy

The 08-setwanmac script is silly: it just uses ifconfig to change each WAN MAC to the desired one (a clone of my primary firewall's NIC MAC addresses).

Super static and simple, but it's survived quite a few reboots and forced swaps with minimal packet loss and zero flaps. I just inserted the MAC change ahead of the newwanip script, so that the MAC is already changed by the time newwanip runs. Working out so far.
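
For anyone copying this, here is a slightly more defensive sketch of the same idea. It is not the script from the post: the `set_mac` helper is a hypothetical name, and the interface names and MAC addresses are the same placeholders as above. It only adds a check that the interface exists before touching it, so a renamed or missing NIC gets skipped with a message instead of an ifconfig error.

```shell
#!/bin/sh
# Sketch only: a guarded variant of 08-setwanmac.
# set_mac is a hypothetical helper; igb4/igb5 and the MACs are placeholders.

# set_mac IFACE MAC: apply MAC to IFACE only if the interface exists.
set_mac() {
    iface="$1"
    mac="$2"
    if ifconfig "$iface" >/dev/null 2>&1; then
        ifconfig "$iface" ether "$mac"
    else
        echo "08-setwanmac: $iface not found, skipping" >&2
    fi
}

# Change WAN MAC addresses
set_mac igb4 yy:yy:yy:yy:yy:yx
set_mac igb5 xx:xx:xx:xx:xx:xy
```

Behaviour on a box where both interfaces exist should be the same as the two plain ifconfig lines.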


For me the only fix is either disable gateway monitor (not really an option) or to change the gateway monitor from 8.8.8.8 to 8.8.4.4 then back again each time it goes "down"..

Well in any case you seem to have overlapping DNS servers for the different interfaces, either set manually, by ISP or gateway monitor. In some cases ISPs push Google servers which is pretty mean since it also pins a route for it through their interface.


Cheers,
Franco

@franco so the DNS servers under settings-->general cannot overlap with gateway monitor IPs? I use Pi-holes for DNS and push the Pi-hole IPs out via DHCP to all clients.

Every one of those creates a host route if you select a gateway for it. If these host routes conflict with the ones used for gateway monitoring (most of the time because at least one host route overlaps multiple interfaces, or the whole config is reversed there), you get gateway flapping when the wrong interface comes back, because the monitor then uses the wrong gateway to monitor another one.


Cheers,
Franco
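
A quick way to sanity-check the overlap described above is to compare the two address lists by hand. The sketch below is not an OPNsense tool: `find_overlap` is a hypothetical helper, and the two lists are example values that you would replace with the DNS servers from settings-->general and your own gateway monitor IPs.

```shell
#!/bin/sh
# Sketch only: find_overlap is a hypothetical helper, not part of OPNsense.

# find_overlap "LIST_A" "LIST_B": print every address that appears in both
# space-separated lists, one per line.
find_overlap() {
    for a in $1; do
        for b in $2; do
            [ "$a" = "$b" ] && echo "$a"
        done
    done
    return 0
}

# Example values (assumptions): fill in your own addresses.
dns_servers="8.8.8.8 1.1.1.1"
monitor_ips="8.8.8.8 75.75.75.75"

find_overlap "$dns_servers" "$monitor_ips"
```

Any address it prints is used both as a DNS server and as a monitor IP and is worth a second look. On the firewall itself, `netstat -rn` shows the host routes that are actually installed.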

@franco I think this fixed the issue! Thanks! When you try to add two gateway monitors that overlap, the GUI alerts you and will not save it. However, it will let you add a gateway monitor with the same IP as a DNS server specified in settings-->general without a warning. FWIW, my DNS servers in settings-->general were assigned to NONE.

@franco after about a week, the exact same problem has returned. I have ensured there are no overlapping DNS entries. If I switch the monitor from 8.8.8.8 to 75.75.75.75, the interface immediately returns to up.

2022-03-31T16:57:42-04:00 Warning dpinger send_interval 2000ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 55% dest_addr 75.75.75.75 bind_addr 173.9.169.97 identifier "WAN_Comcast_GWv4 "
2022-03-31T16:57:42-04:00 Warning dpinger exiting on signal 15
2022-03-31T16:57:42-04:00 Warning dpinger exiting on signal 15
2022-03-31T16:57:26-04:00 Notice dpinger GATEWAY ALARM: WAN_Comcast_GWv4 (Addr: 8.8.8.8 Alarm: 0 RTT: 20033us RTTd: 4707us Loss: 40%)
2022-03-31T16:57:26-04:00 Warning dpinger WAN_Comcast_GWv4 8.8.8.8: Clear latency 20033us stddev 4707us loss 40%
2022-03-31T16:54:13-04:00 Notice dpinger GATEWAY ALARM: WAN_Comcast_GWv4 (Addr: 8.8.8.8 Alarm: 1 RTT: 20596us RTTd: 1985us Loss: 58%)
2022-03-31T16:54:13-04:00 Warning dpinger WAN_Comcast_GWv4 8.8.8.8: Alarm latency 20596us stddev 1985us loss 58%
2022-03-31T16:53:40-04:00 Notice dpinger GATEWAY ALARM: WAN_Comcast_GWv4 (Addr: 8.8.8.8 Alarm: 0 RTT: 20997us RTTd: 1835us Loss: 41%)
2022-03-31T16:53:40-04:00 Warning dpinger WAN_Comcast_GWv4 8.8.8.8: Clear latency 20997us stddev 1835us loss 41%
2022-03-31T16:51:34-04:00 Notice dpinger GATEWAY ALARM: WAN_Comcast_GWv4 (Addr: 8.8.8.8 Alarm: 1 RTT: 29015us RTTd: 10285us Loss: 58%)
2022-03-31T16:51:34-04:00 Warning dpinger WAN_Comcast_GWv4 8.8.8.8: Alarm latency 29015us stddev 10285us loss 58%
2022-03-31T16:48:17-04:00 Notice dpinger GATEWAY ALARM: WAN_Comcast_GWv4 (Addr: 8.8.8.8 Alarm: 0 RTT: 40731us RTTd: 9156us Loss: 54%)
2022-03-31T16:48:17-04:00 Warning dpinger WAN_Comcast_GWv4 8.8.8.8: Clear latency 40731us stddev 9156us loss 54%
2022-03-31T16:47:35-04:00 Notice dpinger GATEWAY ALARM: WAN_Comcast_GWv4 (Addr: 8.8.8.8 Alarm: 1 RTT: 0us RTTd: 0us Loss: 100%)
2022-03-31T16:47:35-04:00 Warning dpinger WAN_Comcast_GWv4 8.8.8.8: Alarm latency 0us stddev 0us loss 100%
2022-03-31T16:47:33-04:00 Warning dpinger send_interval 15000ms loss_interval 4000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 60000ms latency_alarm 2500ms loss_alarm 80% dest_addr 1.1.1.1 bind_addr 192.168.42.135 identifier "WAN_HNETIPV4 "
2022-03-31T16:47:33-04:00 Warning dpinger send_interval 2000ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 55% dest_addr 8.8.8.8 bind_addr 173.9.169.97 identifier "WAN_Comcast_GWv4 "
2022-03-31T16:47:33-04:00 Warning dpinger exiting on signal 15
2022-03-31T16:47:33-04:00 Warning dpinger exiting on signal 15
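
To put a number on the flapping in a log like this, the alarm and clear events can be counted per gateway. The following is a rough sketch, not an OPNsense utility: `count_flaps` is a hypothetical helper that reads dpinger lines on stdin and relies only on the "GATEWAY ALARM:" line format shown above. The two sample lines are taken from the log in this post.

```shell
#!/bin/sh
# Sketch only: count_flaps is a hypothetical helper, not part of OPNsense.

# count_flaps: read dpinger log lines on stdin and print, per gateway,
# how many "Alarm: 1" (went down) and "Alarm: 0" (cleared) events occurred.
count_flaps() {
    awk '
        /GATEWAY ALARM:/ {
            # The gateway name follows "ALARM:", the state follows "Alarm:".
            for (i = 1; i < NF; i++) {
                if ($i == "ALARM:") gw = $(i + 1)
                if ($i == "Alarm:") state = $(i + 1)
            }
            if (state == "1") alarms[gw]++; else clears[gw]++
            seen[gw] = 1
        }
        END {
            for (g in seen)
                printf "%s alarms=%d clears=%d\n", g, alarms[g], clears[g]
        }
    '
}

# Two lines from the log above as sample input.
count_flaps <<'EOF'
2022-03-31T16:57:26-04:00 Notice dpinger GATEWAY ALARM: WAN_Comcast_GWv4 (Addr: 8.8.8.8 Alarm: 0 RTT: 20033us RTTd: 4707us Loss: 40%)
2022-03-31T16:54:13-04:00 Notice dpinger GATEWAY ALARM: WAN_Comcast_GWv4 (Addr: 8.8.8.8 Alarm: 1 RTT: 20596us RTTd: 1985us Loss: 58%)
EOF
```

Note that in the log above the alarms fire at 58% loss and clear at 54% or below, which matches the loss_alarm 55% setting on that gateway.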

Is it possible that Monit (set up to monitor an IPsec VPN with "failed ping4 count 10 address XXX") is causing the flapping issue? Disabling my monitors for multiple IPsec VPNs seems to restore stability to the gateway.

I can confirm that I do not have Monit running and still experienced this issue.

Disabling Suricata last month (less than ideal) reduced the frequency of WAN drops, but they still happen on occasion. Is anyone else still seeing this behavior?

I have also experienced WAN flapping with v22.1.4. (All Intel NICs, if that's relevant)
Disabling MAC Spoofing and/or IPS did not resolve the issue, neither did a rollback to 22.1.1.
Finally I had to re-install 21.7 to reach stability again.

Hello,

It seems I'm experiencing similar issues with loss of WAN. I can't find any relevant logs so far, everything looks OK according to the web interface, and I need to ifconfig down && up to restore connectivity.
I have an Intel NIC (I210), but I use neither MAC spoofing, Suricata, nor monitoring.
I will try downgrading to 21.7 (I'm on 22.1.5, and had the same issues with 22.1.4 and below).

Any ideas which logs I can check to investigate?

Regards

No luck with 22.1.5. Hope for a nice kernel update, I guess...?

Hi OPNsense folks. Any progress on this one? Will it be addressed in a future release?

Thank you!

Quote from: foxmanb on April 12, 2022, 03:23:55 PM
Hi OPNsense folks. Any progress on this one? Will it be addressed in a future release?

Thank you!

+1  :)