WAN gateway not going back up after Internet outage

doktornotor · August 27, 2024, 06:47:14 PM

Quote
But then I had a thought and tested it by unplugging the coax cable on the primary WAN instead of the NIC. The failover to backup worked fine, but when the coax was reconnected there was presumably no reconnect event and the primary was not detected as up again.

I am not exactly hopeful here. apinger, dpinger, nonpinger... all suck. This stuck condition is something I've observed for ages on one particular site using "the other project's sense". That site is so inconveniently located that it was "solved" by using an IP watchdog socket. It will eventually power-cycle the ISP gear, which triggers a link down / up, and that is enough to bring the dpinger zombie back to life.
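For readers unfamiliar with the watchdog approach mentioned above, here is a minimal sketch of the decision logic such a device might apply. This is an illustration only, not the firmware of any real product: the `Watchdog` class, its `threshold` parameter, and the `record` method are hypothetical names, and the probe itself (for example a ping to a remote IP) is assumed to happen elsewhere.

```python
from collections import deque


class Watchdog:
    """Hypothetical watchdog: signal a power-cycle after N consecutive failed probes."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        # Keep only the most recent `threshold` probe results.
        self.recent = deque(maxlen=threshold)

    def record(self, probe_ok: bool) -> bool:
        """Store one probe result; return True if a power-cycle should be triggered."""
        self.recent.append(probe_ok)
        # Trigger only when the window is full and every probe in it failed.
        return len(self.recent) == self.threshold and not any(self.recent)
```

A single successful probe inside the window prevents the trigger, so a short glitch does not power-cycle the ISP gear.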

franco · August 27, 2024, 07:03:30 PM

What if we offer a lightweight way to trigger the "fix stuck down monitors" code introduced here via cron job? Just the monitor part:

https://github.com/opnsense/core/commit/0c9d8c94049

doktornotor · August 27, 2024, 07:17:28 PM

Ok, so that's sort of going back to "unconditionally restarted all the time"? :D

franco · August 27, 2024, 09:06:41 PM

Fair point, although it's not randomly restarting due to monitor events. It's scheduled restarting via user wishes. And we don't have to guess what schedule the user prefers.

Over the years users seem to have grown fond of cron-based workarounds.

Cheers,
Franco

doktornotor · August 27, 2024, 09:10:26 PM

Well I don't have anything against that (will just cause some additional log noise). Better than having it in perpetually stuck state.

franco · August 28, 2024, 11:54:00 AM

I made an error with the previous patch. Here is the revised version with cron job and all:

https://github.com/opnsense/core/issues/7027#issuecomment-2314857927

Only use the cron job if the issue persists with this patch applied over the next days.

Cheers,
Franco

kevindd992002 · October 29, 2024, 02:54:38 AM

I had this happen again today. The Internet connection went down overnight because of maintenance, and I woke up to no Internet. I had to go to the gateways page, edit the gateway and save.

Is there any update to this? Is this happening to pfsense too?

franco · October 29, 2024, 08:35:38 AM

The patch in question was added to 24.7.x already. You can add the "Manual gateway switch" cron job to adjust the situation.

I don't know about the other *sense. You trade a bug for another either way I think, but whatever works works.

Cheers,
Franco

kevindd992002 · October 29, 2024, 02:01:18 PM

Quote from: franco on October 29, 2024, 08:35:38 AM
The patch in question was added to 24.7.x already. You can add the "Manual gateway switch" cron job to adjust the situation.

I don't know about the other *sense. You trade a bug for another either way I think, but whatever works works.

Cheers,
Franco

I was running 24.7.4_1 when this happened yesterday. Was the "Manual gateway switch" cron job created mainly as a workaround for this issue?

franco · November 08, 2024, 08:04:31 AM

Someone may have taken the time to find out what the actual issue is:

https://github.com/opnsense/core/issues/7635#issuecomment-2462066123

leading to

https://github.com/opnsense/core/issues/7027#issuecomment-2462108325

Cheers,
Franco

FredFresh · April 18, 2025, 07:30:50 AM

Hi,

Am I understanding correctly that the roadmap in https://github.com/opnsense/core/issues/7027#issuecomment-2462108325 should lead to a solution in OPNsense 25.7? Thank you.

drewhemm · May 19, 2025, 09:45:52 PM

Eagerly awaiting 25.7!

Is there a workaround in the meantime, aside from patching or manually restarting the gateway monitor? Is it better for now to disable gateway monitoring?

I have three WAN connections, two fibre and one Starlink. The latter is a last-resort backup to prevent complete outages. A couple of times the monitor has decided that some of these WAN connections are down, and they stay marked as down even after they have recovered.

drewhemm · May 19, 2025, 09:53:48 PM

I enabled the 'manual gateway switch' cronjob and set it for every five minutes. It has successfully restored a WAN connection that was incorrectly marked as offline.
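The condition this periodic job recovers from can be sketched as follows. This is a rough illustration under stated assumptions, not OPNsense code: `find_stuck_gateways`, the status strings, and the gateway names are all hypothetical, and the independent reachability check is assumed to be done separately.

```python
def find_stuck_gateways(monitor_status: dict[str, str],
                        probe_ok: dict[str, bool]) -> list[str]:
    """Return gateways the monitor reports as down although an independent probe succeeds.

    monitor_status: hypothetical mapping of gateway name to "up" or "down".
    probe_ok: hypothetical mapping of gateway name to an independent reachability result.
    """
    return sorted(
        name
        for name, status in monitor_status.items()
        # A gateway counts as stuck only if it is marked down but actually reachable.
        if status == "down" and probe_ok.get(name, False)
    )
```

A gateway that is marked down and also fails the independent probe is genuinely offline and is left alone; only the mismatch indicates a stuck monitor.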

franco · May 20, 2025, 08:03:26 AM

Try the new failover/failback options as described in the documentation: https://github.com/opnsense/docs/commit/1b5e6684c8

Both are available now as of recent 25.1.x.

Cheers,
Franco
