WAN gateway not going back up after Internet outage

Started by kevindd992002, August 20, 2024, 10:25:45 AM

Quote
But then I had a thought and tested it by unplugging the coax cable on the primary WAN instead of the NIC. The failover to backup worked fine, but when the coax was reconnected there was presumably no reconnect event and the primary was not detected as up again.

I am not exactly hopeful here. apinger, dpinger, nonpinger... all suck. This stuck condition is something I've observed for ages on one particular site using "the other project's sense". That site is so inconveniently located that it was "solved" by using an IP watchdog socket. It will eventually power-cycle the ISP gear, which triggers a link down / up, and that is enough to bring the dpinger zombie back to life.

What if we offer a lightweight way to trigger the "fix stuck down monitors" code introduced here via cron job? Just the monitor part:

https://github.com/opnsense/core/commit/0c9d8c94049
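
To make the idea concrete, here is a minimal sketch of what such a scheduled check could look like. This is not the code from the commit above and not OPNsense's API: `check_once`, `probe_reachable` and the `restart` callback are hypothetical names for illustration only. It also assumes the probe actually leaves via the primary WAN (for example through a static route to the monitor IP) and that the system `ping` accepts `-c <count>`.

```python
import subprocess
from typing import Callable


def probe_reachable(monitor_ip: str, count: int = 3, timeout_s: int = 5) -> bool:
    """Return True if ping reports at least one reply from monitor_ip."""
    # Assumption: a `ping` binary that accepts `-c <count>` is on PATH.
    # The overall timeout is enforced by subprocess, not by a ping flag.
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), monitor_ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def check_once(
    reported_down: bool,
    monitor_ip: str,
    probe: Callable[[str], bool],
    restart: Callable[[], None],
) -> bool:
    """Restart the monitor only if it looks stuck.

    "Stuck" here means: the gateway is reported down, yet an independent
    probe of the monitor IP succeeds. Returns True if a restart was triggered.
    """
    if reported_down and probe(monitor_ip):
        restart()  # hypothetical hook; what it runs is up to the caller
        return True
    return False
```

Run from cron, something like this would only touch the monitor when the reported state and an independent probe disagree, rather than restarting it on every tick.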

Ok, so that's sort of going back to "unconditionally restarted all the time"? :D

Fair point, although it's not randomly restarting due to monitor events. It's scheduled restarting via user wishes. And we don't have to guess what schedule the user prefers.

Over the years users seem to have grown fond of cron-based workarounds.


Cheers,
Franco

Well, I don't have anything against that (it will just cause some additional log noise). Better than having it in a perpetually stuck state.

I made an error with the previous patch. Here is the revised version with cron job and all:

https://github.com/opnsense/core/issues/7027#issuecomment-2314857927

Only use the cron job if the issue persists with this patch applied over the next few days.


Cheers,
Franco

I had this happen again today. The Internet connection went down overnight because of maintenance, and I woke up to no Internet. I had to go to the gateways page, edit the gateway, and save.

Is there any update on this? Is this happening in pfSense too?

The patch in question was added to 24.7.x already. You can add the "Manual gateway switch" cron job to adjust the situation.

I don't know about the other *sense. You trade a bug for another either way I think, but whatever works works.


Cheers,
Franco

Quote from: franco on October 29, 2024, 08:35:38 AM
The patch in question was added to 24.7.x already. You can add the "Manual gateway switch" cron job to adjust the situation.

I don't know about the other *sense. You trade a bug for another either way I think, but whatever works works.


Cheers,
Franco

I was running 24.7.4_1 when this happened yesterday. Was the "Manual gateway switch" cron job created mainly as a workaround for this issue?


Hi,

Am I understanding correctly that the roadmap in https://github.com/opnsense/core/issues/7027#issuecomment-2462108325 should lead to a solution in OPNsense 25.7? Thank you.