OPNsense Forum

Archive => 19.7 Legacy Series => Topic started by: anomaly0617 on December 17, 2019, 04:50:40 pm

Title: OpnSense stops routing all traffic when WAN drops
Post by: anomaly0617 on December 17, 2019, 04:50:40 pm
Hi all,

I've got a weird one, but I've now seen it at different locations and it's concerning.

Please note that this one is very similar to this post (https://forum.opnsense.org/index.php?topic=13236.msg61117#msg61117), also by me, from a while back.

First, some specs because, well, everyone loves specs:

System Information
Code:
Name                frasvrfw.{redacted}
Versions            OPNsense 19.7.6-amd64
                    FreeBSD 11.2-RELEASE-p14-HBSD
                    OpenSSL 1.0.2t 10 Sep 2019
CPU Type            Intel(R) Xeon(R) CPU X3450 @ 2.67GHz (8 cores)
Load average        0.53, 0.38, 0.28
Uptime              1 days 03:06:07
Current date/time   Tue Dec 17 10:33:25 EST 2019
Last config change  Mon Dec 16 3:31:41 EST 2019
State table size    1 % ( 8526/814000 )
MBUF Usage          1 % ( 7100/506546 )
Memory usage        13 % ( 1099/8144 MB )
SWAP usage          0 % ( 0/8192 MB )
Disk usage          2% / [ufs] (1.4G/101G)

Interfaces
Code:
   GUEST 1000baseT <full-duplex> 192.168.25.254
   LAN 1000baseT <full-duplex> 192.168.3.1
   PLC 1000baseT <full-duplex> 192.168.20.1
   Phones 1000baseT <full-duplex> 192.168.9.1
   Printers 1000baseT <full-duplex> 192.168.6.1
   SAN 1000baseT <full-duplex> 192.168.10.1
   SANBACKUPS 1000baseT <full-duplex> 192.168.16.254
   SECUREWIFI 1000baseT <full-duplex> 192.168.5.1
   SECURITY 1000baseT <full-duplex> 192.168.50.1
   WAN 1000baseT <full-duplex> {redacted}

OK, with that out of the way: I've got a Dell PowerEdge R210 running as an OPNsense firewall/gateway at the center of multiple networks, each separated into its own VLAN and subnet.

We've now had this happen twice: on 16-Dec-2019 between 3:20 AM and 7:20 AM, and about a month earlier during a 35-minute power outage. So I no longer think this is a "fluke."

The trigger:
The WAN connection drops at the provider end (in other words, between the internet provider's fiber router and their closest node)

The symptom:
OPNsense stops routing all traffic -- even internal traffic between internal subnets, such as from the LAN to the Printers network. This persists even after the WAN connection is restored and the internet is available again.

The workaround resolution:
Reboot the firewall and all the problems go away... until the next time the internet connection drops.

In both instances the power in the area went out, but our on-site battery backups and generator kept the building up and running throughout, so my OPNsense logs do not show the firewall rebooting. But if I go to Reporting -> Health and look at any metric, e.g.:

Packets -> LAN
Packets -> WAN
Packets -> IPSec
Quality -> Gateway

They are all flat-lined between those times.

So the question is: what would cause OPNsense to stop routing traffic between internal networks when the WAN connection drops, and is there a way to fix it short of setting up a check_internet script that reboots the firewall if it can't reach something really common like Google?

Thanks in advance, all!
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: chemlud on December 17, 2019, 05:23:24 pm
...I haven't seen anything like this here when the WAN goes down. Are you sure that all your switches and clients are covered by the UPS?

Can you reproduce the issue by simply pulling the plug on your fibre modem?
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: anomaly0617 on December 17, 2019, 05:36:17 pm
We haven't tried that yet. Running a multi-site business, management gets a little... testy... when we arbitrarily decide to take the network down for no perceivable reason. This may be something we test on a weekend in the early morning hours.
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: agrumpyhermit on January 07, 2020, 07:33:07 pm
Did you ever figure this out? We get the same thing with our not-so-reliable internet. When the WAN goes down, we lose internal traffic shortly after: we can't print, can't access our internal Nextcloud, etc. We're supposedly going to get fiber out here soon, which should be far more reliable than our WISP. But until then, we lose internet at least a few times a week.
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: anomaly0617 on January 08, 2020, 01:13:06 am
Not so far. I'm using my check_internet.sh script to get around the problem at the moment, but one thing I've noticed is that the cron job does not persist from upgrade to upgrade, so I have to manually put it back in.

It's unfortunate that you are seeing the same problem, but also reassuring that I'm not the only one. :-/
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: agrumpyhermit on January 08, 2020, 01:44:27 am
We don't have any power outages or reboots happening. Our WISP has issues some 25 or so miles away on a mountain top, and the internet goes down for anywhere between a few minutes and a few hours. Once the internet is down, internal routing soon follows. I'm using Unbound and wondered if switching to Dnsmasq would resolve it, but haven't felt like trying it. I wish I were knowledgeable enough to find info in the logs that might help.
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: c-mu on January 08, 2020, 11:18:25 am
I had a problem similar to yours, but a little different.

For example: at a remote site, whenever I had to reboot an attached switch, my VPN tunnel went down and stayed down until the uplink port to the internal switch was up again or OPNsense was rebooted. The internet connection of OPNsense itself was not affected.

I found that when I tick the checkbox under System - Settings - General - "Allow default gateway switching", the problem doesn't come back. I don't understand why, but it has been working since then.

Maybe that could be a solution for you too.
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: agrumpyhermit on January 21, 2020, 02:15:52 am
anomaly0617, did you figure out your internal routing issue? I got my internal routing stable regardless of the WAN by leaving all interfaces selected in Unbound's outgoing interface setting. I re-tested it on a fresh install: limiting the outgoing interfaces killed internal routing without WAN. I selected all of them again, unplugged the WAN, and never lost internal routing.
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: anomaly0617 on February 16, 2020, 07:54:15 pm
Just checked and no, that doesn't resolve it for me. From what I can tell, at least on my installs, the fix is to reboot the firewall if it cannot ping a server on each of the networks and a publicly hosted website like google.com. The problem is that every time you upgrade the firewall, the crontab entry that kicks this script off gets removed, so I have to re-add it manually.
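
For anyone wanting to try the same workaround, here is a minimal sketch of what such a watchdog might look like. It is not the actual check_internet.sh from this thread, which was never posted. The names TARGETS, PROBE and ACTION and the two internal addresses are placeholders invented for illustration, and it assumes one reading of the rule above: reboot only when none of the targets answers. It runs dry (it only prints the reboot command) until ACTION is overridden.

```shell
#!/bin/sh
# check_internet.sh: hypothetical watchdog sketch, not the script from this thread.

# Placeholder targets: one host per internal network plus one public name.
TARGETS="${TARGETS:-192.168.3.10 192.168.6.10 google.com}"
# Probe command; the target is appended as its last argument.
PROBE="${PROBE:-ping -c 1}"
# Dry run by default. Set ACTION="shutdown -r now" once the logic is trusted.
ACTION="${ACTION:-echo would run: shutdown -r now}"

# Succeeds when a single probe of "$1" gets an answer.
reachable() {
    $PROBE "$1" >/dev/null 2>&1
}

# Succeeds only when every target in $TARGETS fails its probe.
all_unreachable() {
    for target in $TARGETS; do
        if reachable "$target"; then
            return 1
        fi
    done
    return 0
}

# Runs the configured action when nothing at all is reachable.
main() {
    if all_unreachable; then
        $ACTION
    fi
}

# Run the check only when invoked as check_internet.sh, so the functions
# can be sourced and exercised without side effects.
case "${0##*/}" in
    check_internet.sh) main ;;
esac
```

Requiring every target to fail means a plain WAN outage with healthy internal networks does not trigger a reboot, which should avoid reboot loops while the provider is down. Whether pings sent from the firewall itself actually fail during the hang is something to confirm on an affected box first.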

Thanks for the idea, though! It made me go through my Unbound configuration with a fine-tooth comb!
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: chemlud on February 16, 2020, 08:11:29 pm
Why should a cron job get deleted by updating OPNsense? I have custom cron jobs defined as templates via the console and scheduled via the GUI, and they survive any update. Only after a fresh install do you have to redo the console part, as it is not stored in config.xml.
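
The "console part" can be sketched roughly as follows, assuming the usual configd layout: an action file under /usr/local/opnsense/service/conf/actions.d/ whose description field makes it selectable in the GUI cron settings. The action name checkinternet, the script location /root/check_internet.sh and the installer name install_action.sh are hypothetical; check the exact fields against the OPNsense documentation for your release.

```shell
#!/bin/sh
# install_action.sh: hypothetical sketch that registers a configd action.

# Directory configd reads action definitions from (overridable for testing).
ACTIONS_DIR="${ACTIONS_DIR:-/usr/local/opnsense/service/conf/actions.d}"
# Placeholder location of the script the action should run.
SCRIPT_PATH="${SCRIPT_PATH:-/root/check_internet.sh}"

# Writes actions_checkinternet.conf into $ACTIONS_DIR.
write_action() {
    cat > "$ACTIONS_DIR/actions_checkinternet.conf" <<EOF
[run]
command:$SCRIPT_PATH
parameters:
type:script
message:running check_internet watchdog
description:Reboot if no network is reachable
EOF
}

# Install and reload configd only when invoked as install_action.sh.
case "${0##*/}" in
    install_action.sh)
        write_action
        service configd restart
        ;;
esac
```

After configd restarts, `configctl checkinternet run` should trigger the script by hand, and the description text should show up as a command under System - Settings - Cron. A job scheduled there is saved with the configuration rather than in a hand-edited crontab.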
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: marjohn56 on February 16, 2020, 09:59:44 pm
Are you adding that cron job the official way, using actions.d?
Title: Re: OpnSense stops routing all traffic when WAN drops
Post by: anomaly0617 on February 25, 2020, 01:31:27 am
"Are you adding that cron job the official way, using actions.d?"

My guess is, probably not. I'm adding it via the console using crontab. There appears to be no GUI-based way to add a cron job that isn't from the pre-populated list. With that said, if there's a better way (actions.d) let me know how to do it and I'll change up my instructions to match once I try it. It would be really nice if I could get these to persist across upgrades!

Thanks!