Firewall reboots and will not route packets - problem and possible solution

Started by anomaly0617, June 17, 2019, 05:46:24 PM

Hi all,

Those of you who have been around the *Sense world for a while probably know me. I've been a fan for a number of years, and have a lot of posts on the pf as well as the opnSense forums. Heck, I may even have a few on m0n0wall, since I go back that far on distros of this firewall.

If you're just now reading this, this is an awesome firewall solution and I've found nothing that compares to it in the commercial world. There's a balance of functionality and UI that I've never seen at this level in a commercial product. So what's below is more for the folks who have been running some version of *Sense for a little while and is not representative of the product as a whole.

Ok, disclaimer over. Now to the real topic:

I have somewhere in the neighborhood of 30-50 *Sense installations in production. These span everything from a terrible DSL connection in the middle of nowhere's-ville to a 200 x 200 Mbps redundant fiber connection in a controlled data center environment. I'm seeing this problem across the board, on all installations. That means everything pf 2.3.2 and newer, and I'm guessing everything opnSense 18.1.x and newer. In other words, I'm thinking this is a FreeBSD problem and not a *Sense problem.

Symptom:

The firewall is set up to reboot on a schedule. After the scheduled reboot, the firewall does not respond to anything remotely. ICMP traffic does not respond, packets are not forwarded, rules are not processed. It's as if the firewall is stuck in the default state of "deny all" for all packets.

Resolution A (a bad one):

Go to the console and log in if necessary. Reboot the firewall using the appropriate menu option. Let the firewall reboot. Check again and see if you can access the outside world, and the outside world can access you.

Resolution B (possibly better?)

Run a script that checks to see if the firewall can reach the gateway device (DSL modem, DOCSIS router, fiber router, etc.) and/or a common website (Google DNS, for instance). If it cannot, issue a reboot command. Possible side-effect: This could reboot the firewall in the middle of the day with no warning to your end users.
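
For the "can the firewall reach the gateway device" half of that idea, here is a rough sketch of how it might look. It assumes a single default route and the FreeBSD layout of `route -n get default` output; the function names parse_gateway and gateway_reachable are made up for illustration, and a multi-WAN box would need something smarter.

```shell
#!/bin/sh
# Sketch only: find the default gateway and ping it.
# parse_gateway and gateway_reachable are hypothetical helper names.

# Reads `route -n get default` output on stdin and prints the address
# from the "gateway:" line, if there is one.
parse_gateway() {
    awk '/gateway:/ { print $2; exit }'
}

# Succeeds if a default gateway exists and answers at least one of 3 pings.
gateway_reachable() {
    gw=$(route -n get default 2>/dev/null | parse_gateway)
    [ -n "$gw" ] && ping -c 3 "$gw" > /dev/null 2>&1
}
```

You would call it as something like: gateway_reachable || echo "gateway is not answering". Nothing here reboots anything; it only reports.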

Anyone else run into this problem? The most common hardware we use as firewalls is the Dell PowerEdge R2{x}0, so it's possible it is specific to those, but I believe I've also seen this behavior on the micro-firewalls we have running made by a reputable Amazon reseller ending in "LI". ;-)

DISCLAIMER: If you can't tell, this post has been formatted to avoid getting a nasty letter from that one company that bought one of the *Sense distributions and has been known for sending nasty letters to people trying to help.

EDIT: I found the script I've been testing/using to fix this issue. Here it is. Your mileage may vary...

(Mine is located at /root/check_internet.sh - this may not be the best place for the code to persist through updates.)

To implement this, you need to open an SSH session to your firewall, log in, and select "8. Shell" from the menu.

Once there, type

vi /root/check_internet.sh

The editor you are now in is called vi. Vi is very powerful, but not very user friendly. Here's what you need to do in order to get this going...

Press i (just the letter, no colon and no Enter). This puts vi in Insert mode.

Paste in the following code. I use PuTTY, and in PuTTY to paste you just right-click after you've copied the code.


#!/bin/sh

TMP_FILE=/tmp/inet_up

# Edit this function if you want to do something besides reboot
no_inet_action() {
    shutdown -r +1 'No internet.'
}

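# ping exits 0 if at least one reply came back
if ping -c5 google.com; then
    echo 1 > "$TMP_FILE"
elif [ "$(cat "$TMP_FILE" 2>/dev/null)" = "0" ]; then
    # Second failure in a row (plain [ ] because /bin/sh has no [[ ]])
    no_inet_action
else
    # First failure: remember it and wait for the next run
    echo 0 > "$TMP_FILE"
fi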


Now type [Esc] :wq [Enter]. This says "Exit Insert Mode, Command: Write, Quit" and Enter executes it.

After you create it, you need to run
chmod a+x /root/check_internet.sh
which makes it executable.

Normally I'd tell you to go to the Web UI and set up a new Cron Job, but OpnSense does not seem to have the ability to accept a custom command, so instead we're going to do this:

crontab -e

Again, you're in VI.

Go to the bottom of the file (press G), then press o, which opens a new line below the cursor and puts the editor in "Insert" mode. Add this line:

*/5     *       *       *       *       /root/check_internet.sh > /dev/null

Now type [Esc] :wq [Enter]. Again, this says "Exit Insert Mode, Command: Write, Quit" and Enter executes it.

What you told cron (the scheduler) to do is "run this command every 5 minutes". If the ping succeeds, nothing will happen. If the ping fails on two runs in a row, the script schedules a reboot of the firewall one minute later.
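
If you want to convince yourself of the "two strikes" behavior without actually rebooting anything, here is a small dry-run sketch of the state logic the script is meant to follow. The function name next_action and the STATE_FILE variable are mine, not part of the script; it only prints what would happen.

```shell
#!/bin/sh
# Dry-run sketch of the two-strike logic. Nothing here reboots.
# STATE_FILE and next_action are illustrative names only.
STATE_FILE=${STATE_FILE:-/tmp/inet_up_dryrun}

# $1 is "up" or "down" (standing in for the ping result).
# Prints the action that would be taken and updates the state file.
next_action() {
    if [ "$1" = "up" ]; then
        echo 1 > "$STATE_FILE"
        echo "ok"
    elif [ "$(cat "$STATE_FILE" 2>/dev/null)" = "0" ]; then
        # The previous run also failed, so this is the second strike.
        echo "reboot"
    else
        # First failure: record it and wait for the next run.
        echo 0 > "$STATE_FILE"
        echo "first-failure"
    fi
}
```

Starting with no state file, calling next_action down twice prints first-failure and then reboot; an up in between resets the count. With the cron entry above, that works out to roughly 5 to 10 minutes of outage before a reboot is scheduled.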


FYI (because credit where credit is due) I found the base code and modified it from here.

The upside of this script is that if the interfaces on your firewall are not properly initialized or did not properly initialize after a reboot, this will reboot the firewall and give it another chance without you having to drive in to the office and intervene manually.

The downside of this script is that if Google appears to be down from the firewall (or the firewall can't resolve google.com), it's going to keep rebooting your firewall every few minutes. So, be aware of the pros and cons of this script.
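
One way to soften that downside is to check more than one target and only call it an outage when every one of them fails. This is just a sketch, not something I've run in production: the TARGETS list, the PROBE variable, and the function name any_target_up are all assumptions you should adjust. Using IP addresses (8.8.8.8 is Google's public DNS, 1.1.1.1 is Cloudflare's) also means a DNS hiccup alone won't count as "no internet".

```shell
#!/bin/sh
# Sketch: treat the internet as down only when every target fails.
# TARGETS, PROBE and any_target_up are hypothetical names for illustration.
TARGETS=${TARGETS:-"8.8.8.8 1.1.1.1 google.com"}
PROBE=${PROBE:-"ping -c 3"}

# Succeeds as soon as one target answers; fails if none of them do.
any_target_up() {
    for target in $TARGETS; do
        # $PROBE is left unquoted on purpose so its options are split.
        if $PROBE "$target" > /dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}
```

To use it, you would paste the function into the script and change the test from "if ping -c5 google.com; then" to "if any_target_up; then". The rest of the logic stays the same.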

Hope this helps. Please let me know if this resolved an issue for you!