General Discussion / watchdog script in bash
« on: June 27, 2023, 08:55:18 pm »
Hiho,
I often need(ed) a script to check for Internet connectivity. Some devices offer something like that built in, but most of them just ping Google or similar, which is not the best idea IMHO. So I made my own, changed it quite often to fit my needs, and use it to monitor connections or to react to failures, e.g. by restarting a DSL or cable modem or the usually bad ISP-provided plastic box called "WIFI Superduper Highend Router".
Had this problem again today, searched for something better, did not find much, so I'm just dropping it here. Some might find it useful, hf.
It will have errors, problems and bugs, but well, it's an oooold script that has grown over the years.
Code:
#!/bin/bash
# v0.6.2 Zeitkind
# Watchdog-script for testing Internet connectivity.
# Runs here on an internal Linux machine, but could be anything that knows bash.
# Script might need to be root, so use cron or sudo to start?
# If connection is down, we can trigger other things, like
# starting a failover line or trying to reset the dumb modem / router provided by your ISP
# Have you tried turning it off and on again?
# Example: a cable modem started in routing mode although it should have started in bridging mode,
# because the line was down and the profile from the ISP was not loaded. Funny things like that.
# ...
# Check for log file and create it if it is missing
if [ ! -f /var/log/connection-check.log ]; then
    touch /var/log/connection-check.log
fi
# Only 1 instance of this script should run
# Check if lock file exists, if not create it and set trap on exit
if { set -C; 2>/dev/null >/tmp/connection-check.lock; }; then
    trap "rm -f /tmp/connection-check.lock" EXIT
else
    echo "Lock file exists… exiting"
    exit
fi
# Counters count the failed connection checks against google.com and the second site, e.g. sfr.fr
counter1=0
counter2=0
# Function to reset counters
# If even one connection test is successful, we are online and can reset both.
# Not needed if we don't run the script as a daemon, we just exit the script then.
reset_counters() {
    counter1=0
    counter2=0
}
# Function to log a message and run a script
#
# If we need to restart our router, we log this and call a script that can power off
# and power on the router (or the cable modem or the cheap switch or everything).
# Use your own script, whatever is needed.
# I have a Tasmota power plug and a script called restart-router.sh
# The cheap plastic router from the ISP tends to lock up, so I need to power cycle this pos.
# Tasmota should be able to chain commands with Backlog; that didn't work for me though, so we just
# power off, wait some seconds and power on again.
# curl -X POST "http://<IP>/cm?cmnd=Power%20off"
# sleep 10s
# curl -X POST "http://<IP>/cm?cmnd=Power%20on"
# Log the restart event
# echo "$(date +"%Y-%m-%d %H:%M:%S") Tasmota: Restarted router." >> /var/log/connection-check.log
#
log_and_restart() {
    echo "$(date +"%Y-%m-%d %H:%M:%S") $1" >> /var/log/connection-check.log
    /usr/bin/restart-router.sh
    # Router or modem needs some time to boot and reconnect, so we wait.
    # DSL connections might take quite long to reestablish. Adjust for your needs.
    # Remark: Some DSL line ports get reset on the ISP side if our modem stays powered off for
    # several minutes. This might be worth testing if your line port is ancient or bad or
    # often crashes on the ISP side and they don't fix it.
    sleep 5m
}
# Main testing loops. We use wget to check for Google and our provider
# We use 2 tries and wait for the answer, line might be busy.
# This whole script can be called by other scripts or cron, so logging that we are online is optional
# Space in /var/log might be limited anyway, be sure to clean the logs or use logrotate etc.
# Logging failed tests is also optional, remove the #'s to enable the parts you want.
while true; do
    wget -q --tries=2 --timeout=10 --spider http://google.com
    # If we reach Google, we are online and can just exit the script.
    # Change this if you want to run this script as a permanent daemon etc. - not recommended though.
    if [[ $? -eq 0 ]]; then
        # echo "$(date +"%Y-%m-%d %H:%M:%S") Google answers, we are online." >> /var/log/connection-check.log
        exit
    else
        counter1=$((counter1+1))
        # echo "$(date +"%Y-%m-%d %H:%M:%S") Google unreachable $counter1" >> /var/log/connection-check.log;
        # For debugging write to console
        # echo $counter1;
    fi
    wget -q --tries=2 --timeout=10 --spider http://sfr.fr
    if [[ $? -eq 0 ]]; then
        # If we reach our provider, we are online and we just exit the script.
        # But the provider itself might be offline, so we could mod here and fire up a
        # different connection or switch to a backup line.
        # Some providers offer a special "ping-test-server" to check the connection, which might
        # also be an option. I recommend against using e.g. ping 8.8.8.8, Google sometimes just
        # ignores pings. Same with 1.1.1.1 or 4.4.4.4 etc.
        # echo "$(date +"%Y-%m-%d %H:%M:%S") SFR answers, we are online." >> /var/log/connection-check.log
        exit
    else
        counter2=$((counter2+1))
        # echo "$(date +"%Y-%m-%d %H:%M:%S") SFR unreachable $counter2" >> /var/log/connection-check.log;
        # For debugging write to console
        # echo $counter2;
    fi
    # If either counter is still zero, i.e. one test was successful, we are online
    # and can exit the script, it could be started by cron again etc.
    # As long as both tests above exit on success, this check is never true; it only matters
    # if the script gets modified to run all the time, and then we need to reset the counters to zero.
    # If we just exit, that doesn't matter.
    if [[ $counter1 -eq 0 || $counter2 -eq 0 ]]; then
        # reset_counters
        exit
    fi
    # If both connection tests fail 5 times, we seem to be offline.
    if [[ $counter1 -eq 5 && $counter2 -eq 5 ]]; then
        log_and_restart "We seem to be offline, restarting router!"
        if [[ $? -eq 0 ]]; then
            break
        fi
    fi
    # We should wait some time before we test again.
    # If Internet has just a short hiccup, 30s will be fine.
    sleep 30s
done
# After rebooting the router/modem, Internet should come back, so we test
# We can log this if we want
while true; do
    wget -q --tries=2 --timeout=10 --spider http://google.com
    if [[ $? -eq 0 ]]; then
        # echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log;
        exit
    fi
    wget -q --tries=2 --timeout=10 --spider http://sfr.fr
    if [[ $? -eq 0 ]]; then
        # echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log;
        exit
    fi
    # We are still offline, but restarting the router or modem twice in a short time isn't really the best idea.
    # So we wait half an hour, probably the ISP has problems.
    # We can log this
    # echo "$(date +"%Y-%m-%d %H:%M:%S") Router restarted, but still offline. Sleeping for 30 minutes" >> /var/log/connection-check.log;
    sleep 30m
    # Waited enough, leave while to re-check.
    break
done
while true; do
    wget -q --tries=2 --timeout=10 --spider http://google.com
    if [[ $? -eq 0 ]]; then
        # echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log;
        exit
    fi
    wget -q --tries=2 --timeout=10 --spider http://sfr.fr
    if [[ $? -eq 0 ]]; then
        # echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log;
        exit
    fi
    # Both tests failed, so we are still offline after the 30-minute wait and restart the router again.
    # If that doesn't help, we wait 6 hours before the next test, probably the ISP has problems.
    # This script was made for a remote place far away, so adjusting the values might be a good idea.
    log_and_restart "Router restarted again. Let's see.."
    # Now try again
    wget -q --tries=2 --timeout=10 --spider http://google.com
    if [[ $? -eq 0 ]]; then
        echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log
        exit
    fi
    wget -q --tries=2 --timeout=10 --spider http://sfr.fr
    if [[ $? -eq 0 ]]; then
        echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log
        exit
    fi
    # Still offline, so we have to wait again
    echo "$(date +"%Y-%m-%d %H:%M:%S") Router restarted, but still offline. Sleeping now for 6 hours" >> /var/log/connection-check.log
    sleep 6h
    # Waited enough, leave while to re-check.
    break
done
while true; do
    wget -q --tries=2 --timeout=10 --spider http://google.com
    if [[ $? -eq 0 ]]; then
        echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log
        exit
    fi
    wget -q --tries=2 --timeout=10 --spider http://sfr.fr
    if [[ $? -eq 0 ]]; then
        echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log
        exit
    fi
    # We are still offline (after 6h on the first pass, after 24h on every later pass),
    # so we restart the router again and pray..
    log_and_restart "Router restarted again. Let's see.."
    # Now try again
    wget -q --tries=2 --timeout=10 --spider http://google.com
    if [[ $? -eq 0 ]]; then
        echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log
        exit
    fi
    wget -q --tries=2 --timeout=10 --spider http://sfr.fr
    if [[ $? -eq 0 ]]; then
        echo "$(date +"%Y-%m-%d %H:%M:%S") Online again!" >> /var/log/connection-check.log
        exit
    fi
    # Still offline, so we have to wait again
    echo "$(date +"%Y-%m-%d %H:%M:%S") Router restarted, but still offline. Sleeping now for 24 hours" >> /var/log/connection-check.log
    sleep 24h
    # Now we don't exit, we just restart our router every 24h.
    # "Worked for me."® :D
done
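
The script repeats the same two wget probes in every stage. If you want to shorten it, the probes could be wrapped in small helper functions. This is only a sketch, not how the script above is written: the names check_url, is_online and wait_until_online are made up for illustration, and only the wget options are taken from the script.

```shell
#!/bin/bash
# Sketch only: check_url, is_online and wait_until_online are hypothetical
# helper names, not part of the original script.

# Probe one URL with the same wget options the script above uses.
check_url() {
    wget -q --tries=2 --timeout=10 --spider "$1"
}

# Succeed as soon as any one of the given URLs answers.
is_online() {
    local url
    for url in "$@"; do
        if check_url "$url"; then
            return 0
        fi
    done
    return 1
}

# Usage: wait_until_online ATTEMPTS PAUSE URL...
# Tests up to ATTEMPTS times and sleeps PAUSE between failed attempts.
# Returns 0 once any URL answers, 1 if every attempt failed.
wait_until_online() {
    local attempts="$1"
    local pause="$2"
    local i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if is_online "$@"; then
            return 0
        fi
        # No need to sleep after the final attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$pause"
        fi
        i=$((i + 1))
    done
    return 1
}
```

With these helpers, the first stage (5 failed rounds, 30s apart, then restart) would shrink to something like `wait_until_online 5 30s http://google.com http://sfr.fr || log_and_restart "We seem to be offline, restarting router!"`, and the later stages to a plain `is_online http://google.com http://sfr.fr && exit`. Keeping check_url separate makes it easy to swap wget for whatever test your provider supports.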