I modified a ZFS monitoring script a bit and use it on OPNsense. It monitors your "zroot" ZFS pool if you have installed OPNsense on ZFS (you should; ZFS is amazing).
First, copy this script to your OPNsense install; I keep it in /root. Make sure it's executable.
#! /bin/sh
#
## ZFS health check script for monit.
## Original script from:
## Calomel.org
## https://calomel.org/zfs_health_check_script.html
#
# Parameters
maxCapacity=$1 # in percentages
usage="Usage: $0 maxCapacityInPercentages\n"
if [ ! "${maxCapacity}" ]; then
printf "Missing arguments\n"
printf "${usage}"
exit 1
fi
# Output for monit user interface
printf "==== ZPOOL STATUS ====\n"
printf "%s\n" "$(/sbin/zpool status)"
printf "\n\n==== ZPOOL LIST ====\n"
printf "%s\n" "$(/sbin/zpool list)"
# Health - Check if all zfs volumes are in good condition. We are looking for
# any keyword signifying a degraded or broken array.
condition=$(/sbin/zpool status | grep -E 'DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover')
if [ "${condition}" ]; then
printf "\n==== ERROR ====\n"
printf "One of the pools is in one of these statuses: DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover!\n"
printf "%s\n" "${condition}"
exit 1
fi
# Capacity - Make sure the pool capacity is below 80% for best performance. The
# percentage really depends on how large your volume is. If you have a 128GB
# SSD then 80% is reasonable. If you have a 60TB raid-z2 array then you can
# probably set the warning closer to 95%.
#
# ZFS uses a copy-on-write scheme. The file system writes new data to
# sequential free blocks first, and the new inode pointers become valid
# once the uberblock has been updated. This works only while the pool has
# enough free sequential blocks. If the pool is near capacity and space
# limited, ZFS will have to write blocks randomly. This means ZFS cannot
# create an optimal set of sequential writes, and write performance is
# severely impacted.
capacity=$(/sbin/zpool list -H -o capacity | cut -d'%' -f1)
for line in ${capacity}
do
if [ "${line}" -ge "${maxCapacity}" ]; then
printf "\n==== ERROR ====\n"
printf "One of the pools has reached its maximum capacity!\n"
exit 1
fi
done
# Errors - Check the columns for READ, WRITE and CKSUM (checksum) drive errors
# on all volumes and all drives using "zpool status". If any non-zero errors
# are reported an email will be sent out. You should then look to replace the
# faulty drive and run "zpool scrub" on the affected volume after resilvering.
errors=$(/sbin/zpool status | grep ONLINE | grep -v state | awk '{print $3 $4 $5}' | grep -v 000)
if [ "${errors}" ]; then
printf "\n==== ERROR ====\n"
printf "One of the pools contains errors!"
printf "%s\n" "${errors}"
exit 1
fi
# Finish - If we made it here then everything is fine
exit 0
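The capacity loop above can be exercised in isolation. This is a minimal sketch with the zpool output faked (the 42 and 85 values are made up for illustration), so the comparison logic can be tested without touching a real pool:

```shell
#!/bin/sh
# Stand-in for: /sbin/zpool list -H -o capacity | cut -d'%' -f1
# (two pools pretending to be 42% and 85% full)
capacity="42
85"
maxCapacity=80

for line in ${capacity}
do
    if [ "${line}" -ge "${maxCapacity}" ]; then
        printf "pool over %s%% (at %s%%)\n" "${maxCapacity}" "${line}"
    fi
done
```

With these fake values the loop flags the second pool only, which is exactly what triggers the `exit 1` alert path in the real script.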
Then add a new service to your monit configuration in OPNsense. The "80" is a parameter for one of the alerts, specifically triggering when the pool is 80% full. Of course the script will also trigger on serious issues, such as a degraded pool if one of the disks in your mirror is offline.
(https://i.ibb.co/5FTTPb2/image.png)
That's it, assuming you have configured monit to send emails correctly. For example, I am using:
(https://i.ibb.co/gMLJTrW/image.png)
I get 'Status Failed' on the status page.
Here is the log
Error monit 'zfs_monit' failed to execute '/usr/local/bin/ZFS_monit.sh 80' -- No such file or directory
I assure you the file is there and I set the permissions to 755. I am running a CPU temp check the same way in Monit and it works fine.
I am disabling this service until a resolution is available.
Thanks @redbull666 for the script!
Can confirm that the script works like a charm.
Quote from: dcol on December 21, 2022, 04:54:12 PM
Error monit 'zfs_monit' failed to execute '/usr/local/bin/ZFS_monit.sh 80' -- No such file or directory
Are you sure the file is actually named ZFS_monit.sh (and not zfs_monit.sh)? Maybe move it to /root and try whether it works there?
Moved it to /root/zfs_monit.sh and the file name is now zfs_monit.sh
Service is setup exactly like the example.
Getting 'zfs_monit' failed to execute '/root/zfs_monit.sh 80' -- No such file or directory
Here are my settings - I disabled the service check for now because it gives the above error
What am I missing? permissions set to 755. Are there any Service Tests Settings needed?
I am running another custom test in Monit and it works fine.
After some testing I have determined there is an error in the script. The script is looking for something that does not exist; running /sbin/zpool by itself seems to work fine.
Can you login via SSH to your OPNsense, issue the following command and post the result?
ls -lart /root/zfs_monit.sh
here is the result
root@firewall:~ # ls -lart /root/zfs_monit.sh
-rwxr-xr-x 1 root wheel 2590 Dec 21 08:46 /root/zfs_monit.sh
root@firewall:~ #
The problem seems to be something in the script itself. It finds the command just fine, but that error shows that something in the script cannot be found.
If you are connected via SSH you could try to run the script from there:
/root/zfs_monit.sh 80
Does that work?
Please also check if the encoding of the script is correct. You can do that with this command:
cat /root/zfs_monit.sh
The output of cat should look exactly like the script in the thread starter's first post.
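One common encoding culprit is DOS/Windows (CRLF) line endings: a hidden carriage return after /bin/sh makes the kernel look for an interpreter literally named "/bin/sh^M" and report "No such file or directory". A self-contained sketch of the check, fabricating a broken file in /tmp for demonstration (the /tmp path is just for illustration):

```shell
#!/bin/sh
# Fabricate a script with CRLF line endings (what Windows editors often save)
printf '#!/bin/sh\r\necho hello\r\n' > /tmp/crlf_demo.sh

# cat -v makes carriage returns visible as ^M at the end of each line
cat -v /tmp/crlf_demo.sh

# Count lines containing a CR byte; 0 would mean the file is clean
crlf_lines=$(grep -c "$(printf '\r')" /tmp/crlf_demo.sh)
echo "lines with CR: ${crlf_lines}"
```

Run the same `cat -v` against /root/zfs_monit.sh; if you see ^M at the ends of the lines, the file's line endings are the problem.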
Tried that, I get the same error. It's in the script. I tried copying the script to the file again in case I missed some code; it still doesn't work. There is an error in the script.
I ran cat /root/zfs_monit.sh and the console displays the script itself. I also tried removing all the comments in case of a syntax issue; that didn't work either. I am not a programmer, so I can't tell where the coding issues are.
By the way, running OPNsense 22.7.10_2-amd64. Maybe I am missing a plugin with needed files? I have no plugins installed and running a default configuration. I am running ZFS. zpool status works.
Tried on second OPNsense installation with same results.
No, the script doesn't involve any plugin. The error message is clear and normally only shows up if either:
- the file or folder cannot be found (like the message says)
- a binary has been compiled for a different architecture (from my experience, and not applicable here)
So this issue is very strange.
Let's try it step by step:
cd /root
echo '#!/bin/sh' > test.sh
echo 'echo SH EXECUTION TEST' >> test.sh
chmod +x test.sh
./test.sh
Does it show "SH EXECUTION TEST" at the end?
Here is the result
root@firewall:~ # cd /root
root@firewall:~ # echo '#!/bin/sh' > test.sh
/bin/sh: Event not found.
root@firewall:~ # echo 'echo SH EXECUTION TEST' >> test.sh
root@firewall:~ # chmod +x test.sh
root@firewall:~ # /test.sh
/test.sh: Command not found.
root@firewall:~ #
If a file is not found you get the message 'Command not found'.
The message from the zfs_monit.sh logs is 'No such file or directory'. That tells me there is something in the script it cannot find. It looks like a message produced while running the script.
It could not issue the most important command so the "file preparation" was not successful:
echo '#!/bin/sh' > test.sh
/bin/sh: Event not found.
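For context: the "Event not found" message comes from csh, the default root shell on FreeBSD-based OPNsense, where the ! character triggers history expansion even inside single quotes. A sketch of the workaround, run here under a Bourne shell so the echo succeeds (the /tmp path is just for illustration):

```shell
#!/bin/sh
# In (t)csh this fails with "/bin/sh: Event not found", because '!'
# is treated as a history reference:
#   echo '#!/bin/sh' > test.sh
# Workarounds from csh: escape the bang (echo '#\!/bin/sh' > test.sh),
# or simply run the commands under a Bourne shell, as this sketch does.
echo '#!/bin/sh' > /tmp/shebang_demo.sh
echo 'echo SH EXECUTION TEST' >> /tmp/shebang_demo.sh
chmod +x /tmp/shebang_demo.sh
/tmp/shebang_demo.sh
```

Under sh the ! has no special meaning, so the shebang line lands in the file intact.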
Could you please edit the file with a tool like vi or nano (the first one is available on OPNsense) and make sure its content is:
#!/bin/sh
echo SH EXECUTION TEST
Try to execute it again afterwards. Also, you didn't actually execute it, because you missed the . at the beginning. That's why it tells you "Command not found".
Yes the file contains 'echo SH EXECUTION TEST'
root@firewall:~ # ./test.sh
SH EXECUTION TEST
root@firewall:~ #
Then I tried
root@firewall:~ # ./zfs_monit.sh
./zfs_monit.sh: Command not found.
Does it also contain
#!/bin/sh
in the first line of the file? That's important that the file contains both lines as I stated in my previous post.
The test.sh file only contains 'echo SH EXECUTION TEST', no #!/bin/sh
If I add the '#!/bin/sh' I get same result
root@firewall:~ # ./test.sh
SH EXECUTION TEST
Same results with and without '#!/bin/sh'
Maybe it can't recognize the file type?
I also tried '#! /bin/sh' as in the original monit script above
zfs_monit.sh as it is now
#!/bin/sh
#
## ZFS health check script for monit.
## Original script from:
## Calomel.org
## https://calomel.org/zfs_health_check_script.html
#
................... and the rest of the file
The space between the #! and the /bin/sh doesn't matter.
This line is important because it tells the kernel which interpreter should handle the following lines, so it must stay at the top of every executable shell script.
Nevertheless I would recommend adding the script from the thread starter to my test file block by block and executing it after every step. This might lead you to the line which causes the script to fail.
Please note that you need to add the if and the for blocks completely. So it's OK to copy just the line
maxCapacity=$1 # in percentages
for testing, but if blocks like
if [ ! "${maxCapacity}" ]; then
printf "Missing arguments\n"
printf "${usage}"
exit 1
fi
and also the for block needs to be added as a whole.
And indeed you can skip all comments.
I have tried commenting the entire script and it still shows 'No such file or directory'
Must be something else.
Maybe there is a problem with the NonZeroStatus test. No other enabled Monit service uses that test.
[Update] I enabled gateway status check, which uses the NonZeroStatus test and it works fine. I am at a complete loss on this one.
Not very likely if you can't even execute the script using the command line. This has nothing to do with Monit at all.
There must be some issue with my OPNsense. I can execute the test.sh just fine in the same location. I have two other custom Monit scripts and they work fine.
My assumption is still that the error is caused either by wrong (invisible) characters in the shell file or by a wrong encoding.
If you can execute other files (with the same shebang at the top) just fine there MUST be an issue with this particular file.
Sure seems that way. I just copied the script from this thread and used notepad to save it to a file. Did the same with one of my other custom Monit tests.
Try to learn a bit about Unixes/BSDs, dcol, it helps. Most of the errors you have posted are the result of not being familiar with the way these OSs work.
For instance, they are case sensitive: zfs-file.sh is not the same as ZFS-file.sh. The result is a file/object-not-found error.
Paths are important. If you try to find or run a file, the path has to be able to get to it. A period, i.e. ".", denotes the current location you are at.
The shebang in a script tells the OS which shell to use, so #! /bin/sh
at the beginning is important.
Do you have nano installed on OPN? Forget the Windows editors; they introduce characters, mostly line endings, that differ between Unix and Windows and are often a source of no end of problems. In this case I doubt that's the problem, but a) it will help in the future, and b) it will take care of it if it is the problem.
pkg install nano
installs it.
Once installed, remove the file and recreate it as per the first post with nano, which allows you to paste into it.
Take it from there.
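If the Notepad copy did bring in Windows line endings, the carriage returns can also be stripped without retyping anything. A minimal sketch, fabricating a broken copy in /tmp so it is self-contained; on the firewall you would point tr at /root/zfs_monit.sh instead:

```shell
#!/bin/sh
# A script saved with Windows (CRLF) line endings, as Notepad produces
printf '#!/bin/sh\r\necho fixed\r\n' > /tmp/notepad_copy.sh

# tr -d '\r' deletes every carriage return, leaving plain Unix newlines
tr -d '\r' < /tmp/notepad_copy.sh > /tmp/clean_copy.sh
chmod +x /tmp/clean_copy.sh

/tmp/clean_copy.sh
```

After the conversion the shebang is a clean "#!/bin/sh" again and the script executes normally.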