OPNsense Forum

English Forums => Tutorials and FAQs => Topic started by: redbull666 on February 26, 2022, 09:30:52 am

Title: Monitoring your ZFS root using monit
Post by: redbull666 on February 26, 2022, 09:30:52 am
I modified a ZFS monitoring script a bit, and use it on OPNsense. It will monitor your "zroot" ZFS pool if you have installed OPNsense on ZFS (you should, ZFS is amazing).

First, copy this script to your OPNsense install (I keep it in /root) and make sure it's executable.

Code:
#! /bin/sh
#
## ZFS health check script for monit.
## Original script from:
## Calomel.org
##     https://calomel.org/zfs_health_check_script.html
#

# Parameters

maxCapacity=$1 # in percentages

usage="Usage: $0 maxCapacityInPercentages\n"

if [ ! "${maxCapacity}" ]; then
  printf "Missing arguments\n"
  printf "${usage}"
  exit 1
fi

# Output for monit user interface

printf "==== ZPOOL STATUS ====\n"
printf "%s" "$(/sbin/zpool status)"
printf "\n\n==== ZPOOL LIST ====\n"
printf "%s\n" "$(/sbin/zpool list)"


# Health - Check if all zfs volumes are in good condition. We are looking for
# any keyword signifying a degraded or broken array.

condition=$(/sbin/zpool status | grep -E 'DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover')

if [ "${condition}" ]; then
  printf "\n==== ERROR ====\n"
  printf "One of the pools is in one of these statuses: DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover!\n"
  printf "%s\n" "$condition"
  exit 1
fi


# Capacity - Make sure the pool capacity is below 80% for best performance. The
# percentage really depends on how large your volume is. If you have a 128GB
# SSD then 80% is reasonable. If you have a 60TB raid-z2 array then you can
# probably set the warning closer to 95%.
#
# ZFS uses a copy-on-write scheme. The file system writes new data to
# sequential free blocks first and when the uberblock has been updated the new
# inode pointers become valid. This method is true only when the pool has
# enough free sequential blocks. If the pool is at capacity and space limited,
# ZFS will have to randomly write blocks. This means ZFS cannot create an
# optimal set of sequential writes and write performance is severely impacted.

capacity=$(/sbin/zpool list -H -o capacity | cut -d'%' -f1)

for line in ${capacity}
  do
    if [ "$line" -ge "$maxCapacity" ]; then
      printf "\n==== ERROR ====\n"
      printf "One of the pools has reached its max capacity!\n"
      exit 1
    fi
  done


# Errors - Check the columns for READ, WRITE and CKSUM (checksum) drive errors
# on all volumes and all drives using "zpool status". If any non-zero errors
# are reported an email will be sent out. You should then look to replace the
# faulty drive and run "zpool scrub" on the affected volume after resilvering.

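# Print the three counters space-separated so that "10 0 0" cannot be mistaken
# for a clean "0 0 0" (joined without spaces it would read "1000").
errors=$(/sbin/zpool status | grep ONLINE | grep -v state | awk '{print $3, $4, $5}' | grep -v '^0 0 0$')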

if [ "${errors}" ]; then
  printf "\n==== ERROR ====\n"
  printf "One of the pools contains errors!\n"
  printf "%s\n" "$errors"
  exit 1
fi

# Finish - If we made it here then everything is fine
exit 0
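A side note on the error check: the READ, WRITE and CKSUM counters are easiest to test as three separate columns. Joined without a separator, counters of 10 0 0 read as 1000, which still contains 000. Below is a minimal sketch, assuming the usual `zpool status` column order (NAME, STATE, READ, WRITE, CKSUM). The helper name `nonzero_error_rows` and the sample rows are made up for illustration and are not real output.

```shell
# Hypothetical helper: reads `zpool status` text on stdin and prints every
# ONLINE device row whose READ, WRITE or CKSUM counter is not zero.
nonzero_error_rows() {
  awk 'NF >= 5 && $2 == "ONLINE" && ($3 != 0 || $4 != 0 || $5 != 0) { print $1, $3, $4, $5 }'
}

# Made-up sample rows (assumption, not captured from a real system).
sample='  state: ONLINE
        zroot       ONLINE       0     0     0
          ada0p4    ONLINE      10     0     0
          ada1p4    ONLINE       0     0     0'

# Only the row with a non-zero counter is printed.
printf '%s\n' "$sample" | nonzero_error_rows
```

On a real system the input would come from `/sbin/zpool status` instead of the sample variable.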

Then add a new service to your monit configuration in OPNsense. The "80" is the script's argument: the capacity alert triggers when a pool is 80% full or more. Of course the script will also trigger on serious issues, such as a degraded pool if one of the disks in your mirror is offline.
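To illustrate how the "80" is used, here is a small sketch of the capacity comparison on its own. It assumes one capacity value per line, in the form printed by `zpool list -H -o capacity`. The helper name `capacity_ok` and the sample values are hypothetical.

```shell
# Hypothetical helper: reads one capacity per line on stdin (e.g. "42%")
# and returns 1 as soon as one value is at or above the limit given in $1.
capacity_ok() {
  limit=$1
  while read -r cap; do
    cap=${cap%\%}                 # drop the trailing percent sign
    [ -n "$cap" ] || continue     # skip empty lines
    if [ "$cap" -ge "$limit" ]; then
      return 1
    fi
  done
  return 0
}

# Made-up values standing in for `/sbin/zpool list -H -o capacity`.
printf '42%%\n81%%\n' | capacity_ok 80 || echo "a pool is at or above the limit"
```

A non-zero return value here plays the same role as the script's `exit 1`: it is the signal the monit service test reacts to.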

(https://i.ibb.co/5FTTPb2/image.png)

That's it, assuming you have configured monit correctly to send emails. For example, I am using:

(https://i.ibb.co/gMLJTrW/image.png)
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on December 21, 2022, 04:54:12 pm
I get 'Status Failed' on the status page.

Here is the log
Error   monit   'zfs_monit' failed to execute '/usr/local/bin/ZFS_monit.sh 80' -- No such file or directory

I assure you the file is there and I set the permissions to 755. I am running a CPU temp check the same way in Monit and it works fine.

I am disabling this service until a resolution is available.
Title: Re: Monitoring your ZFS root using monit
Post by: SWEETGOOD on January 13, 2023, 08:56:02 pm
Thanks @redbull666 for the script!

Can confirm that the script works like a charm.

Error   monit   'zfs_monit' failed to execute '/usr/local/bin/ZFS_monit.sh 80' -- No such file or directory

Are you sure the file is named ZFS_monit.sh (instead of zfs_monit.sh)? Maybe just move it to /root and see if it works there?
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on January 13, 2023, 10:39:33 pm
Moved it to /root/zfs_monit.sh and file name is now zfs_monit.sh
Service is set up exactly like the example.
Getting 'zfs_monit' failed to execute '/root/zfs_monit.sh 80' -- No such file or directory
Here are my settings - I disabled the service check for now because it gives the above error
What am I missing? permissions set to 755. Are there any Service Tests Settings needed?
I am running another custom test in Monit and it works fine.
After some testing I have determined there is an error in the script. The script is looking for something that does not exist. # sudo /sbin/zpool seems to work fine
Title: Re: Monitoring your ZFS root using monit
Post by: SWEETGOOD on January 13, 2023, 11:10:48 pm
Can you login via SSH to your OPNsense, issue the following command and post the result?

ls -lart /root/zfs_monit.sh
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on January 13, 2023, 11:21:25 pm
here is the result

root@firewall:~ # ls -lart /root/zfs_monit.sh
-rwxr-xr-x  1 root  wheel  2590 Dec 21 08:46 /root/zfs_monit.sh
root@firewall:~ #

The problem seems to be something in the script itself. It finds the command just fine, but that error shows that something in the script cannot be found.
Title: Re: Monitoring your ZFS root using monit
Post by: SWEETGOOD on January 13, 2023, 11:27:18 pm
If you are connected via SSH you could try to run the script from there:

/root/zfs_monit.sh 80

Does that work?

Please also check if the encoding of the script is correct. You can do that with this command:

cat /root/zfs_monit.sh

The script output should look exactly like in the first post of the thread starter.
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on January 13, 2023, 11:28:37 pm
Tried that, get same error. It's in the script. I tried copying the script to the file again in case I missed some code, still doesn't work. There is an error in the script.

I ran cat /root/zfs_monit.sh and the console displays the script itself. Also tried removing all the comments in case of a syntax issue. Didn't work. I am not a programmer so I can't tell where the coding issues are.

By the way, running OPNsense 22.7.10_2-amd64. Maybe I am missing a plugin with needed files? I have no plugins installed and running a default configuration. I am running ZFS. zpool status works.

Tried on second OPNsense installation with same results.

Title: Re: Monitoring your ZFS root using monit
Post by: SWEETGOOD on January 13, 2023, 11:52:53 pm
No, the script doesn't involve any plugin. The error message is clear and normally only shows up if either:

- the file or folder cannot be found (like the message says)
- a binary file has been compiled for a different architecture (my experience and not applicable here)

So this issue is very strange.
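One more case produces the same kind of failure: the file exists, but the interpreter named on its first line cannot be found, for example because an invisible character follows /bin/sh. Here is a minimal sketch that reproduces this in a scratch directory. The file names are made up, and the exact wording of the error depends on the shell that starts the script.

```shell
# Reproduce the failure in a scratch directory (all names are illustrative).
dir=$(mktemp -d)

# The same two-line script twice: once with Unix line endings, once with
# Windows (CRLF) line endings.
printf '#!/bin/sh\necho ok\n'     > "$dir/unix.sh"
printf '#!/bin/sh\r\necho ok\r\n' > "$dir/crlf.sh"
chmod +x "$dir/unix.sh" "$dir/crlf.sh"

# Runs normally.
"$dir/unix.sh"

# Fails: the system looks for an interpreter named "/bin/sh" plus a
# carriage return, which does not exist.
"$dir/crlf.sh" || echo "crlf.sh failed with exit status $?"
```

Both files look identical in `cat` output, which is why this case is easy to miss.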

Let's try it step by step:

cd /root
echo '#!/bin/sh' > test.sh
echo 'echo SH EXECUTION TEST' >> test.sh
chmod +x test.sh
./test.sh

Does it show "SH EXECUTION TEST" at the end?
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on January 14, 2023, 12:10:50 am
Here is the result

root@firewall:~ # cd /root
root@firewall:~ # echo '#!/bin/sh' > test.sh
/bin/sh: Event not found.
root@firewall:~ # echo 'echo SH EXECUTION TEST' >> test.sh
root@firewall:~ # chmod +x test.sh
root@firewall:~ # /test.sh
/test.sh: Command not found.
root@firewall:~ #

If a file is not found you get the message 'Command not found'
The message from the zfs_monit.sh logs is 'No such file or directory'. That tells me there is something in the script it cannot find. It appears like a message from running the script.
Title: Re: Monitoring your ZFS root using monit
Post by: SWEETGOOD on January 14, 2023, 12:16:49 am
It could not issue the most important command so the "file preparation" was not successful:

echo '#!/bin/sh' > test.sh
/bin/sh: Event not found.

Could you please edit the file with a tool like vi or nano (the first one is available on OPNsense) and make sure its content is:

#!/bin/sh
echo SH EXECUTION TEST

Try to execute it again afterwards. Also you didn't even execute it because you missed the . at the beginning. That's why it tells you "Command not found".
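For background: "Event not found" is the history-expansion error of csh/tcsh, which acts on the ! character even inside single quotes. Assuming the root login shell here is csh or tcsh, one way around it is to write the file without typing a literal ! at all. A small sketch, using the same test.sh name as above:

```shell
# \041 is the octal code for "!", so no literal "!" is typed at the prompt.
printf '#\041/bin/sh\necho SH EXECUTION TEST\n' > test.sh
chmod +x test.sh

# The leading "./" tells the shell to look in the current directory.
./test.sh
```

Starting a plain `sh` first and running the original echo commands from there should work as well, since sh does not perform history expansion.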
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on January 14, 2023, 12:19:07 am
Yes the file contains 'echo SH EXECUTION TEST'

root@firewall:~ # ./test.sh
SH EXECUTION TEST
root@firewall:~ #

Then I tried

root@firewall:~ # ./zfs_monit.sh
./zfs_monit.sh: Command not found.


Title: Re: Monitoring your ZFS root using monit
Post by: SWEETGOOD on January 14, 2023, 12:23:48 am
Does it also contain

#!/bin/sh

in the first line of the file? It's important that the file contains both lines, as I stated in my previous post.
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on January 14, 2023, 12:25:14 am
The test.sh file only contains 'echo SH EXECUTION TEST', no #!/bin/sh
If I add the '#!/bin/sh' I get same result
root@firewall:~ # ./test.sh
SH EXECUTION TEST
Same results with and without '#!/bin/sh'
Maybe can't recognize file type?

I also tried '#! /bin/sh' as in the original monit script above

zfs_monit.sh as it is now
#!/bin/sh
#
## ZFS health check script for monit.
## Original script from:
## Calomel.org
##     https://calomel.org/zfs_health_check_script.html
#
................... and the rest of the file
Title: Re: Monitoring your ZFS root using monit
Post by: SWEETGOOD on January 14, 2023, 12:55:16 am
The space between the #! and the /bin/sh doesn't matter.

This line is important because it tells the system which interpreter to run the following lines with. So it must stay at the top of every executable shell script.

Nevertheless, I would recommend adding the script from the thread starter block by block to my test file and executing it after every step. This might lead you to the line which causes the script to fail.

Please note that you need to add the if and the for blocks completely. So it's OK to just copy the line

maxCapacity=$1 # in percentages

for testing but the if blocks like

if [ ! "${maxCapacity}" ]; then
  printf "Missing arguments\n"
  printf "${usage}"
  exit 1
fi


and also the for block needs to be added as a whole.

And indeed you can skip all comments.
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on January 15, 2023, 11:03:13 pm
I have tried commenting out the entire script and it still shows 'No such file or directory'
Must be something else.

Maybe there is a problem with the NonZeroStatus test. No other enabled Monit service uses that test.

[Update] I enabled gateway status check, which uses the NonZeroStatus test and it works fine. I am at a complete loss on this one.
Title: Re: Monitoring your ZFS root using monit
Post by: SWEETGOOD on January 15, 2023, 11:22:59 pm
Not very likely if you can't even execute the script using the command line. This has nothing to do with Monit at all.
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on January 15, 2023, 11:27:01 pm
There must be some issue with my OPNsense. I can execute the test.sh just fine in the same location. I have two other custom Monit scripts and they work fine.
Title: Re: Monitoring your ZFS root using monit
Post by: SWEETGOOD on January 15, 2023, 11:49:13 pm
My assumption still is that the error is caused by either wrong (invisible) characters in the bash file or by a wrong encoding.

If you can execute other files (with the same shebang at the top) just fine there MUST be an issue with this particular file.
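One way to make such invisible characters visible is to dump the first line byte by byte. The sketch below builds a made-up sample file with Windows line endings so it can be run anywhere. On the firewall, the path to check would be /root/zfs_monit.sh. The helper name `has_cr` is hypothetical.

```shell
# Made-up sample: a script saved with Windows (CRLF) line endings.
sample=$(mktemp)
printf '#!/bin/sh\r\necho ok\r\n' > "$sample"

# Dump the first line byte by byte; a stray \r shows up right before \n.
head -n 1 "$sample" | od -c

# Hypothetical helper: succeeds if the file contains any carriage return.
has_cr() {
  grep -q "$(printf '\r')" "$1"
}

has_cr "$sample" && echo "carriage returns found"
```

A file created directly on the system with vi or nano should show only \n at the end of the first line.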
Title: Re: Monitoring your ZFS root using monit
Post by: dcol on January 15, 2023, 11:59:37 pm
Sure seems that way. I just copied the script from this thread and used notepad to save it to a file. Did the same with one of my other custom Monit tests.
Title: Re: Monitoring your ZFS root using monit
Post by: cookiemonster on January 16, 2023, 12:11:27 am
Try to learn a bit about Unixes/BSDs, dcol; it helps. Most of the errors you have posted are the result of not being familiar with the way these OSs work.
For instance, file names are case sensitive: zfs-file.sh is not the same as ZFS-file.sh. The result is file/object not found.
Paths are important. If you try to find or run a file, the path has to be able to get to it. A period, i.e. ".", denotes the current location you are at.
The shebang in a script tells the OS which shell to use, so
Code:
#! /bin/sh
at the beginning is important.

Do you have nano installed on OPN? Forget the Windows editors: they introduce characters, mostly for line endings, that differ between Unix and Windows, and they are often a source of no end of problems. In this case I doubt that is the problem, but using nano a) will help in the future and b) will take care of it if it is the problem.
Code:
pkg install nano
is the command to install it.

Once installed, remove the file and recreate it with nano as per the first post; nano allows you to paste into it.
Take it from there.
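If the file has already been saved with Windows line endings, stripping the carriage returns is an alternative to retyping it. This is a minimal sketch using `tr`. The helper name `strip_cr` is hypothetical, and the demo file is a made-up stand-in for the real script.

```shell
# Hypothetical helper: rewrite a file in place without carriage returns.
strip_cr() {
  tmp=$(mktemp) || return 1
  if tr -d '\r' < "$1" > "$tmp"; then
    cat "$tmp" > "$1"      # writing back with cat keeps the file's mode and owner
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# Demo on a made-up CRLF script; on the firewall the argument would be
# /root/zfs_monit.sh instead.
demo=$(mktemp)
printf '#!/bin/sh\r\necho ok\r\n' > "$demo"
chmod +x "$demo"
strip_cr "$demo" && "$demo"
```

After stripping, the first line is exactly `#!/bin/sh` followed by a newline, so the interpreter can be found again.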