Monit false alert due to incorrect evaluation of traffic

Started by Styx13, January 08, 2021, 01:37:01 AM

Hello,

I set up a few Monit alerts to detect excessive upload from some of my networks (I am trying to find out whether there could be data leakage or a suspicious transfer of data to the internet).

As we know, upload from a given network (in my case VLAN20) is actually download on the corresponding network interface of the firewall.
So I created a new "Service Test Setting" called "Download2GBin1H" in the Monit GUI, meant to trigger if more than 2 GB of data is downloaded in the last hour, as follows:


I then defined a Service Setting called "suspicious_upload_vlan20" as follows:


I also checked the actual monit configuration put in place in /usr/local/etc/monitrc, and here is the corresponding entry:
check network suspicious_upload_vlan20  interface vtnet2
   if total download > 2 GB in the last 1 hour then alert


With this rule, I expect to receive an email alert when the hosts on VLAN20 upload 2 GB or more of data within one hour.

And it does seem to work: I ran a test in which I uploaded more than 2 GB within an hour on purpose, and I did receive an alert email.

However, I also sometimes receive an alert email when, as far as I can tell, nowhere near 2 GB was uploaded within the last hour.

As I started suspecting something was wrong, I ran a quick script to keep an eye on the network interface that serves VLAN20 on my OPNsense firewall/gateway. vtnet2 is that interface, and the 8th field of the netstat -I vtnet2 -b output is Ibytes (bytes in, i.e. the number of bytes the interface received from the network):
# while true                                                                                                           
do
date;netstat -I vtnet2 -b | awk '/Link/{print "Uploaded by VLAN20: "$8/1024/1024 " MB"}'
sleep 600                                                                                                               
done
Thu Jan  7 11:04:39 EST 2021
Uploaded by VLAN20: 24254.7 MB
Thu Jan  7 11:14:39 EST 2021
Uploaded by VLAN20: 24255.1 MB
Thu Jan  7 11:24:39 EST 2021
Uploaded by VLAN20: 24255.5 MB
Thu Jan  7 11:34:39 EST 2021
Uploaded by VLAN20: 24256 MB
Thu Jan  7 11:44:39 EST 2021
Uploaded by VLAN20: 24256.5 MB
Thu Jan  7 11:54:39 EST 2021
Uploaded by VLAN20: 24257 MB
Thu Jan  7 12:04:39 EST 2021
Uploaded by VLAN20: 24257.4 MB
Thu Jan  7 12:14:39 EST 2021
Uploaded by VLAN20: 24259.3 MB
Thu Jan  7 12:24:39 EST 2021
Uploaded by VLAN20: 24277.6 MB
Thu Jan  7 12:34:39 EST 2021
Uploaded by VLAN20: 24296.4 MB
Thu Jan  7 12:44:39 EST 2021
Uploaded by VLAN20: 24313.5 MB


As you can see, less than 60 MB was uploaded between 11:04 am and 12:44 pm.
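
Since the netstat counter is cumulative, the figure to compare against the 2 GB limit is the growth between two readings taken an hour apart. Here is a minimal sketch of that calculation, reusing the same netstat/awk parsing as the loop above. The function names delta_mb, read_mb and watch_delta are made up for illustration, and this is not a claim about how Monit computes its own totals:

```shell
#!/bin/sh
# Sketch: report per-interval growth of Ibytes instead of the cumulative total.

# delta_mb PREV CUR: difference between two cumulative readings, in MB.
delta_mb() {
    awk -v prev="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", cur - prev }'
}

# read_mb IFACE: current Ibytes of IFACE in MB (8th field of the Link line,
# as in the loop above).
read_mb() {
    netstat -I "$1" -b | awk '/Link/ { printf "%.1f\n", $8 / 1024 / 1024; exit }'
}

# watch_delta IFACE SECONDS: loop forever, printing the growth per interval.
watch_delta() {
    prev=$(read_mb "$1")
    while true; do
        sleep "$2"
        cur=$(read_mb "$1")
        echo "$(date): +$(delta_mb "$prev" "$cur") MB in the last $2 s"
        prev=$cur
    done
}

# Two readings from the output above, one hour apart (11:14:39 and 12:14:39):
delta_mb 24255.1 24259.3
```

On the firewall itself, something like `watch_delta vtnet2 3600` would print one line per hour, which is easier to compare against the alert emails than the raw totals.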

But I still received this email at 12:11pm:
Quote:
Download bytes exceeded Service suspicious_upload_vlan20

    Date:        Thu, 07 Jan 2021 12:11:58
    Action:      alert
    Host:        OPNsense-primary.localdomain
    Description: total download 4.6 GB matches limit [download rate > 2 GB in last 1 hour]

Your faithful employee,
Monit

This scenario has occurred several times since I put those rules in place, but this is the first time I have looked at it more closely and grabbed actual data from the interface itself to verify that, indeed, what Monit is reporting is not true.

Could this be a bug in Monit? Has anybody encountered the same issue? Does anybody know where Monit gets its statistics from?

Thank you

While configuring a new Monit service that uses a custom script to detect whether connections are established on my VPN (the script returns 0 if there is no connection and 1 if there are connections), I discovered that Monit does not seem to be monitoring properly.

My Monit setup has a 120-second polling interval (the default, I believe).

While testing this new custom Monit alert, I noticed that right after polling, Monit did not update the status of my new service when it should have.
I had established a VPN connection, so Monit should have picked it up when it polled, but it did not.

I could see that Monit updated the "data collected" time displayed in the overall Monit status, which normally indicates that it did poll, but the status of that service was not updated when it should have been.

To make sure, I ran my script manually and confirmed that it was working as expected (returning 1).
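
One way to check whether Monit really runs the script on every poll would be to wrap it so that each invocation is logged with a timestamp and its exit status. This is only a sketch: run_logged is a hypothetical helper, not part of Monit, and the `sh -c 'exit 1'` command merely stands in for the real VPN check script.

```shell
#!/bin/sh
# run_logged LOGFILE CMD [ARGS...]
# Runs CMD, appends a timestamp and CMD's exit status to LOGFILE,
# and returns the same exit status so the caller sees what CMD reported.
run_logged() {
    rl_log=$1
    shift
    "$@"
    rl_rc=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') exit=$rl_rc cmd=$*" >> "$rl_log"
    return "$rl_rc"
}

# Example: a stand-in command that exits with status 1, like the VPN script
# does when connections are established.
log=$(mktemp)
run_logged "$log" sh -c 'exit 1' || echo "script reported status $?"
cat "$log"
```

Comparing the timestamps in the log with the "data collected" time would show whether the script ran on a given poll and what it returned at that moment.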

I repeated the test a few times and the behavior seems somewhat random. Sometimes Monit picks it up on the first poll, and sometimes it takes 2 or 3 polls before it realizes there is actually an alert.


So I believe this is related to the issue I have with detecting excessive upload. I think Monit does not always poll properly, or at least does not always alert properly, and its alerts may be delayed.