Monit no longer starts

Started by Taomyn, July 26, 2020, 11:37:08 AM

July 26, 2020, 11:37:08 AM Last Edit: July 26, 2020, 11:39:13 AM by Taomyn
Since upgrading to v20.1.9 the Monit service will no longer start:

2020-07-26T11:15:38   monit: /usr/local/etc/monitrc:14: syntax error 'mail-format'
2020-07-26T11:15:30   root: /usr/local/etc/rc.d/monit: WARNING: failed to start monit


Any ideas?

The only place I see "mail-format" looks like this:



I also have this constantly at the top of each settings tab, and Apply does nothing:


Is this a monit change? I would prefer trusting binary upgrades *sighs*

# opnsense-revert -r 20.1.8 monit

Does this bring it back to life?


Cheers,
Franco

Nope, still the same error. When I went to check the settings, the "Apply" banner had gone but the service would not start. So I forced a save; the banner came back and now won't go away again.


Do I need to reboot?

No, this is strange. What's in /usr/local/etc/monitrc line 14? Maybe the parsing code in a third-party library changed and now it thinks there's invalid input there.
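To see exactly what the parser is complaining about, you can print the reported line together with its number. A minimal sketch for a POSIX shell; `show_line` is a hypothetical helper name, not an OPNsense or Monit tool:

```shell
# Print line N of a file, prefixed with its line number.
# show_line is a hypothetical helper, not part of OPNsense or Monit.
show_line() {
    awk -v n="$2" 'NR == n { printf "%d: %s\n", NR, $0 }' "$1"
}

# Usage on the firewall (path and line number taken from the error message):
#   show_line /usr/local/etc/monitrc 14
```

Blank lines count, so the line number in the error refers to the raw generated file, not to the visible statements.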

July 27, 2020, 12:25:54 PM #4 Last Edit: July 28, 2020, 10:43:09 AM by Taomyn

It's



set alert ferd@mydomain.com not on  mail-format { From: bart@mydomain.com } reminder on 10 cycles


And this is the whole file



# DO NOT EDIT THIS FILE -- OPNsense auto-generated file


set httpd unixsocket /var/run/monit.sock
    allow localhost


set daemon 120 with start delay 120


set logfile syslog facility log_daemon






set mailserver 192.168.1.10 port 25   


set alert ferd@mydomain.com not on  mail-format { From: bart@mydomain.com } reminder on 10 cycles


check system bart.mydomain.com
   if changed status then alert
   if cpu usage is greater than 75% then alert
   if loadavg (1min) is greater than 8 then alert
   if loadavg (5min) is greater than 6 then alert
   if loadavg (15min) is greater than 4 then alert
   if memory usage is greater than 75% then alert
   if failed link then alert
   if space usage is greater than 75% then alert


check filesystem RootFs with path "/"
   if space usage is greater than 75% then alert


check program carp_status_change with path "/usr/local/opnsense/scripts/OPNsense/Monit/carp_status" timeout 300 seconds
   if changed status then alert


check program gateway_alert with path "/usr/local/opnsense/scripts/OPNsense/Monit/gateway_alert" timeout 300 seconds
   if status != 0 then alert

FYI, if I clear "Mail format", the syntax error moves to "Reminder"

Looking at the Monit Manual, the syntax on line 14 is wrong.

https://mmonit.com/monit/documentation/monit.html#ALERT-MESSAGES

The 'not on' should be followed by an 'event', and it is not.

In the web GUI, I think you need to either uncheck 'not on' or choose one or more events from the drop-down box. I am not sure the help text 'leave empty for all events' is correct.
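As a quick check for this particular problem, the generated file can be searched for `set alert` lines where `not on` is not followed by an event list in braces. This is only a rough sketch: `find_empty_not_on` is a hypothetical helper, and the pattern is a heuristic, not Monit's actual parser.

```shell
# List "set alert" lines where "not on" is not followed by "{ event, ... }".
# find_empty_not_on is a hypothetical helper; the regex is a heuristic only.
find_empty_not_on() {
    grep -nE '^[[:space:]]*set[[:space:]]+alert[[:space:]].*[[:space:]]not[[:space:]]+on([[:space:]]*$|[[:space:]]+[^{[:space:]])' "$1"
}

# Usage:
#   find_empty_not_on /usr/local/etc/monitrc
```

It prints matching lines with their line numbers and returns a non-zero status when nothing suspicious is found.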

Currently the line in the config is:


set alert ferd@mydomain.com not on { uptime } mail-format { From: bart@mydomain.com } reminder on 10 cycles



the rest is still the same and the latest error is:


monit: /usr/local/etc/monitrc:17: syntax error 'changed'

O.K., so now Line 14 is working.

But there is now an error on Line 17: Monit doesn't like the 'changed' syntax. Should there be another term here - what status changed?

I've learnt that Monit is very picky about syntax and constructs and I have found this a struggle.  I think you need to study the Monit Manual.


Yes, but why now? All I did was upgrade and it broke, and after downgrading it is still broken. I'm only using the GUI to make changes, so can it really be so broken that it generates invalid settings?

Quote from: pouakai on July 28, 2020, 10:53:22 AM
O.K., so now Line 14 is working.

But there is now an error on Line 17: Monit doesn't like the 'changed' syntax. Should there be another term here - what status changed?

I've learnt that Monit is very picky about syntax and constructs and I have found this a struggle.  I think you need to study the Monit Manual.

Still happening after upgrading to 20.7


Can we get this thread moved over to the production forum?

Thanks for moving the thread.


So does anyone know why I still cannot start Monit?


I'll happily revert it back to the original settings if someone can tell me the steps and what files to delete.

You need to post your config, otherwise it's not possible to help.

Quote from: mimugmail on August 06, 2020, 01:52:49 PM
You need to post your config, otherwise it's not possible to help.


Did you look at the version posted earlier in this thread?


In the meantime I was able to narrow it down to one of the service settings: if I attempt to enable any of the "Tests" on the system service, it fails. Leave it at "Nothing selected" and it's fine.


This fails:


2020-08-06T15:14:20 monit[76762]: /usr/local/etc/monitrc:17: syntax error 'changed'





# DO NOT EDIT THIS FILE -- OPNsense auto-generated file


set httpd unixsocket /var/run/monit.sock
    allow localhost


set daemon 120 with start delay 120


set logfile syslog facility log_daemon






set mailserver 192.168.1.10 port 25   


set alert ferd@mydomain.com    reminder on 10 cycles


check system bart.mydomain.com
   if changed status then alert


check filesystem RootFs with path "/"
   if space usage is greater than 75% then alert


check program carp_status_change with path "/usr/local/opnsense/scripts/OPNsense/Monit/carp_status" timeout 300 seconds
   if changed status then alert


check program gateway_alert with path "/usr/local/opnsense/scripts/OPNsense/Monit/gateway_alert" timeout 300 seconds
   if status != 0 then alert





This does not





# DO NOT EDIT THIS FILE -- OPNsense auto-generated file


set httpd unixsocket /var/run/monit.sock
    allow localhost


set daemon 120 with start delay 120


set logfile syslog facility log_daemon






set mailserver 192.168.1.10 port 25   


set alert ferd@mydomain.com    reminder on 10 cycles


check system bart.mydomain.com


check filesystem RootFs with path "/"
   if space usage is greater than 75% then alert


check program carp_status_change with path "/usr/local/opnsense/scripts/OPNsense/Monit/carp_status" timeout 300 seconds
   if changed status then alert


check program gateway_alert with path "/usr/local/opnsense/scripts/OPNsense/Monit/gateway_alert" timeout 300 seconds
   if status != 0 then alert



The GUI does not validate what you select. It will let you pick options or tests that Monit rejects, and you only find out when the service won't start. Again, you need to read the Monit Manual (and look at the examples) for guidance on what will work.
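Since the GUI won't tell you, one way to find out before restarting the service is to ask Monit itself to parse the generated file. Monit's `-t` option runs a syntax check only and `-c` selects the control file. A small sketch; the wrapper name `check_monitrc` is made up for illustration:

```shell
# Syntax-check a Monit control file without starting the daemon.
# -t: syntax check only; -c: control file to read.
# check_monitrc is a hypothetical wrapper; the default path is the one from the logs above.
check_monitrc() {
    monit -t -c "${1:-/usr/local/etc/monitrc}"
}

# Usage:
#   check_monitrc                         # check the generated file
#   check_monitrc /usr/local/etc/monitrc  # same, with an explicit path
```

Any syntax error is reported with the same file:line prefix that shows up in the system log.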

I think the 'changed' switch is for checking the output status of shell scripts, not the status of the system.

This is what I have under 'check system':

check system $HOST
   if memory usage is greater than 75% then alert
   if cpu usage is greater than 75% then alert
   if loadavg (1min) is greater than 8 then alert
   if loadavg (5min) is greater than 6 then alert


I haven't changed it; this is what OPNsense installed.

Make sure you have the service tests CPUUsage, LoadAvg1, LoadAvg5 and LoadAvg15 set up under Service Test Settings - they are installed by default, so should be there.
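To compare against your own setup, it can help to list which tests the generated file actually attaches to a given check. A rough sketch, assuming the layout shown in the files posted in this thread (a `check ...` header followed by indented `if ...` lines); `list_tests` is a hypothetical helper:

```shell
# Print the "if ..." test lines (with line numbers) that belong to the
# check block whose header starts with the given text.
# list_tests is a hypothetical helper and relies on the indented layout above.
list_tests() {
    awk -v hdr="$2" '
        /^check[ \t]/ { inblk = (index($0, hdr) == 1); next }
        inblk && /^[ \t]+if[ \t]/ { printf "%d:%s\n", NR, $0 }
    ' "$1"
}

# Usage:
#   list_tests /usr/local/etc/monitrc "check system"
```

If a test Monit complains about shows up in that list, removing it from the system service in the GUI is the first thing to try.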