Hi there,
After I upgraded to 18.7.4, I noticed that the telegraf plugin seems to be broken, mainly due to the inputs.system module. I now suddenly have two log files, one in /var/log/telegraf/telegraf.log and one in /var/log/telegraf.log. Although the config says
[global_tags]
[agent]
interval = "10s"
round_interval = false
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = true
logfile = "/var/log/telegraf.log"
hostname = "opnsense"
omit_hostname = false
[[outputs.influxdb]]
urls = ["http://192.168.1.205:8086"]
database = "telegraf"
retention_policy = ""
write_consistency = "any"
timeout = "5s"
username = "influx"
password = "XXXXXXXXXX"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
[[inputs.disk]]
mount_points = ["/"]
[[inputs.diskio]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.system]]
[[inputs.net]]
that /var/log/telegraf.log should be used, telegraf uses the other one and writes tons of messages like
2018-09-28T19:20:23Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:20:33Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:20:43Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:20:53Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:21:03Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:21:13Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:21:23Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:21:33Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:21:43Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:21:53Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
2018-09-28T19:22:03Z E! Error in plugin [inputs.system]: open /var/run/utmp: no such file or directory
E! Unable to append to /var/log/telegraf.log (open /var/log/telegraf.log: permission denied), using stderr
/var/log/telegraf.log now belongs to root:root but should probably belong to root:telegraf (?).
The unpleasant side effect is that OPNsense downlink throughput drops to <5% of normal performance.
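To check the ownership question, a small sh sketch like the following could help. The function names (log_owner, can_append, check_log) are made up for illustration, and the default path is simply the one from the config above. It only tests the user who runs it, so it would need to be run as the telegraf user to say anything about the daemon.

```sh
#!/bin/sh
# Hypothetical helpers: report who owns a log file and whether the
# current user could append to it.

log_owner() {
    # Print "user:group" for an existing path (nothing if it is missing).
    ls -ld "$1" 2>/dev/null | awk '{ print $3 ":" $4 }'
}

can_append() {
    # Succeeds if the file is writable, or if it does not exist yet
    # but its directory is writable (so it could be created).
    if [ -e "$1" ]; then
        [ -w "$1" ]
    else
        [ -w "$(dirname "$1")" ]
    fi
}

check_log() {
    # Default is the logfile path from telegraf.conf in this thread.
    path="${1:-/var/log/telegraf.log}"
    echo "owner: $(log_owner "$path")"
    if can_append "$path"; then
        echo "writable: yes"
    else
        echo "writable: no"
    fi
}
```

Calling check_log with no argument looks at /var/log/telegraf.log; passing /var/log/telegraf/telegraf.log checks the second file.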
Br br
Can you try to revert telegraf pkg (not the plugin) to the old version from 18.7.3?
Err, I would love to, but I have never done that before.
Is it 'opnsense-revert -r 18.7.3 telegraf'?
Br br
Think so, I'm only on mobile today.
https://docs.opnsense.org/manual/opnsense_tools.html
Well, the utmp error message disappeared, but telegraf still shows this error message:
E! Unable to append to /var/log/telegraf.log (open /var/log/telegraf.log: permission denied), using stderr
E! Unable to append to /var/log/telegraf.log (open /var/log/telegraf.log: permission denied), using stderr
I changed /var/log/telegraf to telegraf:telegraf, but even then there was no change. The problem that throughput drops dramatically also persists. I have now temporarily switched off telegraf.
Note: we use telegraf in a larger cloud environment and have sometimes observed that when telegraf wants to access files but cannot due to permissions, the CPU load on that node rises to 100% and the machine more or less stops doing productive work ...
However, could it be that with the recent kernel/sys upgrade, utmp was replaced by the more modern utx? At least the man page on my sense indicates so .... That would then need some adaptation in telegraf too ....
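One way to test the utmp/utx guess is to look at which login-accounting file actually exists. The sketch below assumes the FreeBSD utx layout, where the active-session database is /var/run/utx.active; the function name login_db_kind is invented for this example.

```sh
#!/bin/sh
# Hypothetical helper: report which login-accounting database exists.
# Assumption: utx.active is the FreeBSD utx-style file, utmp the legacy one.

login_db_kind() {
    dir="${1:-/var/run}"
    if [ -e "$dir/utx.active" ]; then
        echo "utx"
    elif [ -e "$dir/utmp" ]; then
        echo "utmp"
    else
        echo "none"
    fi
}
```

If login_db_kind prints "utx", the missing /var/run/utmp in the error message would be expected on that system.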
Br br
OK, here is the cause of the issue:
The telegraf setup contains two conflicting log file settings:
Via the GUI, telegraf is told to write to the log file /var/log/telegraf.log (see above); this is written to the telegraf config file.
However, the start script for telegraf in /usr/local/etc/rc.d/telegraf configures:
(...)
name="telegraf"
rcvar=telegraf_enable
load_rc_config $name
: ${telegraf_enable:="NO"}
: ${telegraf_user:="telegraf"}
: ${telegraf_group:="telegraf"}
: ${telegraf_flags:="-quiet"}
: ${telegraf_conf:="/usr/local/etc/${name}.conf"}
: ${telegraf_options:="${telegraf_flags} -config=${telegraf_conf}"}
logfile="/var/log/telegraf/${name}.log"
pidfile="/var/run/${name}.pid"
command=/usr/sbin/daemon
start_precmd="telegraf_prestart"
start_cmd="telegraf_start"
stop_cmd="telegraf_stop"
(...)
This causes a conflict about where to write the log ...
Only one of the two is possible ....
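The mismatch can be confirmed by pulling the logfile value out of both files and comparing them. This is only a sketch: logfile_of and logfiles_agree are hypothetical names, and the parsing just looks for a line of the form logfile = "..." rather than reading TOML or shell properly.

```sh
#!/bin/sh
# Hypothetical helpers: compare the logfile set in telegraf.conf with the
# one set in the rc script.

logfile_of() {
    # $1: file to scan, $2: value substituted for ${name} (rc script only)
    sed -n 's/^[[:space:]]*logfile[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" |
        head -n 1 |
        sed "s/[\$]{name}/${2:-telegraf}/g"
}

logfiles_agree() {
    # $1: telegraf.conf, $2: rc script
    conf_log=$(logfile_of "$1")
    rc_log=$(logfile_of "$2")
    echo "conf: ${conf_log:-<empty>}"
    echo "rc:   ${rc_log:-<empty>}"
    # An empty entry in telegraf.conf is the fix proposed in this thread,
    # so it is treated as "no conflict".
    [ -z "$conf_log" ] || [ "$conf_log" = "$rc_log" ]
}
```

With the files from this thread, logfiles_agree /usr/local/etc/telegraf.conf /usr/local/etc/rc.d/telegraf should print the two different paths and return a non-zero status.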
The solution would be to leave the logfile entry in telegraf.conf empty:
(...)
[agent]
interval = "10s"
round_interval = false
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = true
logfile = "" <--- leave empty
hostname = "opnsense"
omit_hostname = false
(...)
As I assume that this file is generated automatically, this requires a change in the code.
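Until that is changed, a manual edit can serve as a stopgap, although it will presumably be overwritten the next time the config is regenerated. A minimal sketch (blank_logfile is a made-up name) that prints a copy of the config with the logfile entry emptied and leaves the original untouched:

```sh
#!/bin/sh
# Hypothetical helper: print telegraf.conf with the logfile entry emptied.
# Writes to stdout only; the input file is not modified.

blank_logfile() {
    sed 's/^\([[:space:]]*logfile[[:space:]]*=[[:space:]]*\)"[^"]*"/\1""/' "$1"
}
```

For example, blank_logfile /usr/local/etc/telegraf.conf > /tmp/telegraf.conf.new writes the edited copy to a temporary file for review before anything is replaced.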
Br br
Thanks for all the info. It seems this was changed in the rc script with the jump from telegraf 1.6.X to 1.7.X.
I'll try to fix this in the next version.
Thanks, that's great!
Br br
Yup, will be in 18.7.5 next week...
Cheers,
Franco