We are evaluating OPNsense 17.1.2 for production use and observed that the SNMP service crashes regularly after about 5 days. We are sending SNMP requests every five minutes for the most common Linux parameters (CPU utilization, memory usage, interface usage, uptime) using bare OIDs, so this can't be a MIB issue.
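For reference, the polling boils down to plain snmpget/snmpwalk calls like the following sketch (host name, community string and interface index are placeholders; the OIDs are the standard SNMPv2-MIB, HOST-RESOURCES-MIB and IF-MIB ones, so the agent has to expose those tables):

# uptime (SNMPv2-MIB sysUpTime.0)
snmpget -v2c -c public opnsense.example.org 1.3.6.1.2.1.1.3.0
# CPU load per processor (HOST-RESOURCES-MIB hrProcessorLoad)
snmpwalk -v2c -c public opnsense.example.org 1.3.6.1.2.1.25.3.3.1.2
# interface octet counters (IF-MIB ifInOctets/ifOutOctets, ifIndex 1 as an example)
snmpget -v2c -c public opnsense.example.org 1.3.6.1.2.1.2.2.1.10.1 1.3.6.1.2.1.2.2.1.16.1
# memory/storage (HOST-RESOURCES-MIB hrStorage table)
snmpwalk -v2c -c public opnsense.example.org 1.3.6.1.2.1.25.2.3.1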
One other thing that bothers me is that we don't get any SNMP response to the standard Nagios check_snmp_storage request. All Linux hosts of various flavors give a valid reply to this SNMP query; only OPNsense does not. We could live with not being able to monitor disk usage on OPNsense boxes, but it is inconvenient that we have to treat these boxes separately from all other Linux hosts.
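For reference, the check is the usual check_snmp_storage.pl invocation, roughly like this (host, community string and thresholds are placeholders; as far as I know the plugin walks the HOST-RESOURCES-MIB hrStorage table, 1.3.6.1.2.1.25.2.3, underneath):

./check_snmp_storage.pl -H opnsense.example.org -C public -m "/" -w 80 -c 90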
It looks like the SNMP service is dying because its log file grows, but this should not happen IMO. snmpd is notoriously talkative and a quick and dirty solution would be to set the dontLogTCPWrappersConnects option. Of course this would not solve the underlying problem: it must not be possible to kill the service by sending legitimate SNMP GET requests.
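For the quick-and-dirty route that would be a one-line snmpd.conf entry plus a log rotation rule, something like this (this only applies if net-snmp's snmpd is in use rather than the bundled bsnmpd, and the paths are the usual FreeBSD defaults):

# /usr/local/etc/snmp/snmpd.conf -- stop logging every accepted connection
dontLogTCPWrappersConnects yes

# /etc/newsyslog.conf -- rotate the log before it can fill the disk
# logfilename          mode count size(KB) when flags
/var/log/snmpd.log     640  5     1000     *    JC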
We experience the same problem on an OPNsense HA pair. SNMP is crashing every couple of days on both nodes. The graphs show a memory spike, then the bsnmpd daemon gets killed. The log shows:
pid 18770 (bsnmpd), uid 0, was killed: out of swap space
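The growth is easy to confirm even without the graphs; a small loop like this (a rough sketch, interval and output file are arbitrary) records the daemon's resident size until the kill happens:

# log bsnmpd memory usage (RSS and VSZ in KiB) once a minute
while true; do
    echo "$(date '+%F %T') $(ps -o rss= -o vsz= -p "$(pgrep -x bsnmpd)")" >> /root/bsnmpd-mem.log
    sleep 60
done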
We already upped memory from 1 GB to 2 GB, but this does not help. There is no swap space available:
root@opnsense:~ # swapinfo
Device 1K-blocks Used Avail Capacity
Any advice?
The SNMP service stopped crashing here after increasing memory from 2 GB to 3 GB.
Does this have to do with an action like a cron job?
This might indeed be the case. We noted that all SNMP crashes were preceded by a sudden increase in memory usage. We also noticed that there was no swap partition even though the disk was created with 1 GB of free space. Manually creating a swap partition should give some additional memory headroom against SNMP crashes.
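A sketch of adding file-backed swap on FreeBSD, in case it helps someone (size and path are arbitrary):

# create a 1 GB swap file and lock down its permissions
dd if=/dev/zero of=/usr/swap0 bs=1m count=1024
chmod 0600 /usr/swap0
# register it in /etc/fstab as a memory-disk backed swap device
echo 'md99 none swap sw,file=/usr/swap0,late 0 0' >> /etc/fstab
# enable it now (including "late" entries) and verify
swapon -aL
swapinfo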
Quote from: fabian on April 08, 2017, 08:57:09 PM
Does this have to do with an action like a cron job?
We have no custom cron jobs configured. bsnmpd is killed at different times (00:10, 07:00, ...).
OK, that could make the issue harder to track down then. My first thought was that a proxy blacklist or an IDS signature update was responsible for taking up a large amount of memory.
Quote from: fabian on April 10, 2017, 09:12:32 AM
OK, that could make the issue harder to track down then. My first thought was that a proxy blacklist or an IDS signature update was responsible for taking up a large amount of memory.
We don't use any blacklists or IDS functionality.
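For completeness, the kill times can be cross-checked against anything scheduled with something like this (the log path is the stock FreeBSD one and may differ with OPNsense's circular logs):

# when exactly was bsnmpd killed?
grep 'bsnmpd.*killed' /var/log/messages
# what does the system run from cron?
cat /etc/crontab
crontab -l -u root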