Introducing NRPE Runner

It might be a sign that I spend too much time online but the quicker a system gives me feedback the more useful I find it. While I love knowing my Nagios safety net has me covered when making changes sometimes waiting for that cgi to refresh can take too long, especially if I’m taking a iterative / test driven approach to the changes I’m making. For those use cases I wrote nrpe-runner.

The way I typically use Nagios is to have the Nagios server run the checks on the remote host via the NRPE plugin. The checks to be run on the host are normally stored in a config file with each entry looking like this:

command[local_mail]=/usr/local/libexec/nagios/check_local_mail

While this allows you to run each check to confirm that it’s still OK I wanted the ability to run all the commands in the file at once, which I can now do with nrpe-runner. If every thing’s fine then it exits silently, to confirm that it’s actually run I can summarise and even filter the checks to run:

# show everything as it's run whatever the return status
/usr/local/sbin/nrpe-runner -a
check_swap => SWAP OK - 100% free (16041 MB out of 16041 MB) |swap=16041MB;12031;9624;0;16041
... snipped ...
freemem => OK: 12% (1732M) free memory.

# show a summary
$ /usr/local/sbin/nrpe-runner -s
Ran 39 checks - OK 39. WARN 0, CRIT 0, UNKNOWN 0

# run any checks with ntp in the name (the part between [])
$ /usr/local/sbin/nrpe-runner -s -n ntp
Ran 3 checks - OK 3. WARN 0, CRIT 0, UNKNOWN 0

# run all process checks (checks the command after the '=')
$ /usr/local/sbin/nrpe-runner -s -c proc
Ran 17 checks - OK 17. WARN 0, CRIT 0, UNKNOWN 0

# show all checks named ntp
$ /usr/local/sbin/nrpe-runner -a -n ntp
ntp_skew_primary => NTP OK: Offset -0.003149271011 secs|offset=-0.003149s;5.000000;9.000000;
ntp_process => PROCS OK: 1 process with command name 'ntpd', args '-u ntp:ntp'
ntp_skew_secondary => NTP OK: Offset -0.002887368202 secs|offset=-0.002887s;5.000000;9.000000;

nrpe-runner also has the option to dump the results as json, which I’ll be exploring a little further in my next couple of blog posts. While it’s not exactly the same as having the checks run by nagios (the user and environment are often different) I’ve found that shortening the interval between running puppet or yum and seeing the nagios feedback has helped my work-flow quite a lot when making exploratory system changes - and even more when nothing should have changed but does…