Puppet External Resource - a Hidden gem - UnixDaemon: In search of (a) life

"a simple resource that blocks transactions until a check passes, theoretically indicating that a remote resource is in a desired state."
– Puppet External Resource Documentation

I stumbled over the Puppet External Resource module while looking around the Puppetlabs GitHub account for something completely different, and was surprised to find that I’d never seen this little gem mentioned anywhere else. A pre-built way to have a puppet resource skipped based on the result of an external command is a very flexible tool, especially when you couple it with all the available Nagios checks.

Nagios checks have a wonderfully well-known set of exit codes: they return 0 if the check succeeds, and anything else is a warning (1), a critical issue (2) or an unknown state (3). In our case we only want puppet to apply our resource if everything is fine, so this is ideal: external_resource treats anything but 0 as a reason to skip the resource.
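To make the exit code convention concrete, here is a minimal shell sketch that maps a plugin's exit status to its conventional Nagios state name. The helper names nagios_state and check_passes are hypothetical and are not part of Nagios or the module; check_passes only illustrates the "anything but 0 fails" rule described above.

```shell
#!/bin/sh
# Hypothetical helper: translate a Nagios plugin exit code into the
# conventional state name (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
nagios_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    # 3 is UNKNOWN by convention; this sketch also lumps any
    # out-of-range code in here, which is an assumption.
    *) echo "UNKNOWN" ;;
  esac
}

# Hypothetical helper: succeed only when the given check command exits 0,
# so a WARNING counts as a failure just like a CRITICAL does.
check_passes() {
  "$@" >/dev/null 2>&1
}
```

After sourcing this you could run a plugin such as check_tcp and then call nagios_state $? to see which state it reported.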

First we install the nagios check -

  yum install nagios-plugins-tcp

  # to list the available plugins
  yum search nagios-plugins

Then we grab a copy of the module for testing and add our small test manifest -

cd /tmp
git clone https://github.com/deanwilson/puppetlabs-external_resource.git external_resource

cat << 'EOC' > external_example.pp
class external_example(
  $tcp_check = '/usr/lib64/nagios/plugins/check_tcp -H 127.0.0.1 -p 8088 -t 30'
){

  external_resource { 'web server check':
    frequency => 1,
    timeout   => 3,
    check     => $tcp_check,
  }

  notify { 'Remote server check':
    require => External_resource['web server check'],
  }

  notify { 'i still run': }
}

include external_example

EOC

Our example has three resources. The first is the external_resource, which controls what to check, how often to check it (frequency) and how long to keep trying before giving up (timeout), both in seconds. The timeout helps avoid long-running but essentially stalled puppet runs if the remote resource is unavailable due to network issues. The second resource is a notify that requires the external_resource, and the last is a notify to show that only the ‘Remote server check’ notify is skipped, not the whole puppet run.
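If it helps to picture how frequency and timeout interact, the polling behaviour can be approximated in plain shell. This is only a rough sketch and not the module's actual implementation: wait_for_check is a hypothetical function, the messages simply mimic the wording of puppet's log lines, and the time spent running the check itself is not counted against the timeout.

```shell
#!/bin/sh
# wait_for_check FREQUENCY TIMEOUT COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND every FREQUENCY seconds until it
# exits 0 or TIMEOUT seconds have been spent sleeping between attempts.
# Returns 0 once the check passes and 1 if the timeout is reached.
# Note: plain sh has no 'local', so these variables are global.
wait_for_check() {
  frequency=$1
  timeout=$2
  shift 2
  waited=0
  while ! "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Remote resource not up within timeout $timeout" >&2
      return 1
    fi
    echo "Remote resource is not up; delaying for $frequency seconds before next check" >&2
    sleep "$frequency"
    waited=$((waited + frequency))
  done
  return 0
}
```

Calling it as wait_for_check 1 3 followed by the check_tcp command from the manifest would retry once a second and give up after three seconds of waiting.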

We then run the example using the ‘manifest’ ordering (excellently explained here) -

  puppet apply --modulepath /tmp --ordering manifest -v external_example.pp

Info: /Stage[main]/External_example/External_resource[web server check]: Remote resource is not up; delaying for 1 seconds before next check
Info: /Stage[main]/External_example/External_resource[web server check]: Remote resource is not up; delaying for 1 seconds before next check
Info: /Stage[main]/External_example/External_resource[web server check]: Remote resource is not up; delaying for 1 seconds before next check

Error: /Stage[main]/External_example/External_resource[web server check]: Could not evaluate: Remote resource not up within timeout 3

Notice: /Stage[main]/External_example/Notify[Remote server check]: Dependency External_resource[web server check] has failures: true
Warning: /Stage[main]/External_example/Notify[Remote server check]: Skipping because of failed dependencies

Notice: i still run
Notice: /Stage[main]/External_example/Notify[i still run]/message: defined 'message' as 'i still run'

The output shows the failure of the external_resource and the consequent skipping of the ‘Remote server check’ notify. Now that we’ve seen the module in action and know how to use it, let’s make our scenario a little more complicated. Imagine we have a load balancer that has one config file per back end, and we don’t want puppet to add a new config until the back end node is up and on the network. We can add the new node’s config to puppet at any point, but for as long as the check_tcp plugin fails, puppet won’t add that resource to the load balancer.
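As a rough sketch of that load balancer scenario, the manifest below follows the same pattern as the earlier example. Everything specific in it is an assumption made for illustration: the class name, the back end address (192.0.2.10 is a documentation-only IP), the port, the config file path and its contents are all hypothetical, and the parent directory would need to exist already.

```shell
#!/bin/sh
# Write a hypothetical manifest that only adds a back end's config file
# once check_tcp confirms the back end is answering on its port.
cat << 'EOC' > lb_backend_example.pp
class lb_backend_example(
  # Assumed address and port for the new back end node.
  $backend_check = '/usr/lib64/nagios/plugins/check_tcp -H 192.0.2.10 -p 8080 -t 5'
){

  external_resource { 'backend web01 is up':
    frequency => 5,
    timeout   => 30,
    check     => $backend_check,
  }

  # Hypothetical per-back-end config file; skipped while the check fails.
  file { '/etc/loadbalancer/backends.d/web01.conf':
    ensure  => file,
    content => "server web01 192.0.2.10:8080\n",
    require => External_resource['backend web01 is up'],
  }
}

include lb_backend_example
EOC

# To try it, with the module checked out under /tmp as before:
#   puppet apply --modulepath /tmp -v lb_backend_example.pp
```

While the back end is down, the file resource should be skipped with the same "failed dependencies" warning shown in the output above, and it gets applied on the first run where the check passes.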

This approach can easily be adapted to most network services. With MySQL databases, don’t restart the app server on a config change if the remote MySQL isn’t present. Only deploy certain resources if a DNS entry resolves (check_dig) or the host has enough disk space (check_disk). My current favourite is not to change any application resources on a host until the Nagios check confirms the host’s been cleanly removed from the load balancer and traffic has drained off (and then you can use a post_command to add it back in!).

I’d also like to say a quick thank you to Hunter from Puppetlabs who very quickly turned around a patch for this module.