A while ago @ripienaar and I had a chat in a pub about monitoring, event systems and lots of related subjects. As we all know he’s way more productive than is fair, and so while he’s been doing a BUNDLE of work on subjects like monitoring frameworks and event correlation, I’ve been doing some thinking (and no actual coding) about event auditing, continuous compliance and security event management. Now that I’ve finished the $TIMESINK_PROJECT I’m soon going to actually need some of this stuff, so I’ve started putting together a prototype framework that I’m calling DSAC - Dump Send and Correlate. Read on →

<tl;dr> Search for puppet resource values using puppet, not just plain text</tl;dr> One of the ideas that has been sitting on my todo list is having a command that lets me grep a puppet manifest for certain properties, values or even just resources in a smarter way than just running a raw grep over files. While a simple grep works in some cases, it is annoyingly fragile when you’re trying to ignore literal strings in resource types that you’re not interested in, or to narrow your search down to resources that have a property that can also appear in other types. Read on →

While most people know you can use puppet to ensure a service is running, the mechanism it uses to determine whether a service is actually running is often unexplored. By default (at least up to Puppet 2.6) puppet assumes that a service doesn’t supply a working status option and so will look up the service’s name in the process table to check if it’s running. If your service does support the status argument you can set ‘hasstatus => true’ and the platform’s service provider will be used to interrogate the service’s current status. Read on →
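The two checks described above can be modelled in a few lines. This is only a sketch of the decision logic, not Puppet's own code: the function name, the `process_table` list and the `run_status` callable are all hypothetical, and it assumes the usual init script convention that a status command exits 0 when the service is running.

```python
from typing import Callable, Iterable


def service_is_running(
    name: str,
    hasstatus: bool,
    process_table: Iterable[str],
    run_status: Callable[[str], int],
) -> bool:
    """Model of the two checks described above (not Puppet's actual code)."""
    if hasstatus:
        # Assumption: the status command exits 0 when the service is running.
        return run_status(name) == 0
    # Otherwise fall back to looking for the service name in the process table.
    return any(name in command for command in process_table)


# Hypothetical process table and status command, for illustration only.
processes = ["/usr/sbin/sshd -D", "/usr/sbin/cron"]
always_stopped = lambda _name: 3  # pretend the status command reports "stopped"

print(service_is_running("sshd", False, processes, always_stopped))  # True
print(service_is_running("sshd", True, processes, always_stopped))   # False
```

The second call shows why the setting matters: once the status command is trusted, a matching entry in the process table no longer counts for anything.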

It’s been years since I’ve read a book on VMware. Between the maturity and ease of use of their GUI tools and my own continual move towards Free virtualisation I’ve not had the professional need or the spare time to invest, but when a book comes as highly recommended as the VMware vSphere 4.1 HA and DRS Technical deepdive does, you have to make some room on your (virtual) bookshelf. Despite its small page count this book covers its subject material in a simple, direct and technically clear way. Read on →

I’ve been doing a little tinkering with pre/post release checklists and compliance reporting using cucumber and some Nagios wrapping (among other things) in my test lab, and recently needed to do some higher level, entire environment checks before moving on to the next step. While it’s possible to wrap something like nmap’s ping check and then Nagios each target, it does feel like stepping back a few years in the tool chain. Read on →

I never thought I’d use a cliche like “David vs Goliath” but considering the two speakers at London Devops it does seem a little apt. Andrew Godwin from ep.io, a Python hosting platform, was the first speaker, and he did an excellent job of explaining their internal platform, how they make their decisions and what makes them special. While it was both an interesting and engaging talk it did leave me a little worried about the size of the operation. Read on →

Last year at one of the many Belgian tech events Kris mentioned a conference called LOAD (2010) to me. I was a little late in booking the hotel and in the end I couldn’t make it over - and judging by the quality of this year’s event that was a big mistake. While it’s nice to spend time in the devops world and talk about communication, processes and how to merge development and operational tool-chains, sometimes it’s nice to focus on solid, production grade sysadmining; and LOAD was the perfect conference for it. Read on →

“Our source code has always been air gapped from the Internet. The forensic examination confirmed that software development servers and workstations were not affected by the incident” – from HBGary. Anyone else find it hard to accept that none of the developers, testers, documentation writers or build people ever accessed source code from their Internet connected laptops / workstations? Especially considering the state of their other security measures. Don’t get me wrong, in some cases it’s a sensible solution (off-line key signing for example) but for entire teams working on a shared code base?

Sometimes it’s the little niggles that annoy people the most. As my team progress into puppet they have an annoying habit of asking very good questions, which can sometimes be a struggle to answer. Today’s best question was - “How do I tell if this file is under puppet’s control?” While there are a couple of different ways to check (grepping through your git checkout or modifying the file and running puppet were the immediate winners) the best way is probably to look inside the catalog and check against the title of the File resources it contains. Read on →
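As a rough illustration of the catalog approach, the sketch below collects the titles of the File resources in a catalog and checks a path against them. It assumes the catalog has already been loaded into a dictionary with a top-level `resources` list whose entries carry `type` and `title` keys; the real on-disk layout and location vary between Puppet versions, and the function names and sample data here are hypothetical.

```python
from typing import Any, Dict, Set


def managed_files(catalog: Dict[str, Any]) -> Set[str]:
    """Collect the title of every File resource in a catalog dictionary."""
    return {
        resource["title"]
        for resource in catalog.get("resources", [])
        if resource.get("type") == "File"
    }


def is_managed(path: str, catalog: Dict[str, Any]) -> bool:
    """True if a File resource in the catalog has this path as its title."""
    return path in managed_files(catalog)


# Hand-written sample data, assumed shape only (not a real catalog dump).
sample_catalog = {
    "resources": [
        {"type": "File", "title": "/etc/ntp.conf"},
        {"type": "Service", "title": "ntpd"},
        {"type": "File", "title": "/etc/motd"},
    ]
}

print(is_managed("/etc/ntp.conf", sample_catalog))  # True
print(is_managed("/etc/passwd", sample_catalog))    # False
```

Matching on the title alone will miss any File resource whose path is set separately from its title, so treat a negative answer with some caution.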

<tl;dr>Log nrpe-runner state changes when puppet runs to see what broke or was fixed.</tl;dr> While people most often use puppet to configure and repair their infrastructures, sometimes they also inadvertently use it to damage and cripple them. As part of my attempt to reduce the mean time to spot a mistake across my systems I’ve come up with a handful of small scripts that let me wrap a puppet run in a Nagios NRPE powered safety net. Read on →
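The core of such a safety net is comparing check states taken before and after the run. The sketch below shows only that comparison step and is not the scripts from the post: it assumes the check results have already been gathered into dictionaries mapping a check name to a Nagios-style state string, and the function name, check names and the `MISSING` placeholder are all hypothetical.

```python
from typing import Dict, List


def state_changes(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Describe every check whose state differs between the two snapshots."""
    changes = []
    for check in sorted(set(before) | set(after)):
        # A check present in only one snapshot is reported as MISSING.
        old = before.get(check, "MISSING")
        new = after.get(check, "MISSING")
        if old != new:
            changes.append(f"{check}: {old} -> {new}")
    return changes


# Hypothetical snapshots taken either side of a puppet run.
before_run = {"check_disk": "OK", "check_ntp": "CRITICAL", "check_apache": "OK"}
after_run = {"check_disk": "OK", "check_ntp": "OK", "check_apache": "CRITICAL"}

for line in state_changes(before_run, after_run):
    print(line)
# check_apache: OK -> CRITICAL
# check_ntp: CRITICAL -> OK
```

Logging these lines alongside the puppet run makes it obvious which changes were repairs and which were breakages.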