When it comes to the list of problems sysadmins dread, ‘our uptimes are too high’ isn’t normally in the top five. While a lengthy uptime used to be a boasting point, it can also hide technical issues, such as kernel upgrades you’ve installed but not yet booted into (unless you’re running something special like ksplice), confidence gaps in high availability systems (when was the last time you did a failover?) and a general worry that what’s running on a host now may not be when it comes back up. Read on →

I recently headed up to the August NWRug in Manchester, firstly because it’s been a while since I’ve seen Will Jessop, the organiser (and more importantly a mate), and secondly because I was interested in Capistrano. While we use Puppet at work for the more strategic stuff, such as ensuring machines start off with a well-defined configuration, I’ve been in need of something to perform sets of tasks against defined groups of servers. Read on →

If you’ve not seen or read much in the way of sci-fi then Moon may be an innovative movie that surprises you with its plot twists (a film with a plot? Quick, change screens, Transformers is on next door) and human/machine interaction. If you’re reading this blog then I’m guessing you’ll find it slow, predictable and a bit meandering. 5/10. Very average.

We recently consolidated a number of websites used by one of our brands back down to a sensible number (sensible being one). While this was only a single action point in an email, it turned out to involve a large amount of DNS and Apache vhost wrangling. To give myself a safety net, and an easy-to-check to-do list, I decided to invest ten minutes in writing a small test script. Read on →
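
A test script of this kind might look something like the sketch below. It is not the script from the post: it assumes the retired hostnames should all resolve to one address and answer with a redirect to the remaining site, and every hostname and IP address used with it is a hypothetical placeholder. The resolver and the HTTP fetch are passed in as parameters so the checks can be exercised without touching the network.

```python
import http.client
import socket
from urllib.parse import urlsplit


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if hostname resolves to expected_ip."""
    try:
        return resolver(hostname) == expected_ip
    except socket.gaierror:
        return False


def redirect_ok(status, location, canonical_host):
    """Return True if the response is a redirect pointing at canonical_host."""
    if status not in (301, 302):
        return False
    return urlsplit(location or "").hostname == canonical_host


def fetch_redirect(hostname, timeout=5):
    """Issue a HEAD request for / and return (status, Location header)."""
    conn = http.client.HTTPConnection(hostname, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def run_checks(old_hosts, canonical_host, expected_ip,
               fetch=fetch_redirect, resolver=socket.gethostbyname):
    """Check every retired hostname and return a list of failure messages."""
    failures = []
    for host in old_hosts:
        if not resolves_to(host, expected_ip, resolver):
            failures.append(f"{host}: DNS does not point at {expected_ip}")
            continue
        status, location = fetch(host)
        if not redirect_ok(status, location, canonical_host):
            failures.append(f"{host}: got {status} -> {location}")
    return failures
```

Calling `run_checks(["old.example.com"], "www.example.com", "192.0.2.10")` returns an empty list when everything is in place, so the list of failures doubles as the to-do list.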

Thanks to the enthusiasm I’ve returned from EuroPython with (and the fact I couldn’t make it to OpenTech because of washing machine issues) I decided to spend a little bit of time porting Nagios Webchecks to Python. As a use case it covers a lot of the functionality I need in my day-to-day system scripts: the ability to specify command line arguments, read a config file and interpolate a template file for output. Read on →
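
Those three building blocks map fairly directly onto the Python standard library. The sketch below is not the ported Nagios Webchecks code; it only shows one way the pieces could fit together using `argparse`, `configparser` and `string.Template`. The option names, the `[check]` section and the template fields are all invented for illustration.

```python
import argparse
import configparser
from string import Template


def parse_args(argv):
    """Parse the command line; the option names here are hypothetical."""
    parser = argparse.ArgumentParser(
        description="Render a check definition from a template.")
    parser.add_argument("--config", default="webchecks.ini",
                        help="path to an INI-style config file")
    parser.add_argument("--host", required=True,
                        help="host the check is generated for")
    return parser.parse_args(argv)


def load_settings(config_text):
    """Read the (assumed) [check] section of an INI document into a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return dict(parser["check"])


def render(template_text, values):
    """Fill $name placeholders; raises KeyError if a value is missing."""
    return Template(template_text).substitute(values)


# Inline sample data stands in for the config and template files.
SAMPLE_CONFIG = "[check]\nurl = /status\ntimeout = 10\n"
SAMPLE_TEMPLATE = "check_http -H $host -u $url -t $timeout"

args = parse_args(["--host", "www.example.com"])
settings = load_settings(SAMPLE_CONFIG)
print(render(SAMPLE_TEMPLATE, {"host": args.host, **settings}))
```

In a real script the config and template text would be read from the files named on the command line. `Template.substitute` is used rather than `safe_substitute` so that a missing value fails loudly instead of leaving a stray `$placeholder` in the output.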

At work we both build our own packages and use Puppet to manage our servers. While the developers package up their work, in the systems team we’ve moved more towards deploying programs and their dependencies via Puppet. While it seems easier, and quicker, to do the pushing that way, at least for scripts, you lose the ability to track what’s responsible for putting each file on the system. I’m probably already modelling the more complex parts of what would be in a package (such as services and cronjobs) in the module and thanks to Puppet I’m probably doing it in quite a portable way. Read on →

Over the last week I’ve been up in Birmingham catching up with some old friends and attending some talks at the little get together of around 450 Pythonistas that was EuroPython 2009. This was my second Python conference. The first was PyCon 2008, which was so well organised (by many of the same team as this year’s EuroPython) that I was inspired to come back. And I wasn’t disappointed. There were a lot of very good talks, some that have planted seeds that I’ll have to come back and try to find the time to look at and some that showed me things I plan on using in the very near future (such as py.test). Read on →

When it comes to Unix diagnostics I was raised the old-fashioned way, with iostat, vmstat and similar tools. However, times change and tools evolve. dstat, while not as comprehensive as using all the tools one by one, provides a wide range of system performance details in an easy-to-use package. While it’s useful enough in its default state there is even more functionality lurking just below the surface. To see which other modules are available (but are not enabled by default) run dstat -M list. Read on →

Nagios has a wonderful ‘duration’ column in its web interface that’s always bemused me. At what point does a check being in a warning, or even worse, a critical state stop being a problem worthy of head space and start being normal operating procedure? Checks can stay in an extended broken state for many reasons but they all seem to be symptoms of a larger problem. If it’s a small thing then are you getting enough time to do housekeeping? Read on →

Is it just me or does everybody seem to go and buy a new laptop just before they leave their current job? Is it the techie version of buying new work shoes?