After adding AlertManager to my Prometheus test stack in a previous post I spent some time triggering different failure cases and generating test messages. While it’s slightly satisfying seeing rows change from green to red, I soon wanted to actually send real alerts, with all their values, somewhere I could easily view them. My criteria were: it must be easy to integrate with AlertManager, must not require external network access, must be easy to use from docker-compose, and should have as few moving parts as possible. A few short web searches later I stumbled back onto a small server I’ve used for this in the past - MailHog. Read on →

What’s the use of monitoring if you can’t raise alerts? It’s half a solution at best, and now that I have basic monitoring working, as discussed in Prometheus experiments with docker-compose, it felt like it was time to add AlertManager, Prometheus’ frequently used partner in crime, so I can investigate raising, handling and resolving alerts. Unfortunately this turned out to be a lot harder than ‘just’ adding a basic exporter. Before we delve into the issues and how I worked around them in my implementation, let’s see the result of all the work: adding a Redis alert and forcing it to trigger. Read on →

How much of your system does your internal monitoring need to consider down before something is user visible? While there will always be the perfect chain of three or four failures that can cripple a chunk of your customer-visible infrastructure, there are often a lot of low-importance checks that will flare up and consume time and attention. But what’s the ratio? As a small thought experiment on one project I’ve recently started to leave a new, very simple, four-panel Grafana dashboard open on a Raspberry Pi driven monitor that shows the percentage of the internal monitoring checks that are currently in a successful state next to the number of user visible issues and incidents. Read on →

As 2018 rolls along the time has come to rebuild parts of my homelab again. This time I’m looking at my monitoring and metrics setup, which is based on Sensu and Graphite, and planning some experiments and evaluations using Prometheus. In this post I’ll show how I’m setting up my tests and provide the Prometheus experiments with docker-compose source code in case it makes your own experiments a little easier to run. Read on →

It’s time for a little 2017 navel gazing. Prepare for a little self-congratulation and a touch of gushing. You’ve been warned. In general my 2017 was a decent one in terms of tech. I was fortunate to be presented with a number of opportunities to get involved in projects and chat to people I’m immensely thankful for, and I’m going to mention some of them here to remind myself how lucky you can be. Read on →

As your Terraform code grows in both size and complexity you should invest in tests and other ways to ensure everything is doing exactly what you intended. Although there are existing ways to exercise parts of your code, I think Terraform is currently missing an important part of testing functionality, and I hope by the end of this post you’ll agree: I want Puppet catalog compile testing in Terraform. Our current Terraform testing process looks a lot like this:

- pre-commit hooks to ensure the code is formatted and valid before it’s checked in
- run terraform plan and apply to ensure the code actually works
- execute a sparse collection of AWSSpec / InSpec tests against the created resources
- visually check the AWS Console to ensure everything “looks correct”

We ensure the code is all syntactically valid (and pretty) before it’s checked in. Read on →

While trying to add additional performance annotations to one of my side projects I recently stumbled over the exceptionally promising Server-Timing HTTP header and specification. It’s a simple way to add semi-structured values describing aspects of the response generation and how long they each took. These can then be processed and displayed in your normal web development tools. In this post I’ll show a simplified example, using Flask, to add timings to a single page response and display them using Google Chrome developer tools. Read on →
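For a flavour of what that looks like, here’s a minimal sketch (not the code from the post; the route, metric names and timings are placeholders) of attaching a Server-Timing header to a Flask response:

```python
import time

from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    start = time.perf_counter()
    time.sleep(0.05)  # stand-in for real work, e.g. a database query
    db_ms = (time.perf_counter() - start) * 1000

    resp = app.make_response("hello")
    # Each metric is "name;dur=<milliseconds>", optionally with a quoted description
    resp.headers["Server-Timing"] = f'db;dur={db_ms:.1f};desc="fake db call"'
    return resp
```

With the header present, Chrome’s developer tools surface the metrics in the Network panel’s Timing tab for that request.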

Like most people I have too many credentials in my life. Passwords, passphrases and key files seem to grow in number almost without bound. So, in an act of laziness, I decided to try and remove one of them: my AWS EC2 SSH key, by reusing my GitHub public key when setting up my base AWS infrastructure. Once you start using EC2 on Amazon Web Services you’ll need to create, or supply an existing, SSH key pair to allow you to log in to the Linux hosts. Read on →
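The post wires this into the base infrastructure itself; purely as an illustration of the idea (and not the approach taken in the post), a hedged Python sketch using requests and boto3 could look like this, with the GitHub username and key pair name as placeholders:

```python
import boto3
import requests

GITHUB_USER = "your-github-username"  # placeholder

# GitHub publishes your public SSH keys at https://github.com/<user>.keys
resp = requests.get(f"https://github.com/{GITHUB_USER}.keys", timeout=10)
resp.raise_for_status()
public_key = resp.text.splitlines()[0]  # assume the first key is the one to reuse

# Import the key as an EC2 key pair so instances can be launched with it
ec2 = boto3.client("ec2")
ec2.import_key_pair(KeyName="github-key", PublicKeyMaterial=public_key.encode())
```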

I’ve been a fan of Yelp’s pre-commit git hook manager ever since I started using it to Prevent AWS credential leaks. After a recent near miss involving a push to master I decided to take another look and see if it could provide a safety net that would only allow commits on non-master branches. It turns out it can, and it’s actually quite simple to enable if you follow the instructions below. Read on →
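The post covers enabling this through pre-commit itself; as a rough standalone illustration of the same safety net (not the configuration described in the post), a plain git pre-commit hook can do the check directly:

```python
#!/usr/bin/env python3
# Save as .git/hooks/pre-commit and make it executable.
import subprocess
import sys

# Ask git which branch we're on
branch = subprocess.run(
    ["git", "symbolic-ref", "--short", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()

if branch == "master":
    # A non-zero exit status aborts the commit
    sys.exit("Refusing to commit directly to master; use a branch instead.")
```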

A few months ago, while stunningly bored, I decided, in a massive fit of hubris, that I was going to write and publish a technical book. I wrote a pile of notes and todo items and after a good night’s sleep decided it’d be a lot more work than I had time for. So I decided to repurpose Puppet CookBook and try going through the publication process with that instead. But (disclaimer) with a different title, as there is already an excellent real book called Puppet Cookbook that goes into a lot more depth than my site does. Read on →