Naive First Steps with Terraform

On one of the $WORK projects we’ve recently had a chance to join seemingly the entire AWS-using world and spend some time using Terraform to manage a few migration prototypes. I’ve had a few little plays with Terraform over the last few years, but I’ve never tried to plan a large environment with it before, and even though it’s -very- early days for me it’s been an interesting path of discovery.

We initially started out with a directory of .tf files that each managed a group of resources and shared a single state file. This was fine for getting to grips with the basic commands and resource writing, but once we had a few dozen resources I started to get nervous about every terraform apply. The first thing that worried me was that every apply could potentially change every part of the system, even if that part of the code base hadn’t been updated. While this should never really be a problem, we’ve seen enough issues that it was still playing on my mind.

The second concern was the Terraform state file. Although we were storing it in S3 (and who stores something that important in Consul?) there’s a risk that if it was ever written out in a corrupted state we’d essentially lose everything in one sweep. As an aside, one of my biggest wants for Terraform is a ‘discovery’ mode so we can kill the state file off entirely. The importance of the state file was hammered home when we tried to refactor resources defined in standalone .tf files into modules. This turned out to be a less than fun experience of rewriting JSON using vim and fervently hoping that the plan would eventually look like it did when we started.

After we’d come out of our initial experiments with a positive attitude and a new-found appreciation for Terraform’s remarkably comprehensive support of AWS resources, it was time to take a second look and see how we’d deal with a much larger, more complicated environment. Our current prototype, built with about 8 days of experience (and I stress that this is an experiment which might have major limitations), is laid out around four simple top-level concepts. We’ve also, like everyone else, written a wrapper script to glue all this together and run terraform in a consistent way.

The first of our four basic ideas is that ‘projects’, which we’re treating as isolated groups of resources, should be self-contained in both the code they consist of and the state file that represents them. This separation makes it easier to reason about possible changes and limits the damage radius if something goes wrong. The project directory layout currently looks like this, with a sketch of one file’s possible contents after it:

projects/                                         # each project has a directory under here
projects/change_management/
projects/change_management/README.md
projects/change_management/resources/              # common resources should be placed here
projects/change_management/resources/main.tf
projects/change_management/resources/rds.tf
projects/change_management/resources/production/   # environment specific resources should be placed here
projects/change_management/resources/production/read-replicas.tf
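
As a rough illustration of what might live in one of these files, resources/rds.tf could hold something like the sketch below. The resource arguments and the db_password variable are invented for the example rather than taken from our real config:

# hypothetical contents of projects/change_management/resources/rds.tf
resource "aws_db_instance" "change_management" {
  identifier        = "change-management"
  engine            = "postgres"
  instance_class    = "db.t2.micro"
  allocated_storage = 10
  username          = "changemgmt"
  password          = "${var.db_password}"   # assumed to be declared in the configs and set via tfvars
}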

We also have resources that only need to exist in a single environment, sometimes in addition to other resources that should exist everywhere. We implement that by separating those resources out into a subdirectory. Having a database in all environments but only having read replicas in production and staging, to manage costs, is an example of this. While coming up with this, one thing bit us that people familiar with Terraform have probably already spotted - terraform only takes a single directory of files to run. We work around this in our wrapper script by merging the directories’ contents together into a temporary directory and running that. It’s not quite the same workflow as everyone else is using, but it gives us the structured layout we wanted in a few lines of code. It also provides a nice central point to add anything else, such as template expansion, that we want to do before terraform gets to see the code.
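
The merge itself only needs a few lines. Here’s a minimal sketch, assuming a project and an environment are passed in; our real script also handles the configs and variables directories described later:

project=$1
environment=$2
workdir=$(mktemp -d)

# combine the common and environment specific resources into one directory
cp projects/"$project"/resources/*.tf "$workdir"
if [ -d projects/"$project"/resources/"$environment" ]; then
  cp projects/"$project"/resources/"$environment"/*.tf "$workdir"
fi

# run terraform against the merged directory
(cd "$workdir" && terraform plan)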

Once we’re using a piece of code in a couple of different projects it becomes eligible to be refactored out and made into a module, to enable consistency and reuse. As we’re only building an experimental outline, we currently store all our modules within the same repository and use file sources; there’s a sketch of consuming one after the layout below. One example layout is:

modules/                             # each local module should be placed under here
modules/overlay_network/             # tf files containing the resources should be placed here
modules/overlay_network/main.tf
modules/overlay_network/mpls.tf
modules/overlay_network/sctp.tf
modules/overlay_network/README.md
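
Consuming one of these local modules from a project is then a single block with a file source. A sketch, with a hypothetical input variable; the exact relative path depends on where the wrapper actually runs terraform from:

module "overlay_network" {
  source      = "../../modules/overlay_network"   # file source within the same repository
  environment = "${var.environment}"              # hypothetical module input
}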

We don’t currently nest modules, so this is a shallow structure. We’ve discussed using nested modules in a way similar to how you’d use Puppet classes and defines, but we’ve not needed to probe that area yet. Thanks to John Vincent’s post, mentioned below, we’ve sneaked a glance at the future, and that path seems to lead to lots of boilerplate if not done well.

The last two top-level concepts in our experiment are similar and closely related. We have a base config file, and some more focused ones, for individual providers and such (there’s a sketch of the common file after this layout) -

configs/
configs/common.tf               # values that should be available everywhere
configs/${environment_name}.tf  # environment specific default regions and similar
configs/${provider_name}.tf     # provider specific default regions and similar
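
As a sketch of what the split might look like (all the names here are illustrative), configs/common.tf could declare the variables and provider defaults everything else relies on:

# declare variables once; the values come from the tfvars files below
variable "environment" {}

variable "aws_region" {
  default = "eu-west-1"
}

provider "aws" {
  region = "${var.aws_region}"
}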

And then we have the place where we actually set those variables (example values are sketched below the layout) -

variables/
variables/common.tfvars
variables/${environment_name}.tfvars
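
The tfvars files then contain nothing but values. A hypothetical variables/production.tfvars:

aws_region  = "eu-west-1"
environment = "production"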

Again we rely on our wrapper script to do the right thing: gather up all the relevant files and build the correct terraform command line with -var-file options and such. If we continue down this route it’s actually quite tempting to write a Hiera provider to keep things consistent between the tools we use.
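
The invocation the wrapper builds ends up looking something like this simplified sketch:

terraform plan \
  -var-file=variables/common.tfvars \
  -var-file=variables/"$environment".tfvars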

We’re not running this in anger, at scale, or over half a dozen different providers. It’s currently a learning exercise and prototype that’s helped us get to grips with Terraform’s strengths and weaknesses, and it’s a very simple, easy to extend starting point for seeing whether we’re going in the right direction or not.

It does have limitations we’ve not yet needed to address. Our use of four main, predefined environments means it doesn’t currently provide the ability to run arbitrary stacks side by side; two of us can’t run independent versions of the same database tiers at the moment. It also doesn’t take into account how to access values from other projects. It has shown us some possible limitations of Terraform itself though, and where we need to do more discovery. Moving code with existing, deployed resources into modules, for example, taught us all how to hate a little more.

Hopefully this short look at an early stage experiment with Terraform has been helpful. In that spirit, if you’re interested in other people’s experiences with Terraform there are two -must- read posts:

The initial layout covered here was a great team bonding opportunity and I had a lot of fun working with Alex Muller and Laura Martin on it. There’s something quite refreshing about everyone suddenly stopping, saying “Does it do that?” and opening a load of new tabs.