Managing CloudFormation Stacks With Cumulus - UnixDaemon: In search of (a) life

Working with multiple, related CloudFormation stacks can become quite taxing if you only use the native AWS command line tools. Commands start off gently -

cfn-create-stack dwilson-megavpc-sns-emails --parameters "AutoScaleSNSTopic=testy@example.org" \ 
  --template-file location/sns-email-topic.json

but they quickly become painful. The two commands below each create stacks that depend on values from resources that have been defined in a previous stack. You can spot these values by their unfriendly appearance, such as ‘rtb-9n0tr34lac55’ and ‘subnet-e4n0tr34la’.

# Add the bastion hosts template

cfn-create-stack dwilson-megavpc-bastionhosts --parameters \ 
"
VPC=vpc-n0tr34l;BastionServerSNSArn=arn:aws:sns:us-east-1:14204989:fooo;
PrivateNATRouteTableAZ1=rtb-9n0tr34lac55;PrivateNATRouteTableAZ2=rtb-6n0tr34l5e0a;
PublicSubnetAZ1=subnet-3n0tr34l5c;PublicSubnetAZ2=subnet-e4n0tr34la;
KeyName=dwilson; \
" --template-file bastion.json


# create the web/app servers
cfn-create-stack dwilson-megavpc-webapps --parameters \
"
VPC=vpc-65n0tr34lb;BastionserverSG=sg-n0tr34l;PrivateNATRouteTableAZ1=rtb-n0tr34l;
PrivateNATRouteTableAZ2=rtb-n0tr34l;PublicSubnetAZ1=subnet-n0tr34l;
PublicSubnetAZ2=subnet-n0tr34l;KeyName=dwilson;WebServerSNSArn=arn:aws:sns:us-east-1:14:fooo
" --template-file location/webapps.json

When building a large, multi-tier VPC you’ll often find yourself needing to extract output values from existing stacks and pass them in as parameters to dependent stacks. This results in a lot of repeated literal strings and boilerplate in your commands and will soon cause you to start doubting your approach.

The real pain came for us when we started adding extra availability zones for resilience. A couple of my co-workers were keeping their stuff running with bash and python + boto but the code bases were starting to get a little creaky and complicated and this seemed like a problem that should have already been solved in a nice, declarative way.

It was about the point when we decided to add an extra subnet to a number of tiers that I caved and went trawling through github for somebody else’s solution. After some investigation I settled on Cumulus as the first project to experiment with as a replacement for our ever growing, hand hacked, creation scripts. To pay Cumulus the proper respect it did make life a lot easier at first.

The code snippets below show an example set of stacks that were converted over from raw command lines like the above to Cumulus yaml based configs. First up we have the base declaration and a simple stack definition.

locdsw:
  region: eu-west-1

  stacks:
    sns-email-topic:
      cf_template: sns-email-topic.json
      depends:
      params:
        AutoScaleSNSTopic:
          value: testymctest@example.org

Each of the keys under ‘stacks:’ will be created as a separate CloudFormation stack by cumulus. Their names will be prefixed with ‘locdsw’, taken from the first line of our example, and they’ll be placed inside the ‘eu-west-1’ region. The configuration above will result in the creation of a stack called ‘locdsw-sns-email-topic’ appearing in the CloudFormation dashboard

The stacks resources are defined in the template specified in cf_template. Our example does not depend on existing stacks and takes a single parameter, AutoScaleSNSTopic, with a value of ‘testymctest’. Cumulus has no support for variables so you’ll find yourself repeating certain parameters, like ami id and key id, throughout the configuration.

For a while we had an internal branch that treated the CloudFormation templates as jinja2 templates. This enabled us to remove large amounts of duplication inside individual templates. These changes were submitted upstream but one of the goals of the Cumulus project is that the templates it manages can still be used by the native CloudFormation tools, so the patch was (quite fairly) rejected.

Let’s move on to the second stack defined in our config. The point of interest here is the addition of an explicit dependency on the sns-email-topic stack. Note that it’s not referred to using the prefixed name, which can be a point of confusion for new users.

    security-groups:
      cf_template: security-groups.json
      depends:
        - sns-email-topic

Finally we move on to an example declaration of a larger stack. The interesting parts of which are in the params section.

    webapp:
      cf_template: webapp.json
      depends:
        - sns-email-topic
        - security-groups
      params:
        AppServerFleetSize:
          value: 1
        Owner:
          value: dwilson
        AMIId:
          value: ami-n0tr34l
        KeyName:
          value: dwilson
        ASGSNSArn:
          source: sns-email-topic
          type: output
          variable: EmailSNSTopicARN
        WebappSGID:
          source: security-groups
          type: output
          variable: WebappSGID

The webapp params section contains two different types of values. Simple ones we’ve seen before, ‘Owner’ and ‘AMIId’ for example, and composite ones that reference values that other stacks define as outputs. Let’s look at ASGSNSArn in a little more detail.

  ASGSNSArn:
    source: sns-email-topic
    type: output
    variable: EmailSNSTopicARN

Here, inside the webapp stack declaration, we look up a value defined in the output of the previously executed sns-email-topic template. From the CloudFormation Outputs for that template we retrieve the value of EmailSNSTopicARN. We then pass this to the webapp.json template as the ASGSNSArn parameter on stack creation. If you need to pull a parameter in from an existing stack that was created in some other way you can specify it as ‘source: -fullstackname’. The ‘-’ makes it an absolute name lookup, cumulus won’t prefix the stackname with locdsw for example.

Cumulus met a number of my stack management needs, and I’m still using it for older, longer lived stacks such as monitoring, but because of its narrow focus it began to feel restricting quite quickly. I’ve started to investigate Ansible as a possible replacement as it’s a more generic tool and I’m in need of flexibility that’d feel quite out of place in cumulus.

In terms of day to day operations the main issues we hit included the need to turn on ALL the debug, both cumulus and boto, to see why stack creations failed. A lot of the AWS returned errors were being caught and replaced by generic, unhelpful error messages at any filter level greater than debug. Running under debug results in a LOT of output, especially when boto is idle polling, waiting for the stack creation to complete so it can begin the next one. The lack of any variables or looping was also an early constraint. The answers to this seemed to include pushing the complexity down to the templates and writing large mapping sections, increasing duplication of literals between templates and a lot of FN::FindInMaps maps. The second approach was to have multiple configs. This was less than ideal due to the number of permutations, environment (dev, stage, live), region and in development which developer was using it. The third option, a small pre-processor that expanded embedded jinja2 in to a CloudFormation template, added another layer between writing and debugging and so didn’t last very long.

If you’re running a small number of simple templates then Cumulus might be the one tool you need. For us, Ansible seems to be a better fit, but more about that in the next post.