CloudFormation, S3 buckets and IAM Roles - UnixDaemon: In search of (a) life

We’re currently moving some of our early stage dev prototypes to a more automated environment and as part of this work I’m converting command line AWS resource creation to parameterised CloudFormation templates that we can use to either run multiple stacks side by side or recreate the entire stack from development to production. It’s been quite a frustrating afternoon due to some tool chain related yak shaving and some nuances in how CloudFormation works.

So we wrote the CloudFormation template, a lovely collection of AWS::IAM::Role, AWS::IAM::Policy and AWS::IAM::InstanceProfile resource types, ran it and verified that the completed stack had created the policies we expected. Although we may have had a ‘few’ more cycles in there than that to get it right. We spun up a new instance with the role to confirm the access was correct, only to find that the S3 bucket we created didn’t exist. Well, it did, but the name wasn’t what we expected.
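For anyone who hasn't built one, here is a minimal sketch of the kind of template described above, generated from Python so the JSON is easy to tweak. The logical IDs (AppRole, AppS3Policy, AppInstanceProfile), the policy name and the BucketName parameter are hypothetical illustrations, not the template from this post. Passing the bucket name in as a parameter is one way to keep the policy pointing at an absolute, predictable name.

```python
import json


def build_template():
    """Hypothetical parameterised template: role, S3 policy and instance profile."""
    # Build the bucket ARNs from the BucketName parameter with Fn::Join.
    bucket_arn = {"Fn::Join": ["", ["arn:aws:s3:::", {"Ref": "BucketName"}]]}
    objects_arn = {"Fn::Join": ["", ["arn:aws:s3:::", {"Ref": "BucketName"}, "/*"]]}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "BucketName": {"Type": "String", "Description": "Existing S3 bucket"},
        },
        "Resources": {
            "AppRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    # Let EC2 instances assume this role.
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": ["ec2.amazonaws.com"]},
                            "Action": ["sts:AssumeRole"],
                        }],
                    },
                    "Path": "/",
                },
            },
            "AppS3Policy": {
                "Type": "AWS::IAM::Policy",
                "Properties": {
                    "PolicyName": "app-s3-access",
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {"Effect": "Allow", "Action": ["s3:ListBucket"],
                             "Resource": [bucket_arn]},
                            {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
                             "Resource": [objects_arn]},
                        ],
                    },
                    "Roles": [{"Ref": "AppRole"}],
                },
            },
            # The instance profile is what actually gets attached to an instance.
            "AppInstanceProfile": {
                "Type": "AWS::IAM::InstanceProfile",
                "Properties": {"Path": "/", "Roles": [{"Ref": "AppRole"}]},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```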

CloudFormation has an occasionally helpful feature that modifies resource names so that you can run multiple versions of the template side by side without the resources conflicting. Unfortunately this was an issue for us, as our policies and instances were pointing to a bucket that didn’t exist. While you can reference some attributes of one resource from another (with Ref or Fn::GetAtt), we didn’t want to change all the existing tooling to suit this one issue. So, in short, you can’t create an absolute S3 bucket name from a CloudFormation template. We added another step to our wrapper script (yay for boto) and moved on. It’s worth noting that CloudFormation doesn’t do this ‘localisation’ for Route53 records, so you’ll need to namespace those yourself.
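The extra wrapper step amounts to creating the bucket by its absolute name outside CloudFormation. Below is a minimal sketch of such a step. It assumes the boto3 client interface rather than the older boto library, takes the client as a parameter so it can be exercised without AWS, and uses a hypothetical function name.

```python
def ensure_bucket(s3_client, name, region="us-east-1"):
    """Create bucket `name` unless this account already owns it.

    `s3_client` is expected to behave like a boto3 S3 client.
    Returns True if a bucket was created, False if it already existed.
    """
    # list_buckets only covers buckets owned by this account; a name held by
    # another account will still make create_bucket raise an error.
    existing = {b["Name"] for b in s3_client.list_buckets().get("Buckets", [])}
    if name in existing:
        return False
    kwargs = {"Bucket": name}
    # Outside us-east-1, S3 needs an explicit location constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3_client.create_bucket(**kwargs)
    return True
```

With boto3 installed and credentials available, a call might look like `ensure_bucket(boto3.client("s3", region_name="eu-west-1"), "net.unixdaemon.blog-false-location", "eu-west-1")`, where the region is an assumption. Checking `list_buckets` first keeps the step safe to rerun on every stack build.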

Being good operations people, we removed the bucket we’d created while debugging, deleted the CloudFormation stack and modified the template to remove the S3 bucket resources. We then did another full stack creation (it’d be dishonest to call this only the second one) and everything was created. Switching windows back to the EC2 instance we were using for testing, we ran the upload commands and… ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you provided does not exist in our records. Now, I’m not an expert on IAM, so after checking that the policies existed and looked correct, and that the wrapper script had created the bucket, I was a little stumped.

At first I thought I’d made a mistake with my repackaging of s3cmd, so I had it dump its configuration to make sure all the IAM role details were being picked up and used:

$ s3cmd put /var/log/dmesg s3://net.unixdaemon.blog-false-location/ --dump-config | egrep -i 'token|secret|key'

access_key = KJH45KH345JKHH
access_token = SFSFSFLJSFLJ ... SNIP ... FSFFSDFSDFSDF98S0D98FS
secret_key = KJH45KH345JKHHKJH45KH345JKHHKJH45KH345JKHH

# still broken
$ s3cmd put /var/log/dmesg s3://net.unixdaemon.blog-false-location/
ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you provided does not exist in our records.

Everything seemed to be fine with the S3 commands, so I wanted to double-check that the instance was doing what I expected it to. Every running EC2 instance has something called ‘instance meta-data’ associated with it. This is data about your EC2 instance that you can use to configure or manage the running instance. One of the subkeys contains information about IAM, so I checked that the role was still present and correct.

# this url is the same on every EC2 instance - Windows or Linux
$ curl http://169.254.169.254/latest/meta-data/iam/security-credentials/

cf-iam-trial-myspecialrole-1N34HHJ23463XR1OBK7HD
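The same lookups can be scripted, which makes it easier to compare what the instance sees with what s3cmd dumped above. This sketch assumes plain GET access to the meta-data service, as the curl commands use (IMDSv1). Instances that enforce IMDSv2 need a session token first. The helper names are hypothetical, and the mapping onto s3cmd's access_key, secret_key and access_token simply follows the dump above.

```python
import json
import urllib.request

BASE = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def fetch(url, timeout=2):
    # Plain GET (IMDSv1); only reachable from inside an EC2 instance.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_role_names(listing):
    # The listing is newline separated; an instance profile holds one role.
    return [line.strip() for line in listing.splitlines() if line.strip()]


def parse_credentials(body):
    doc = json.loads(body)
    # Map the meta-data field names onto the s3cmd config keys seen above.
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "access_token": doc["Token"],
        "expiration": doc["Expiration"],
    }


def role_credentials():
    """Return (role_name, credentials); needs an attached instance profile."""
    role = parse_role_names(fetch(BASE))[0]
    return role, parse_credentials(fetch(BASE + role))
```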

At a glance the assigned role looked correct, but after some more double-checking I realised that I’d overlooked how CloudFormation works. When we’d deleted the stack and recreated it, the resources had been recreated but with different unique identifiers - the awkward string at the end that I’d just assumed was correct. I went back to the IAM tab in the web console and confirmed that the instance was still using the role from before we reapplied the templates. A quick meta-data lookup verified this -

$ curl http://169.254.169.254/latest/meta-data/iam/info | grep Last
  "LastUpdated" : "2013-08-30T14:00:42Z",
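A check that would catch this sooner compares the instance profile named in iam/info with the one the current stack actually contains. This is a sketch under a few assumptions: the stack side uses boto3's CloudFormation client, the logical ID AppInstanceProfile is hypothetical, and the resource's PhysicalResourceId is taken to be the profile name (what Ref returns for AWS::IAM::InstanceProfile).

```python
import json


def profile_name_from_arn(arn):
    # arn:aws:iam::ACCOUNT:instance-profile/optional/path/NAME -> NAME
    return arn.rsplit("/", 1)[-1]


def current_profile_name(cfn_client, stack_name, logical_id="AppInstanceProfile"):
    # `cfn_client` is expected to behave like a boto3 CloudFormation client.
    detail = cfn_client.describe_stack_resource(
        StackName=stack_name, LogicalResourceId=logical_id
    )["StackResourceDetail"]
    return detail["PhysicalResourceId"]


def is_stale(iam_info_body, expected_profile_name):
    """True if the instance is bound to a different (e.g. deleted) profile."""
    info = json.loads(iam_info_body)
    return profile_name_from_arn(info["InstanceProfileArn"]) != expected_profile_name
```

On an instance, `iam_info_body` would be the output of the `iam/info` lookup above. A stale result means the instance needs replacing, since the meta-data won't pick up the new profile on its own.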

The meta-data is set at instance creation time and is not (apart from a very small selection of dynamic data) updated during the life of the instance, not even by a reboot (yes, we tried that too). While it’s nice to know what actually happened, and how to debug these kinds of issues in the future, this does mean we’ll need to spend some more time thinking about whether we use this approach. The development environments spin through instances quickly enough that changing the associated IAM templates wouldn’t cause a major issue, but the staging and production environments have longer-running instances that could suddenly find their access revoked if we had to update the template. We’re going to have to choose between consistency, by using CloudFormation for this, and another tool that will keep the resource names the same to avoid surprises. So much for consolidating the tool chain.

While on this journey I also hit -

The version of s3cmd that comes with certain distros predates EC2 instance IAM roles, so downloads from S3 don't work. This is fixed in modern versions, and thanks to FPM, packaging a new version was a quick task.
The list of allowed characters for S3 bucket names differs between the web interface and CloudFormation. CloudFormation template validation requires resource names (the logical IDs in the template) to be alphanumeric, while the web interface doesn't.
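To name a template resource after a bucket such as net.unixdaemon.blog-false-location, the name has to be squashed into an alphanumeric logical ID. The helper below is a hypothetical illustration of one way to do that. It only enforces the alphanumeric rule and does not check uniqueness within a template.

```python
import re

# CloudFormation logical IDs: letters and digits only.
LOGICAL_ID = re.compile(r"[A-Za-z0-9]+")


def logical_id_for(bucket_name):
    """Derive an alphanumeric logical ID from a bucket name (hypothetical helper)."""
    # Split on every run of non-alphanumeric characters (dots, hyphens,
    # underscores) and CamelCase the remaining pieces.
    parts = re.split(r"[^A-Za-z0-9]+", bucket_name)
    candidate = "".join(p[:1].upper() + p[1:] for p in parts if p)
    if not LOGICAL_ID.fullmatch(candidate):
        raise ValueError(f"cannot derive a logical ID from {bucket_name!r}")
    return candidate
```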