Managing multiple puppet modules with modulesync

With the exception of children, puppies and medical compliance frameworks managing one of something is normally much easier than managing a lot of them. If you have a lot of puppet modules, and you’ll eventually always have a lot of puppet modules, you’ll get bitten by this and find yourself spending as much time managing supporting functionality as the puppet code itself.

Luckily you’re not the first person to have a horde of puppet modules that share a lot of common scaffolding. The fine people at Vox Pupuli had the same issue and maintain an excellent tool, modulesync that solves this very problem. With modulesync and a little YAML you’ll soon have a consistent, easy to iterate, on set of modules.

To get started with module sync you need three things, well four if you count the puppet module horde you want to manage.

I’ve been using modulesync for some of my projects for a while but we recently adopted it for the GDS Operations Puppet Modules so there’s now a full, but nascent, example we can look at. You can find all the modulesync code in our public repo.

First we set up the basic module sync config in modulesync.yml -

---
git_base: 'git@github.com:'
namespace: gds-operations
branch: modulesync
...
# vim: syntax=yaml

This YAML mostly controls how we interact with our upstream. git_base is the base of the URL to run git operations against. In our case we explicitly specify GitHub (which is also the default) but this is easy to change if you use bitbucket, gitlab or a local server. We treat namespace as the GitHub organisation modules are under. As we never push directly to master we specify a branch our changes should be pushed to for later processing as a pull request.

The second config file, managed_modules.yml, contains a list of all the modules we want to manage:

---
- puppet-aptly
- puppet-auditd
- puppet-goenv

By default modulesync will perform any operations against every module in this file. It’s possible to filter this down to specific modules but there’s only really value in doing that as a simple test. After all keeping the modules in sync is pretty core to the tools purpose.

The last thing to configure is a little more abstract. Any files you want to manage across the modules should be placed in the moduleroot directory and given a .erb extension. At the moment we’re treating all the files in this directory as basic, static, files modulesync does expand them provides a @configs hash, which contains any values you specify in the base config_defaults.yml file. These values can also be overridden with more specific values stored along side the module itself in the remote repository.

Once you’ve created the config files and added at least a basic file to moduleroot, a LICENSE file is often a safe place to start, you can run modulesync to see what will be changed. In this case I’m going to be working with the gds-operations/puppet_modulesync_config repo.

bundle install

# run the module sync against a single module and show potential changes
bundle exec msync update -f puppet-rbenv --noop

This command will filter the managed modules (using the -f flag to select them) clone the remote git repo(s), placing them under modules/, change the branch to either master or the one specified in modulesync.yml and then present a diff of changes from the expanded templates contained in moduleroot against the cloned remote repo. None of the changes are actually made thanks to the --noop flag. If you’re happy with the diff you can add a commit message (with -m message), remove --noop and then run the command again to push the amended branch.

bundle exec msync update -m "Add LICENSE file" -f puppet-rbenv

Once the branch is pushed you can review and create a pull request as usual.

Screen shot of GitHub pull request from modulesync change

We’re at a very early stage of adoption so there is a large swathe of functionality we’re not using so I’ve not mentioned. If you’re actually using the moduleroot templates as actual templates you can have a local override, in each remote module/github repo, that can localise the configuration and be correctly merged with the main configuration. This allows you to push settings out to where they’re needed while still keeping most modules baselined. You can also customise the syncing workflow to specify bumping the minor version, updating the CHANGELOG and a number of other helpful shortcuts provided by modulesync.

Once you get above half-a-dozen modules it’s a good time to take a step back and think about how you’re going to manage dependencies, versions, spec_helpers and such in an ongoing, iterative way and modulesync presents one very helpful possible solution.