Dean Wilson@UnixDaemon: Whatever affects one directly, affects all indirectly.

on UnixDaemon: In search of (a) life

My name is Dean Wilson and this is my personal site, where I store my code, writing, rantings and anything else I feel warrants sharing with the rest of the ’Net.


Nicer Jenkins Views - Build Monitor Plugin

Sat, 08 Apr 2017 13:25:28 +0000

While migrating and upgrading an old install of Jenkins to version 2, the topic of adding some new views came up in conversation, and the quite shiny Jenkins CI Build Monitor Plugin was suggested as a pretty, and quick to deploy, option.

Using some canned test jobs we did a manual deploy of the plugin, configured a view on our testing machine, and I have to say it looks as good, and as easily readable from a few desks away, as we’d hoped.


The next step is to apply the true utility test: leave it in place for a week or so, then remove it and see if anyone notices. If they do we’ll add some puppet scaffolding and roll it out to all the environments.

Tales from the Script

Wed, 01 Mar 2017 19:54:46 GMT

A number of roles ago the operations and developer folk were blessed with a relatively inexperienced quality assurance department that were, to put it kindly, focused on manual exploratory testing. They were, to a person, incapable of writing any kind of automated testing, running third party tools or doing anything in a reproducible way. While we’ve all worked with people lacking in certain skills what makes this story one of my favourites is that none of us knew they couldn’t handle the work.

The manager of the QA team, someone I’d never have credited with the sheer audacity to pull off this long con, always came to our meetings with an earnest face and excuses about the failure of “The Script”. We, being insanely busy modern technical people, took this at face value; how would you run all the regression tests without a script? “There was a problem running the script”, “the newest changes to the script had caused regressions” and similar were always on the tip of their tongue and because the developers were under a lot of time pressure no deep investigations were done. Everyone was assumed to be doing their best and what a great QA manager they were in protecting their people from any fallout from the failures. On it went, all testing was done via “the script” and everything was again good. Or so we assumed.

In one of our recurring nightmare incident reviews, this one after something we’d previously covered had come back for the third time, a few of us began to get suspicious. We decided to build our own little response team and do some digging for the sake of everyone’s sanity. Now, this was before the days of GitHub and everyone being in one team of sharing and mutual bonding. We knew we’d have to go rooting around other departments’ infrastructure to see what was going on. Over the course of the next few days the group targeted one of the more helpless QA engineers and began to help him with everything technical he needed. He had the most amazing, fully hand-held, on-boarding the department had ever seen and we, in little bits and pieces, began to pierce the veil of secrecy that was the QA team’s process.

One day, just before lunch, one of the senior developers involved in our investigation hit the mother lode. The QA engineer had paired with them on adding testing to “the script” for a new feature the developer had written and suddenly he had a full understanding of the script and its architecture.

It was an Excel spreadsheet.

It was a massive, colour coded, interlinked Excel spreadsheet. Each row was a separate test path or page journey. Some rows were 40 fields of references to other rows to form one complete journey. Every time we did a release to staging they’d load up the Excel document from the network share and arrow key their way through row upon row of explicit instructions. Seeing it in use was like watching an insane cross between Snake and Minesweeper. Some of the cells were links to screen grabs of expected outputs and page layouts. Some of them had a red background to show steps that had recently failed. It was a horrific moment of shared confusion. A team of nearly forty testers had ended up building this monstrosity and running it for months. It was like opening up a Word doc and having Cthulhu glare back at you. So we did the only thing we could think of: went to lunch and mostly sat in stunned silence.

And I almost forgot the best part of the story, the Excel spreadsheet? It was named “The_Script.old.xls”

Development Diaries and Today I Learned Repositories

Sun, 26 Feb 2017 15:14:34 GMT

One of the difficulties in technically mentoring juniors you don’t see on a near daily basis is ensuring the right level of challenge and learning. It’s surprisingly easy for someone to get blocked on a project or keep themselves too deep in their comfort zone and essentially halt their progress for extended periods of time. An approach I use to help avoid this stagnation is the keeping of a “Development Diary”.

A development diary, which I’ve heard called by many other names, is simple in concept and can be just as easy to implement. It’s the commitment to write down something that you’ve learned in your role each and every day. Over time it becomes a collection of small wins and achievements and shows that even little learnings have a big cumulative impact. While the daily aspect isn’t essential, and I’ve had people on more “Business as usual” focused teams reduce the frequency to as low as once a week, I think that while you’re at a more junior point in your career it should be easier to find new things to take note of than the awkward part in the middle where you’re doing the same thing for the fifth company.

One of the best diaries I’ve had the pleasure of reading was by a non-native English speaker and nested amongst the usual technical content was the occasional gem, an explanation of an idiom they’d heard and tried to work out before looking it up.

The best technical implementation of a development diary I’ve seen recently was a simple git repo of directories. A single post, written in markdown, was added each day. It served as a great little git learning project for the junior (pull request reviews, branches and merging in to master) and provided an excellent single location to both look back over and see how much they’d learned over the last five months while providing them with starting material for their more formal quarterly reviews.

Watching this diary grow as a mentor provides some helpful insights in to how the person is doing. You can see what they consider interesting, if their focus is narrow or ranges over entire parts of the work and, if nothing is added for longer periods of time, can provide an early warning. I’ve found that not having anything to add for longer periods of time is a red flag and highlights that something is happening that requires more attention. It could be the person’s either blocked at work, has something else on their mind or isn’t being challenged. None of which are ideal for anything beyond short periods of time.

As an aside, while I’ve had the most success using this on a one-to-one basis with juniors, I’ve tried it a few times across an entire team when doing a discovery phase for a larger piece of work. I’ve found it a lot harder to achieve consistent buy-in in this environment. Especially when external contractors and companies are involved as it requires a fair amount of trust and honesty between all involved for it to be useful.

When used in this way the willing involvement of the more senior technical people and how the juniors consider them seems to be the best indication of whether it’ll have a positive impact or not. In a more nurturing and sharing team the seniors are willing to show what they didn’t know without risk of losing credibility and the juniors are open to show their progress without being judged harshly. In a less functional team the juniors are very hesitant to risk embarrassing themselves and the seniors hate the idea of showing weakness or gaps in their skills.

Something I’ve had a lot less success with is nearly the reverse of a “Today I Learned”. A place to collect the things you didn’t understand while working. I find it hard to determine how much material to record in this form and walk the line between having some guidance tasks for downtime and self-paced learning while not becoming a disheartening pile of things you[...]
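
The “nothing added for a while” warning is easy to automate once the diary is a repo of dated files. What follows is only a rough sketch: the one-file-per-day YYYY-MM-DD.md naming, the seven day threshold and both function names are my own assumptions, not how any particular diary was laid out.

```ruby
require 'date'

# Assumes (hypothetically) one markdown file per day, named YYYY-MM-DD.md.
def entry_dates(filenames)
  filenames.map { |name| Date.strptime(File.basename(name, '.md'), '%Y-%m-%d') }.sort
end

# Returns [from, to, days] for every gap between entries longer than max_days.
def quiet_periods(dates, max_days: 7)
  dates.each_cons(2)
       .map { |earlier, later| [earlier, later, (later - earlier).to_i] }
       .select { |_, _, days| days > max_days }
end

entries = %w[2017/02/2017-02-01.md 2017/02/2017-02-02.md 2017/02/2017-02-20.md]
quiet_periods(entry_dates(entries)).each do |from, to, days|
  puts "No entries between #{from} and #{to} (#{days} days)"
end
```

A gap is a prompt for a conversation rather than a verdict, so the threshold is worth tuning to how often the person is expected to write.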

Terraform Version Restrictions

Sat, 26 Nov 2016 19:42:10 GMT

One of my favourite forthcoming Terraform 0.8 features is the ability to restrict the versions of terraform a configuration file can be run by. Terraform is a rapidly moving project that constantly introduces new functionality and providers and unless you’re careful and read the change logs, and ensure everyone is running the same minor version (or you run terraform from a central point like Jenkins), you can easily find yourself getting large screens of errors from using a resource that’s in terraform master but not the version you’re running locally.

The new terraform configuration block allows you to avoid these kinds of issues by explicitly declaring which versions your code requires -

    $ cat resources/

    terraform {
        required_version = "> 0.8.3"
        # or specify a lower and upper bound
        # required_version = "> 0.7.0, < 0.8.0"
    }

    $ ./terraform-8 plan resources

    The currently running version of Terraform doesn't meet the
    version requirements explicitly specified by the configuration.
    Please use the required version or update the configuration.
    Note that version requirements are usually set for a reason, so
    we recommend verifying with whoever set the version requirements
    prior to making any manual changes.

    Module: root
    Required version: > 0.8.3
    Current version: 0.8.0
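
If you want a feel for how this style of constraint is evaluated, RubyGems has a very similar requirement syntax built in. This is a sketch for illustration only: it uses Gem::Requirement, not Terraform's own constraint parser, so the two may differ in edge cases, and meets_requirement? is a name I made up.

```ruby
require 'rubygems'

# Splits a "> 0.7.0, < 0.8.0" style constraint into clauses and checks a
# version against all of them using RubyGems' requirement logic.
def meets_requirement?(constraint, running_version)
  clauses = constraint.split(',').map(&:strip)
  Gem::Requirement.new(*clauses).satisfied_by?(Gem::Version.new(running_version))
end

puts meets_requirement?('> 0.8.3', '0.8.0')           # false, matching the failed plan above
puts meets_requirement?('> 0.7.0, < 0.8.0', '0.7.13') # true
```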

While it’s not a shiny new piece of functionality I think this change will be greatly welcomed by terraform module authors. It shows a further step in maturity of both the tool and its emerging ecosystem and I can’t wait for it to become widely adopted.

Removing 'magic numbers' and times from your Puppet manifests

Thu, 06 Oct 2016 11:17:00 +0000

In a large Puppet code base you’ll eventually end up with a scattering of time based ‘magic numbers’ such as cache expiry numbers, zone file ttls and recurring job schedules. You’ll typically find these dealt with in one of a few ways. The easiest is to ignore it and leave a hopefully guessable literal value (such as 3600). The other path often taken is the dreaded heavily linked and often missed comments that start off as 86400 # seconds in a day and over time become 3600 # seconds in a day.

The time_units puppet function is a proof of concept, written for a code base that suffers heavily from this, that makes these kinds of numbers more explicit and self-documenting. Once you’ve installed the module from the puppet-forge:

puppet module install deanwilson-time_units

and restarted your puppetmaster you can use it in your code with calls like these:

time_units(15, 'minutes', 'seconds') # returns 900

You can also make calls using a slightly ‘prettier’ 4 argument version.

time_units(2, 'hours', 'in', 'minutes') # returns 120
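
The conversion itself is little more than a lookup table. The plain Ruby below is a sketch of the idea and not the module's source: the convert_time name, the SECONDS_PER_UNIT table and the set of supported units are all assumptions of mine.

```ruby
# Hypothetical unit table; the real module may support a different set.
SECONDS_PER_UNIT = {
  'seconds' => 1,
  'minutes' => 60,
  'hours'   => 3_600,
  'days'    => 86_400,
  'weeks'   => 604_800
}.freeze

# Accepts (amount, from, to) or the 'prettier' (amount, from, 'in', to).
def convert_time(amount, from, *rest)
  valid = rest.length == 1 || (rest.length == 2 && rest.first == 'in')
  raise ArgumentError, "expected (amount, from, to) or (amount, from, 'in', to)" unless valid

  to = rest.last
  result = Rational(amount * SECONDS_PER_UNIT.fetch(from), SECONDS_PER_UNIT.fetch(to))
  # Return an Integer when the conversion is exact, otherwise a Float.
  result.denominator == 1 ? result.to_i : result.to_f
end

puts convert_time(15, 'minutes', 'seconds')    # 900
puts convert_time(2, 'hours', 'in', 'minutes') # 120
```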

Is the added complexity of a function worth it when peer code review can ensure your code and comments are changed together? It all depends on what your manifests look like, how you review changes and how heavily you use hiera to bring values in. Hopefully you’ll never need this, but in the cases where you’re working on legacy spaghetti it’s sometimes nice to have little wins and cleanups.

Puppet Lint Plugins - 2.0 Upgrade and new repo

Sun, 21 Aug 2016 17:47:00 +0000

After the recent puppet-lint 2.0 release and the success of our puppet-lint 2.0 upgrade at work it felt like the right moment to claw some time back and update my own (11!) puppet-lint plugins to allow them to run on either puppet-lint 1 or 2. I’ve now completed this and pushed new versions of the gems to rubygems so if you’ve been waiting for version 2 compatible gems please feel free to test away.

Now that I’ve realised exactly how many plugins I’ve ended up with, I’ve created a new GitHub repo, unixdaemon-puppet-lint-plugins, that will serve as a nicer discovery point to all of my plugins and a basic introduction to what they do. It’s quite bare bones at the moment but it does present a nicer approach than clicking around my github profile looking for matching repo names.

Puppet Lint 2.0 Upgrade

Thu, 04 Aug 2016 11:47:00 +0000

With the recent puppet-lint 2.0 release it seemed a good time to bump the version we use at $WORK and see what’d changed. In theory it was as simple as changing the version in our Gemfile and ideally everything should continue as normal, but in practice it was a little more work than that and in this post I’m going to explain what we found.

Firstly let’s cover a lovely, free, bonus. On our test codebase puppet-lint 1.0.1 took about 25 seconds to run on average. Immediately after the upgrade to 2.0.0 our run times dropped to around 19 seconds, with no changes required to our code. While it might seem like a tiny amount of time, considering how often the tests get run, we’ve probably already recouped the time spent performing the upgrade.

In terms of lint warnings the first, and easiest to fix, complaint was the change from --no-80chars-check to --no-140chars-check. While most places already disable the line length check the Puppet style guide has become a little more accepting recently and now allows lines up to 140 characters. We have some longer lines, such as embedded ssh public keys, that hit this limit so we migrated from disabling the 80 character check to disabling the 140 one. This also required us to move from using the old config file .puppet-lintrc to the newer one .puppet-lint.rc. That was a few minutes of work so shouldn’t be a blocker for anyone.

The second source of lint warnings from the upgrade came as a bit of a surprise. It seems that fat arrow (=>) alignment wasn’t being correctly checked on attributes that ended with a semi-colon. We had code that looked like this:

    file { '/tmp/fake':
      content => 'Hello',
      mode => '0644';
    }

That ran fine under puppet-lint 1.0.1 but raised issues under 2.0.0. Fixing it was easy enough

    file { '/tmp/fake':
      content => 'Hello',
      mode    => '0644';
    }

and then the awkward, unneeded semi-colon was replaced with a sensible comma to make future diffs a bit nicer too.

    file { '/tmp/fake':
      content => 'Hello',
      mode    => '0644',
    }

We fixed up nearly all the code violations with no manual intervention. We always work on branches so it’s safe enough to run bundle exec puppet-lint --fix . over the entire code base, let it change what it wanted, and then read through the diffs. While this completed 99% of the fixes it did raise one interesting --fix edge case / bug that I have on the TODO list to investigate:

    -    warning => "@${minimum_request_rate * 1.2}",
    +    warning => "@${minimum_request_rate} * 1.2",

The fix code is a little aggressive in its protecting of variable names and in this case changes the functionality significantly by replacing a multiplication with a variable and a literal string. There is something to be said that this’d be better as an inline_template but for now changing it back was simple enough and the checks are happy with it.

In closing, technically the upgrade is worth doing for the performance improvements and stricter linting. From a community side it’s nice to see more people involved and have versioned releases coming out rather than pinning to individual git hashes. A big ‘thank you’ is deserved by all the people involved. If you want to see exactly what was done you can see the full puppet-lint 2.0 upgrade in our public puppet code. [...]
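
To make the impact of that --fix rewrite concrete, here is the same before and after expressed with Ruby string interpolation. It is an analogy rather than Puppet itself, so the exact number formatting Puppet produces may differ, and the value of 100 is made up.

```ruby
minimum_request_rate = 100 # made-up value for illustration

# Before: the whole expression is evaluated inside the interpolation.
original = "@#{minimum_request_rate * 1.2}"

# After the automated fix: only the variable is interpolated and the
# multiplication is left behind as literal text.
rewritten = "@#{minimum_request_rate} * 1.2"

puts original  # @120.0
puts rewritten # @100 * 1.2
```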

Specialising validate_re with wrapper functions in Puppet

Tue, 28 Jun 2016 20:10:00 +0000

Once your puppet code base reaches a certain size you’ll often have a number of validate_ functions testing parameters and configuration values for compliance with local rules and requirements. These invocations often look like this:

    validate_re($private_gpg_key_fingerprint, '^[[:alnum:]]{40}$', 'Must supply full GPG fingerprint')

Once you’ve spent a minute or two reading that you’ll probably be able to understand it; but wouldn’t it be nice to not have to care about the exact details and focus on what you’re actually testing? An approach I’ve been experimenting with on one larger code base is to specialise those kinds of calls using a wrapper function. Instead of the detailed call above we can now do validate_gpg_key_fingerprint($finger_print). Which of those is easier to read? Which is simpler for people new to puppet?

Implementing this kind of wrapper is much easier than you’d expect. All you need is a small stub function that wraps an existing one, in our case validate_re, and supplies some sensible defaults and local validation. You can see an example below.

    cat modules/wrapper_functions/lib/puppet/parser/functions/validate_gpg_key_fingerprint.rb

    module Puppet::Parser::Functions
      newfunction(:validate_gpg_key_fingerprint, :doc => <<-'ENDHEREDOC') do |args|
        A simple wrapper function to show specialisation of 'validate_re'
        from the stdlib and how it can make manifests more domain specific
        and easier to read.
        ENDHEREDOC

        unless (args.length >= 1 && args.length <= 2)
          message = 'wrong arguments - [error message]'
          raise ArgumentError, "validate_gpg_key_fingerprint(): #{message}"
        end

        fingerprint = args[0]

        # here we set the local rules and sensible defaults
        message = args[1] || 'Must supply full GPG fingerprint'
        regex   = '^[[:alnum:]]{40}$'

        # here we run the original function
        function_validate_re( [ fingerprint, regex, message ] )
      end
    end

This is easy to test in isolation, a nice place to encompass more complicated validations while presenting a simple usage case and requires only a small amount of shim coding. If this approach interests you it’s also quite easy to achieve a similar benefit with custom wrappers to the is_ functions that have a greater understanding of what you’re testing than the basic, but common, cases provided in things like stdlib. For example you can wrap is_ip_address and only return true for addresses in your own valid ranges.

The most obvious downside to this approach, other than the custom coding required, is that if used often enough it’s easy to convince people that puppet has a number of basic validate_ and is_ functions that don’t actually exist anywhere outside of your repos.

Although these kinds of changes are not going to revolutionise your code base they are nice, gradual improvements. If you have a number of users that don’t work with puppet on a regular basis, changing the vocabulary of your functions to closer align with your local domain terms can be a nicer way to ease them into reading, and eventually contributing, to the code. [...]
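
The is_ip_address idea can be sketched in plain Ruby with the standard library's IPAddr. This is not a Puppet function and not code from stdlib: the LOCAL_RANGES values and the local_ip_address? name are invented for the example, and you'd still need the usual newfunction wrapper around it.

```ruby
require 'ipaddr'

# Hypothetical local ranges; substitute the networks you actually own.
LOCAL_RANGES = %w[10.20.0.0/16 192.168.50.0/24].map { |cidr| IPAddr.new(cidr) }.freeze

# True only for syntactically valid addresses that sit inside our own ranges.
def local_ip_address?(value)
  address = IPAddr.new(value)
  LOCAL_RANGES.any? { |range| range.include?(address) }
rescue IPAddr::Error
  false
end

puts local_ip_address?('10.20.3.4')  # true
puts local_ip_address?('8.8.8.8')    # false
puts local_ip_address?('not-an-ip')  # false
```

Keeping the ranges in one constant means the local rule lives in a single, easily tested place, which is the same benefit the validate_ wrapper gives.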

CloudFormation Linting with cfn-nag

Mon, 02 May 2016 17:46:00 +0000

Over the last 3 years I’ve done a lot of CloudFormation work and while it’s an easy enough technology to get to grips with the mass of JSON can become a bit of a blur when you’re doing code reviews. It’s always nice to get a second pair of eyes, especially an unflagging, automated set, that has insight in to some of the easily overlooked security issues you can accidentally add to your templates. cfn-nag is a ruby gem that attempts to sift through your code and present guidelines on a number of frequently misused, and omitted, resource properties.

    gem install cfn-nag

Once the gem and its dependencies finish installing you can list all the rules it currently validates against.

    $ cfn_nag_rules
    ...
    IAM policy should not apply directly to users. Should be on group
    ...

I found reading through the rules to be quite a nice context refresher. While there are a few I don’t agree with there are also some I wouldn’t have thought to single out in code review so it’s well worth having a read through the possible anti-patterns. Let’s check our code with cfn-nag.

    cfn_nag --input-json-path . # all .json files in the directory
    cfn_nag --input-json-path templates/buckets.json # single file check

The default output from these runs looks like:

    ./templates/buckets.json
    ------------------------------------------------------------
    | WARN
    |
    | Resources: ["AssetsBucketPolicy"]
    |
    | It appears that the S3 Bucket Policy allows s3:PutObject without server-side encryption

    Failures count: 0
    Warnings count: 1

    ./templates/elb.json
    -------------
    | WARN
    |
    | Resources: ["ELB"]
    |
    | Elastic Load Balancer should have access logging configured

    Failures count: 0
    Warnings count: 1

If you’d like to reprocess the issues in another part of your tooling / pipelining then the json output formatter might be more helpful.

    cfn_nag --input-json-path . --output-format json

    {
        "type": "WARN",
        "message": "Elastic Load Balancer should have access logging configured",
        "logical_resource_ids": [
            "ELB"
        ],
        "violating_code": null
    }

While the provided rules are useful it’s always a good idea to have an understanding of how easy a linting tool makes adding your own checks. In the case of cfn-nag there are two types of rules. Some use JSON and jq and the others are pure ruby code. Let’s add a simple pure ruby rule to ensure all our security groups have descriptions. At the moment this requires you to drop code directly in to the gem’s contents but I imagine this will be fixed in the future. First we’ll create our own rule:

    # first we find where the gem installs its custom rules
    $ gem contents cfn-nag | grep custom_rules
    ./.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/cfn-nag-0.0.19/lib/custom_rules

Then we’ll add a new rule to that directory

    touch $full_path/lib/custom_rules/security_group_missing_description.rb

Our custom check looks like this -

    class SecurityGroupMissingDescription

      def rule_text
        'Security group does not have a description'
      end

      def audit(cfn_model)
        logical_resource_ids = []

        cfn_model.security_groups.each do |security_group|
          unless security_group.group_description
            logical_resource_ids << security_group.logical_resource_id
          end
        end

        if logical_resource_ids.size > 0
          # the start of this call was lost from the listing; Violation.new(type: ...) is assumed
          Violation.new(type: Violation::FAILING_VIOLATION,
                        message: rule_text,
                        logical_resource_ids: logical_resource_ids)
        else
          nil
        end
      end
    end

The code above was heavily ‘borrowed’ from an existing check and a little bit of object exploration was done using pry. Once we have our new rule we need to plumb it [...]
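
If you’d rather prototype a check like this before touching the gem, the same test can be run against the raw template with nothing but the json library. This is a standalone sketch that doesn’t use cfn-nag’s model or rule API at all; the function name and the sample template are made up.

```ruby
require 'json'

# Returns the logical IDs of security groups that have no GroupDescription.
def security_groups_missing_description(template_json)
  resources = JSON.parse(template_json).fetch('Resources', {})
  missing = resources.select do |_logical_id, resource|
    resource['Type'] == 'AWS::EC2::SecurityGroup' &&
      resource.fetch('Properties', {})['GroupDescription'].to_s.empty?
  end
  missing.keys
end

# Made-up template, just enough structure to exercise the check.
template = <<~TEMPLATE
  {
    "Resources": {
      "WebSG":   { "Type": "AWS::EC2::SecurityGroup", "Properties": { "GroupDescription": "Web tier" } },
      "AdminSG": { "Type": "AWS::EC2::SecurityGroup", "Properties": {} },
      "Assets":  { "Type": "AWS::S3::Bucket" }
    }
  }
TEMPLATE

p security_groups_missing_description(template) # ["AdminSG"]
```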

Terraform Modules - My Sharing Wishlist

Sun, 01 May 2016 17:15:40 +0000

I’ve been writing a few Terraform modules recently with the aim of sharing them among a few different teams and there are a couple of things missing that I think would make reusable modules much more powerful.

The first and more generic issue is the inability to use more complex data structures. After you’ve spent a while using Terraform with AWS resources you’ll develop the urge to just create a hash of tags and use it nearly everywhere. Hopefully with the ability to override a key / value or two when actually using the hash. If your teams are using tags, and you really should be, it’s very hard to write a reusable module if the tag names in use by each team are not identical. Because you can only (currently) pass strings around, and you’re unable to use a variable as a tag name, you’re stuck with requiring everyone to use exactly the same tag names or not providing any at all. There’s no middle ground available.

tags {
    "${}" = "Baz"
}

# creates a Tag called literally '${}'
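
For comparison, this is the behaviour I’m wishing for, written in Ruby because Terraform can’t currently express it. It’s only a sketch of the idea: the tag names, values and the tags_for helper are all invented.

```ruby
# Hypothetical defaults a team might share across every resource.
DEFAULT_TAGS = {
  'Environment' => 'test',
  'Owner'       => 'platform',
  'Project'     => 'simple-sg'
}.freeze

# Start from the shared hash and let the caller override or add a key or two.
def tags_for(overrides = {})
  DEFAULT_TAGS.merge(overrides)
end

p tags_for('Owner' => 'data-team', 'CostCentre' => '1234')
```

The defaults stay untouched, the override wins where keys collide and any new keys are simply added.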

My second current pain point, and the one I’m more likely to have missed a solution to, is the lack of a way to conditionally add or remove resource attributes. The most recent time this has bitten me is when trying to generalise a module that uses Elastic Load Balancers. Sometimes you’ll want an ELB with a cert and sometimes you won’t. Using the current module system there’s no way to handle this case.

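If I was to do the same kind of thing in CloudFormation I’d use the AWS::NoValue pseudo parameter.

    "DBSnapshotIdentifier" : {
        "Fn::If" : [
                "UseDBSnapshot",
                {"Ref" : "DBSnapshotName"},
                {"Ref" : "AWS::NoValue"}
        ]
    }

Fn::If takes a condition name first; UseDBSnapshot is the name used in the AWS documentation’s example of this pattern and would be declared in the template’s Conditions section. If the condition is true, which here means DBSnapshotName has a value, the DBSnapshotIdentifier property is present and set to that value. If it’s not then the property is not set on the resource.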

As an aside, after chatting with @andrewoutloud, it’s probably worth noting that you can make entire resources optional using a count and setting it to 0 when you don’t want the resource to be included. While this is handy and worth having in your Terraform toolkit it doesn’t cover my use case.

variable "include_rds" {
    default = 0
    description = "Should we include a aws_db_instance? Set to 1 to include it"
}

resource "aws_db_instance" "default" {
    count = "${var.include_rds}" # this serves as an if

    # ... snip ...
}

I’m sure these annoyances will be ironed out in time but it’s worth considering them and how they’ll impact the reusability of any modules you’d like to write or third party code you’d want to import. At the moment it’s a hard choice between rewriting everything for my own use and getting all the things I need or vendoring everything in and maintaining a branch with things like my own tagging scheme and required properties.

Testing Terraform projects

Thu, 21 Apr 2016 21:26:00 +0000

While Terraform is remarkably good at its job there are going to be some occasions when you want to test that what you wanted actually happened. In the unixdaemon_terraform_experiments repository I’m handling this with awspec and a little custom rspec directory modification.

First we pull in the awspec gem.

echo "gem 'awspec',  '~> 0.37'" >> Gemfile

bundle install

We also need to add the necessary scaffolding files:

mkdir spec

echo "require 'awspec'" >> spec/spec_helper.rb

Now we’ll add a test to our simple-sg project to confirm that the security group was created.

mkdir projects/simple-sg/spec

$ cat > projects/simple-sg/spec/security_group_spec.rb <

Note that the tests live beside the terraform project resources, not in a combined spec directory. This allows us to run only the tests related to the project we’re currently working on. The code to implement this, along with another special case that allows grouping and executing by environment, can be found in the Rakefile spec task. I’ll cover the environment split more in a future post.
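
The per-project lookup boils down to building a glob from the project name. This is not the code from the repo’s Rakefile, just a guess at the shape of it; spec_pattern and the projects root are names I’ve assumed from the paths used in this post.

```ruby
# Builds the glob for one project's specs so only those tests are run.
def spec_pattern(project_name, root: 'projects')
  if project_name.nil? || project_name.empty?
    raise ArgumentError, 'PROJECT_NAME must be set'
  end

  File.join(root, project_name, 'spec', '**', '*_spec.rb')
end

puts spec_pattern('simple-sg') # projects/simple-sg/spec/**/*_spec.rb
```

A spec task could hand that pattern to rspec, which keeps each project’s tests next to its resources without needing a combined spec directory.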

We then use rake spec to run tests against our chosen project.

PROJECT_NAME=simple-sg bundle exec rake spec

As we tidied up after ourselves previously this spec run will fail.

PROJECT_NAME=simple-sg bundle exec rake spec

security_group 'test-labs-sg'
  should exist (FAILED - 1)

Finished in 0.03664 seconds (files took 1.67 seconds to load)
1 example, 1 failure

We’ll now recreate the security group and then verify that it exists with the name we gave it.

$ PROJECT_NAME=simple-sg bundle exec rake apply
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

$ PROJECT_NAME=simple-sg bundle exec rake spec

security_group 'test-labs-sg'
  should exist

Finished in 0.00153 seconds (files took 1.36 seconds to load)
1 example, 0 failures

Don’t forget to destroy the security group when you’re done testing.

Something to consider is that you don’t want to duplicate all your terraform work and retest your resource declarations. Instead you should test more dynamic aspects of your configuration. Verifying a templated policy contains the expected strings or that all policies have been attached to a group are much better things to test than just the existence of a resource.
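
As a rough illustration of testing the dynamic part rather than the resource’s existence, here’s a local check that a templated policy renders the strings we expect. It uses plain erb and json rather than awspec, and the template, the bucket name and render_policy are all made up for the example.

```ruby
require 'erb'
require 'json'

# Made-up policy template with one templated value.
POLICY_TEMPLATE = <<~POLICY
  {
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::<%= bucket %>/*"
    }]
  }
POLICY

def render_policy(bucket)
  ERB.new(POLICY_TEMPLATE).result_with_hash(bucket: bucket)
end

# Parse the rendered policy and confirm the expected ARN made it through.
policy    = JSON.parse(render_policy('assets-test'))
resources = policy['Statement'].map { |statement| statement['Resource'] }

puts resources.include?('arn:aws:s3:::assets-test/*') # true
```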

I think awspec is a wonderful little tool and I can see it being useful both when migrating from Ansible to Terraform and to later verify my newer projects.

Announcing the UnixDaemon Terraform experiments repo

Thu, 21 Apr 2016 20:25:00 +0000

Introduction

While it’s possible to experiment and learn parts of Terraform in isolation sometimes it’s handy to have a larger, more complete, environment to run your tests in. For me, unixdaemon_terraform_experiments is that repo. It will contain a number of different terraform based projects that can be consistently deployed together. You can see some of my thinking behind this in the Naive first steps with Terraform post.

Terraform is a very powerful, but quite young, piece of software so I’m making this repo open to encourage sharing and invite feedback on better ways to do things. There is no guarantee that anything in this repo is the best or most current way to do anything.

Bootstrap

The bootstrap phase requires you to have AWS account credentials. For this repo it’s recommended that you store them in .aws/credentials under distinct profile names and leave [default] empty.

We’ll do the initial terraform configuration out of band to avoid making bootstrapping difficult. First we create the S3 bucket, which must have a globally unique name, used to store the terraform state files. Then we enable bucket versioning in case of anything going hideously wrong. The AWS_REGION and DEPLOY_ENV variables will help us when we later need to have AWS resources in multiple regions or if you decide to have separate test, staging and production environments for example.

    export AWS_PROFILE=test-admin
    export AWS_REGION=eu-west-1
    export DEPLOY_ENV=test
    export TERRAFORM_BUCKET="net.dean-wilson-terraform-state-${AWS_REGION}-${DEPLOY_ENV}"

    $ aws --region $AWS_REGION s3 mb "s3://${TERRAFORM_BUCKET}"
    make_bucket: s3://net.dean-wilson-terraform-state-eu-west-1-test/

    $ aws --region $AWS_REGION \
        s3api put-bucket-versioning \
        --bucket ${TERRAFORM_BUCKET} \
        --versioning-configuration Status=Enabled

You will also need to make a change to the project’s Rakefile and tell it your BUCKET_NAME and BUCKET_REGION. These are (currently, and awkwardly) set as constants at the top of the file and should match the values you exported above.

You should now install Terraform. This can be done by downloading the file from the Terraform website, or possibly installing it using your package manager. Once this is done we’ll enable our rake terraform wrapper by installing its dependencies.

    $ bundle install

You can then see the possible rake tasks with

    $ bundle exec rake -T
    ...
    rake plan    # Show the terraform plan
    ...

Setting up an environment

Before we add our first Terraform project we’ll configure an environment. I’ve decided to structure this repo and code to have three environments, test, staging and production. Each of those will be implemented as a distinct Amazon AWS Account and will have their own distinct S3 bucket for state. If you want to have your own environment names then you’ll need to change ALLOWED_ENVIRONMENTS in the Rakefile.

We then create our environment specific variable file.

    mkdir variables
    echo 'environment = "test"' > variables/test.tfvars

Running an initial terraform project

Now we’re past all the basic configuration we’ll create a very simple Terraform project and apply it to confirm everything is working. For our initial project we’ll create a security group and then delete it to show the entire end to end process.

Our initial step is to create a directory under projects to hold our new resources. Once this is done we’ll add a single security group resource.

    mkdir -p projects/simple-sg/resources/
    cat > projects/simple-sg/resources/ <
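
The bucket naming and environment guard described in the bootstrap steps can be sketched in a few lines of Ruby. This is not the repo’s Rakefile: ALLOWED_ENVIRONMENTS is the constant name mentioned in the post, but its contents here, the state_bucket helper and its arguments are my assumptions.

```ruby
# Assumed contents, based on the three environments named in the post.
ALLOWED_ENVIRONMENTS = %w[test staging production].freeze

# Builds the per-region, per-environment state bucket name and refuses
# environments that haven't been explicitly allowed.
def state_bucket(prefix:, region:, deploy_env:)
  unless ALLOWED_ENVIRONMENTS.include?(deploy_env)
    raise ArgumentError, "unknown environment '#{deploy_env}'"
  end

  "#{prefix}-#{region}-#{deploy_env}"
end

puts state_bucket(prefix: 'net.dean-wilson-terraform-state',
                  region: 'eu-west-1',
                  deploy_env: 'test')
# net.dean-wilson-terraform-state-eu-west-1-test
```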

Contaminate AWS instances on ssh login

Sat, 02 Apr 2016 10:10:10 +0000

One of the principles of running large numbers of instances is that consistency is key. Config deviations cause oddities that’ll drain your time with investigations and nothing causes entropy on your hosts like an admin investigating an issue. In this post we’ll configure our instances to mark themselves as contaminated when someone logs in. We can then use other tooling to query, collate and probably reap, machines corrupted by the keystrokes of humans.

While the example here is step-by-step and interactive, you’d normally bake this into your AMI or deploy it very early in your config stage, possibly using cloud-init. For our test we’ll spin up an instance and grant it an EC2 instance profile so it can alter its own tags.
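As a rough sketch of what that instance profile needs, the role behind it only has to allow the ec2:CreateTags action. The file name, role name and policy name below are assumptions for illustration, and the wildcard resource is deliberately loose; you may well want to tighten it.

```shell
#!/bin/sh
# Hypothetical file name; override POLICY_FILE to suit your setup.
POLICY_FILE="${POLICY_FILE:-self-tagging-policy.json}"

# Minimal policy document: allow the holder to create tags on EC2 resources.
cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:CreateTags",
      "Resource": "*"
    }
  ]
}
EOF

echo "wrote $POLICY_FILE"

# Attaching it is left commented out as it needs real credentials.
# The role and policy names are assumptions:
# aws iam put-role-policy --role-name contaminate-test-role \
#     --policy-name allow-self-tagging \
#     --policy-document "file://$POLICY_FILE"
```

The policy is written to a file first so it can be reviewed, or kept in version control, before it goes anywhere near an account.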

In terms of moving parts we’ll install the awscli package, add a short script that’ll tag the instance when run and configure PAM to invoke the script when an ssh session opens to the machine.

# install required dependency
sudo apt-get install awscli

sudo tee /usr/local/bin/add-dirty-tag > /dev/null <<'EOF'
#!/bin/bash

# only act when a session is opened, not when it is closed
[ "$PAM_TYPE" = "open_session" ] || exit 0

INSTANCE_ID=$(ec2metadata --instance-id)

REGION=$(ec2metadata --availability-zone)
REGION=${REGION%?} # remove the last letter

aws ec2 --region "$REGION" create-tags --resources "$INSTANCE_ID" --tags Key=contaminated,Value=true
EOF

sudo chmod a+rx /usr/local/bin/add-dirty-tag

Now we have a script to add the ‘contaminated’ tag to our instance we’ll configure PAM to run it when a new ssh session starts. On an Ubuntu system the config should be placed in /etc/pam.d/sshd.

# tag the machine as contaminated if anyone sshs in.
session    optional pam_exec.so /usr/local/bin/add-dirty-tag

It’s worth opening another ssh session and logging in to confirm this works. That will leave you with an established connection in case you misconfigure PAM in some way. Once you’ve successfully logged in and caused the new tag to be added to the instance you can run a cli filter from outside the instance to show all hosts that have been interactively connected to:

aws --region eu-west-1 ec2 describe-instances             \
    --filters "Name=tag:contaminated,Values=true"         \
    --query 'Reservations[].Instances[].{id: InstanceId}'
        "id": "i-x134x34x"
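If you want to feed that list into other tooling, a plain list of IDs is easier to work with than JSON. The sketch below is one way to do it, not a finished reaper. The function names are made up for this example, and the terminate call is only ever issued with --dry-run, which asks EC2 whether the request would be allowed and reports the answer as an error without terminating anything.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of contaminated instances, one per line.
contaminated_ids() {
    region="$1"
    aws --region "$region" ec2 describe-instances \
        --filters "Name=tag:contaminated,Values=true" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output text | tr -s '[:space:]' '\n'
}

# Hypothetical helper: report each candidate and check, via --dry-run,
# whether we would be allowed to terminate it. Nothing is terminated.
reap_dry_run() {
    region="$1"
    contaminated_ids "$region" | while read -r id; do
        [ -n "$id" ] || continue
        echo "would reap: $id"
        aws --region "$region" ec2 terminate-instances \
            --dry-run --instance-ids "$id"
    done
}
```

Running `reap_dry_run eu-west-1` from an admin machine gives a quick view of what a real reaper would remove.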

If you decide to adopt an approach like this you can expand the values stored in the tag using the values PAM exposes, such as $PAM_USER or $PAM_RUSER and a time stamp. There’s also nothing stopping you from adding something more structured. A concise JSON dict maybe. Just be careful that you don’t overwrite the details on each successive login.
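A minimal sketch of that idea follows, assuming the same ec2metadata and awscli tooling as the script above. It records the first user to log in along with a UTC timestamp, and skips tagging if the tag already exists. The function names and the user@timestamp format are my own choices for this example, and the instance role would also need ec2:DescribeTags for the check to work.

```shell
#!/bin/sh
# Sketch only: function names and the value format are assumptions.

# Build a tag value such as "alice@2016-04-02T10:10:10Z".
# The timestamp is passed in so the function is easy to test.
build_tag_value() {
    user="$1"
    stamp="$2"
    printf '%s@%s' "${user:-unknown}" "$stamp"
}

# Succeed only when the instance ($1, in region $2) has no
# 'contaminated' tag yet.
is_untagged() {
    count=$(aws ec2 --region "$2" describe-tags \
        --filters "Name=resource-id,Values=$1" "Name=key,Values=contaminated" \
        --query 'length(Tags)' --output text)
    [ "$count" = "0" ]
}

# Only act when PAM opens a session.
if [ "${PAM_TYPE:-}" = "open_session" ]; then
    INSTANCE_ID=$(ec2metadata --instance-id)
    REGION=$(ec2metadata --availability-zone)
    REGION=${REGION%?} # remove the last letter

    # Keep the details of the first login rather than overwriting them.
    if is_untagged "$INSTANCE_ID" "$REGION"; then
        VALUE=$(build_tag_value "${PAM_USER:-}" "$(date -u +%Y-%m-%dT%H:%M:%SZ)")
        aws ec2 --region "$REGION" create-tags --resources "$INSTANCE_ID" \
            --tags "Key=contaminated,Value=$VALUE"
    fi
fi
```

Once the value is no longer a plain ‘true’ the describe-instances filter has to match on the key alone, for example "Name=tag-key,Values=contaminated".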

2016 drive cleanup

Sun, 27 Mar 2016 11:39:46 +0000

Over the years I’ve built up a small stack of removable drives, mostly for off site backup rotation, and when one of them (a decade old Maxtor) started to sound like two angle grinders ‘passionately embracing’ I thought it was time to do some data validation and re-planning. Although I’m fully aware that most technology trends towards getting smaller and cheaper it’s been a while since I’ve been drive shopping. My god, the difference a few years makes!


A quick purchase of two host-powered, USB 3, 4TB drives via Amazon Prime and I’ve managed to claw back more physical space in power supply storage alone than the new drives use. I’ve copied everything over to both drives, sent one off to live with relatives, and put the older ones in out of sight storage as a very last ditch option if all other restores fail. It was a slow, painful process, made worse by USB 1 transfer speeds and some bad blocks but most of the data is still safely backed up and my home technology footprint shrinks once more.

Naive first steps with Terraform

Thu, 10 Mar 2016 18:27:00 GMT

On one of the $WORK projects, we’ve recently had a chance to join seemingly the entire AWS using world and spend some time using Terraform to manage a few migration prototypes. I’ve had a few little plays with Terraform over the last few years but I’ve never tried to plan a large environment with it before and even though it’s -very- early days for me it’s been an interesting path of discovery.

We initially started out with a directory of tf files that each managed groups of resources and shared a state file. This was fine for getting to grips with basic commands and resource writing but once we had a few dozen resources I started to get nervous about every terraform apply.

Two things worried me. The first was that every apply could potentially change every part of the system, even if that part of the code base hadn’t been updated. While this should never really be a problem we’ve seen enough issues that it was still playing on my mind.

The second concern was the terraform state file. Although we were storing it in S3 (and who stores something that important in Consul?) there’s a risk that if any resource was ever written in a corrupted state we’d essentially lose everything in one sweep. As an aside one of my biggest wants for Terraform is a ‘discovery’ mode so we can kill the state file off.

The importance of the state file was hammered home when we tried to refactor resources defined in standalone .tf files to be inside modules. This turned out to be a less than fun experience of rewriting JSON using vim and fervently hoping that the plan would eventually look like it did when we started.

After we’d come out of our initial experiments with a positive attitude and newfound appreciation for Terraform’s remarkably comprehensive support of AWS resources it was time to take a second look and see how we’d deal with a much larger, more complicated environment.
Our current prototype, built with about 8 days of experience, and I stress that this is an experiment which might have major limitations, has a simple layout built on four top level concepts. We’ve also, like everyone else, written a wrapper script for gluing all this together and running terraform in a consistent way.

Our four basic ideas are that ‘projects’, which we’re treating as isolated groups of resources, should be self contained in both the code they consist of and the state file that represents them. This separation makes it easier to reason about possible changes, and limits the damage radius if something goes wrong. The project directory layout currently looks like this:

projects/ # each project has a directory under here
projects/change_management/
projects/change_management/
projects/change_management/resources # common resources should be placed here
projects/change_management/resources/
projects/change_management/resources/
projects/change_management/resources/production/ # environment specific resources should be placed here
projects/change_management/resources/production/

We also have resources that only need to exist in a single environment, sometimes in addition to other resources that should exist everywhere. We’re implementing that functionality by separating those resources out to a subdirectory. Having a database in all environments but only having read replicas in production and staging, to manage costs, is an example of this.

While coming up with this one thing bit us that people familiar with terraform have probably already spotted - terr[...]