
UnixDaemon: In search of (a) life



My name is Dean Wilson. Unixdaemon.net is my personal site where I store my code, writing, rantings and anything else I feel warrants sharing with the rest of the 'Net.



 



Testing multiple Puppet versions with TravisCI (and allowing failures)

Fri, 02 Jun 2017 13:01:25 +0000

When it comes to running automated tests of my public Puppet code, TravisCI has long been my favourite solution. It's essentially a zero-infrastructure second pair of eyes on all my changes. It also doesn't have any of my local environment oddities, so it provides a more realistic view of how my changes will impact users.

I've had two Puppet testing scenarios pop up recently that turned out to be the same technical issue once you start exploring them: running tests against the Puppet version I use and support, and against others I'm not so worried about. On one hand I have code written for Puppet 3 that I need to start migrating to Puppet 4 (and probably to Puppet 5 soon); on the other I have code on Puppet 4 that I'd like to continue supporting on Puppet 3 until it becomes too much of a burden. While I can do the testing locally with overrides, rvm and gemfiles, I wanted the same behaviour on TravisCI.

It's very easy to get started with TravisCI. Once you've signed up (probably with GitHub auth) it only requires two quick steps to get going. The first is to enable your repo on the TravisCI site. You should then add a .travis.yml file to the repo itself; this contains the what and how of building and testing your code. You can see a very minimal example, which just runs rake spec with a specific Ruby version, below:

---
language: ruby
rvm:
  - 2.1.0
script: "bundle exec rake spec"

This provides our basic safety net, but now we want to allow multiple versions of Puppet to be specified for testing. To support that we'll modify our Gemfile to install a specific version of the puppet gem if an environment variable is passed in via the TravisCI build config; if it's missing we'll just install the newest and run our tests using that. The code that implements this, the last five lines in our sample, is the important part to note.

source 'https://rubygems.org'

group :development, :test do
  gem 'json'
  gem 'puppetlabs_spec_helper', '~> 1.1.1'
  gem 'rake',    '~> 11.2.0'
  gem 'rspec',   '~> 3.5.0'
  gem 'rubocop', '~> 0.47.1', require: false
end

if puppetversion = ENV['PUPPET_GEM_VERSION']
  gem 'puppet', puppetversion, :require => false
else
  gem 'puppet', :require => false
end

Now we've added this capability to the Gemfile we'll modify our .travis.yml file to take advantage of it. Add an env array, with a version from each of the two major Puppet releases we want to test under, using the same variable name as in the Gemfile.

---
language: ruby
rvm:
  - 2.1.0
bundler_args: --without development
script: "bundle exec rake spec SPEC_OPTS='--format documentation'"
env:
  - PUPPET_GEM_VERSION="~> 3.8.0"
  - PUPPET_GEM_VERSION="~> 4.10.0"
notifications:
  email: dean.wilson@gmail.com

Now our .travis.yml is getting a little more complicated you might want to lint it to confirm it's valid. You can use the online TravisCI linter or install the TravisCI YAML gem and work offline. The example file above will trigger two separate builds when TravisCI receives the trigger from our change. If you want to explicitly test under two versions of Puppet, and fail the tests if anything breaks under either version, you are done. Congratulations!
If, however, you'd like to test against an older, best-effort but unsupported version, or start testing a newer version that you're willing to accept failures from while you migrate (assuming the other, main version still passes), you'll need to add another config option to your .tra[...]
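For a rough idea of the shape this takes, TravisCI's build matrix supports an allow_failures setting keyed on the same env entries. The following is a minimal sketch based on TravisCI's documented configuration, not taken from the original article:

env:
  - PUPPET_GEM_VERSION="~> 3.8.0"
  - PUPPET_GEM_VERSION="~> 4.10.0"
matrix:
  allow_failures:
    # builds against this Puppet version still run and report, but a failure
    # here no longer marks the whole build as failed
    - env: PUPPET_GEM_VERSION="~> 3.8.0"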



Little ruby libraries - Testing with Timecop

Thu, 01 Jun 2017 09:55:17 +0000

When it comes to little-known rubygems that help with my testing I'm a massive fan of the relatively unknown Timecop. It's a well written, highly focused gem that lets you control and manipulate the date and time returned by a number of Ruby methods. In specs where testing requires certainty of 'now' it's become my favoured first stop.

The puppet deprecate function is a good example of when I've needed this functionality. The spec scenarios should exercise a resource with the time set to before and after the deprecation time in separate tests. The two obvious options are to hard code the dates, which won't work here as we're black box testing the function, or to mock the calls, something Timecop excels at and saves you writing yourself.

require 'timecop'

# explicitly set the date.
Timecop.freeze(Time.local(2015, 1, 24))

...

# success: we've explicitly set the date above to be before 2015-01-25
# so this resource hasn't been deprecated
should run.with_params('2015-01-25', 'Remove Foo at the end of the contract.')

...

# failure: we're using a date older than that set in the freeze above
# so we now deprecate the resource
should run.with_params('2015-01-20', 'Trigger expiry')

...

# reset the time to the real now
Timecop.return

This allows us to pick an absolute point in time and use literal strings in our tests that relate to the point we've picked. No more intermediate variables with manually manipulated date objects to ensure we're 7 days in the future or 30 days in the past. Removing this boilerplate code was itself a win for me.

If you need to ensure all your specs run with the same time set you can call the freeze and return in the before and after methods.

before do
  # all tests will have this as their time
  Timecop.freeze(Time.local(1990))
end

after do
  # return to normal time after the tests have run
  Timecop.return
end

I've shown the basic, and for me most commonly used, functionality above, but there are a few helper methods that elevate Timecop from "I could quickly write that myself" to "this deserves a place in my gemfile". The ability to freeze time in the future with a simple Timecop.freeze(Date.today + 7) is handy, and the auto-return-to-normal block syntax is pure user experience refinement, but the Timecop.scale function, which lets you define how much time passes for every real second, isn't something you need every day; when you do, you'll be very glad you don't have to write it yourself. [...]
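A quick sketch of those helpers, using Timecop's documented API rather than anything from the deprecate specs:

require 'date'
require 'timecop'

# Block form: time is frozen only inside the block and automatically
# returns to normal afterwards, so no explicit Timecop.return is needed.
Timecop.freeze(Time.local(2015, 1, 24)) do
  puts Time.now # 2015-01-24
end

# Freeze a week in the future.
Timecop.freeze(Date.today + 7) do
  puts Date.today
end

# Scale time: here every real second that passes counts as an hour.
Timecop.scale(3600)
puts Time.now # races ahead of the wall clock while scaled
Timecop.return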



Announcing multi_epp - Puppet function

Wed, 31 May 2017 10:20:25 +0000

As part of refreshing my old puppet modules I’ve started to convert some of my Puppet templates from the older ERB format to the newer, and hopefully safer, Embedded Puppet (EPP).

While it’s been a simple conversion in most cases, I did quickly find myself lacking the ability to select a template based on a hierarchy of facts, which I’ve previously used multitemplate to address. So I wrote a Puppet 4 version of multitemplate that wraps the native EPP function, adds matching lookup logic and then imaginatively called it multi_epp. You can see an example of it in use here:

class ssh::config {

  file { '/etc/ssh/sshd_config':
    ensure  => present,
    mode    => '0600',
    # note the array of files.
    content => multi_epp( [
                            "ssh/${::fqdn}.epp",
                            "ssh/${::domain}.epp",
                            'ssh/default_sshdconfig.epp',
                          ], {
                                'port'          => 22222,
                                'ListenAddress' => '0.0.0.0',
                          }),
  }

}

This was the first function I've written using the new Puppet 4 function API, and in general it feels like an improvement over the previous one. The dispatch blocks and related functions encourage you to keep the individual sections of code quite small and isolated, but they will require some diligence to ensure you don't duplicate a lot of nearly identical code between signatures. I also couldn't quite do what I wanted in the API (a repeating set of params followed by one optional), but I've worked around that by requiring all the files to check to be given as an array, which works but is a little icky. I've not gone full "all the shiny" yet and included things like function return values and types, but I can see myself converting some of my other functions over to gain the benefit of easier parameter checking and basic types.
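For anyone who hasn't seen the newer API, the rough shape is sketched below. This is only an illustration of the dispatch style described above, not the actual multi_epp source; the method name and the rescue-based template lookup are my own placeholders.

# A sketch of the Puppet 4 function API, not the real multi_epp implementation.
Puppet::Functions.create_function(:multi_epp) do
  # All candidate templates arrive as a single array: the workaround for the
  # 'repeating params followed by one optional' signature mentioned above.
  dispatch :find_and_render do
    param 'Array[String]', :templates
    optional_param 'Hash', :parameters
  end

  def find_and_render(templates, parameters = {})
    templates.each do |candidate|
      begin
        # Delegate to the native epp() function for the first template that renders.
        return call_function('epp', candidate, parameters)
      rescue Puppet::Error
        # A real implementation would check for the template's existence rather
        # than rescuing here, which also swallows genuine render errors.
        next
      end
    end
    raise Puppet::ParseError, "multi_epp: no matching template found in #{templates}"
  end
end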

So what’s next on the path to EPP? For me it’ll be to get my no ERB template puppet-lint check running cleanly over a few local modules and to double check I don’t slip back in to old habits.




Non-intuitive downtime and possibly not lost sales

Mon, 29 May 2017 11:18:44 +0000

One of the things you’ll often read in web operation books is the idea that while you’re experiencing downtime your customers are fleeing in droves and taking their orders to your competitors out of frustration. However this isn’t always the truism that people take it for.

If your outages are rare, and your site is normally performant and easy to use (or has a monopoly), you’ll find this behaviour a lot less common than you’ve been told. Most people have a small set of sites they are comfortable using and have gradually built up trust and an order history with. This is especially true if you operate in certain niches, such as being the fashion site, or have a very strongly defined brand.

After a few months of short but recurring outages we went back over our traffic logs and ran some queries to see how badly we'd been impacted, and to help build our business case for more resources. The results were a little surprising for the more 'conventional wisdom' trusting members of the team.


Instead of seeing a reverse hockey stick graph of our customers deserting us in our hours of need and stabilising at a lower-than-before constant, we saw that while orders did drop off during production outages, as you'd expect from a dead system, as long as recovery times stayed in the range of minutes (and very rarely a small number of hours) the daily order volume and sales totals always bounced back to within a few percentage points of a normal day. In some cases we even saw brief periods of higher than usual levels as everyone finished their pending transactions as soon as we returned.


After witnessing this we had a few discussions and made some minor changes while waiting for the larger issues to be resolved. One aspect to consider, for example, is that if you can architect your failures to help users preserve even some of their effort you heavily increase the odds of them finishing. Keeping services like baskets and wishlists active makes it much more likely they'll return to complete their transaction with you. Once they've gone to the effort of finding their newest 'must have' you have a small number of grace points to spend while you're getting everything back to normal before they'll discard their own time investment and move on.

It seems that as an industry we've managed to train our users to accept small amounts of failure, especially if your customers favour mobile devices on cellular networks. While I don't want to try and convince you that downtime has no impact, I do think it's worth going over the numbers after your incidents to see what the slightly longer term impact was and how far away from a normal day your recovery curve gets you.

I should also note that this doesn't cover security issues. Those have very different knock-on effects and are typically orders of magnitude worse.




Smaller Debian Docker tips - apt lists

Fri, 19 May 2017 18:23:47 +0000

One of the hidden gems of GitHub is Jess Frazelle's Dockerfiles Repo, a collection of Dockerfiles for applications she runs in containers to keep her desktop clean and minimal. While reading the Nmap Dockerfile I noticed a little bit of shell I'd not seen before.

I’ve included the file itself below. The line in question is && rm -rf /var/lib/apt/lists/*, a tiny bit of shell that does some additional cleanup once apt has installed the required packages.

FROM debian:stretch
LABEL maintainer "Jessie Frazelle "

RUN apt-get update && apt-get install -y \
	nmap \
	--no-install-recommends \
	&& rm -rf /var/lib/apt/lists/*

ENTRYPOINT [ "nmap" ]

Curiosity got the best of me and I decided to see how much of a saving that line provides. First I built the Docker image as Jess intended:

sudo docker build -t nmap-rm-lists -f Dockerfile-rm-lists .

> sudo docker images
REPOSITORY           TAG      IMAGE ID       CREATED             SIZE
nmap-rm-lists        latest   9a4a697649f9   10 seconds ago      131.1 MB

As you can see in the output this creates an image 131.1 MB in size. If we remove the rm line (and the \ continuation character from the line above) and rebuild the image we should see a larger image.

sudo docker build -t nmap-with-apt-lists -f Dockerfile-with-apt-lists .

...

> sudo docker images
REPOSITORY           TAG      IMAGE ID       CREATED              SIZE
nmap-with-apt-lists  latest   d8459f6f2b93   About a minute ago   146.6 MB

And indeed we do: the image is just over 10% larger without that little optimisation. That's going to be quite a nice saving over a few dozen container images.

While looking through some of the other code in that repo I saw mention of a debian:stretch-slim image, so I thought it was worth running an additional experiment with it as the base. Making the small change from FROM debian:stretch to FROM debian:stretch-slim in our Dockerfile, with the rm -rf /var/lib/apt/lists/* command also present, results in a much smaller image at just 86 MB.
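The only change for that build is the FROM line; everything else stays exactly as in Jess's Dockerfile above:

FROM debian:stretch-slim
# the remaining instructions are identical to the Dockerfile shown earlier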

> sudo docker images
REPOSITORY           TAG      IMAGE ID       CREATED             SIZE
nmap-rm-lists-slim   latest   8fa72fad3929   About a minute ago  86.78 MB

For completeness (Hi Wes!) if we leave the lists in and use the debian:stretch-slim image we have a significantly larger image at 102 MB. This helps show that even with the smaller base image the removal of the apt list files is still well worth it.


REPOSITORY             TAG      IMAGE ID      CREATED        SIZE
nmap-with-lists-slim   latest   26e65d974ae6  8 seconds ago  102.2 MB

While an Alpine image would be even smaller it’s nice to see this kind of size saving on Debian based images that look a lot closer to what I’d normally run in my VMs.




Nicer Jenkins Views - Build Monitor Plugin

Sat, 08 Apr 2017 13:25:28 +0000

While migrating and upgrading an old install of Jenkins over to version 2, the topic of adding some new views came up in conversation and the quite shiny Jenkins CI Build Monitor Plugin emerged as a pretty, and quick to deploy, option.

Using some canned test jobs we did a manual deploy of the plugin, configured a view on our testing machine, and I have to say it looks as good, and as easily readable from a few desks away, as we’d hoped.


The next step is to apply the true utility test: leave it in place for a week or so, then remove it and see if anyone notices. If they do we'll add some puppet scaffolding and roll it out to all the environments.




Tales from the Script

Wed, 01 Mar 2017 19:54:46 GMT

A number of roles ago the operations and developer folk were blessed with a relatively inexperienced quality assurance department that were, to put it kindly, focused on manual exploratory testing. They were, to a person, incapable of writing any kind of automated testing, running third party tools or doing anything in a reproducible way. While we’ve all worked with people lacking in certain skills what makes this story one of my favourites is that none of us knew they couldn’t handle the work.

The manager of the QA team, someone I'd never have credited with the sheer audacity to pull off this long con, always came to our meetings with an earnest face and excuses about the failure of "The Script". We, being insanely busy modern technical people, took this at face value; how would you run all the regression tests without a script? "There was a problem running the script", "the newest changes to the script had caused regressions" and similar were always on the tip of their tongue, and because the developers were under a lot of time pressure no deep investigations were done. Everyone was assumed to be doing their best, and what a great QA manager they were, protecting their people from any fallout from the failures. On it went, all testing was done via "the script" and everything was again good. Or so we assumed.

In one of our recurring nightmare incident reviews, this one after something we'd previously covered had come back for the third time, a few of us began to get suspicious. We decided to build our own little response team and do some digging for the sake of everyone's sanity. Now, this was before the days of GitHub and everyone being in one big team of sharing and mutual bonding; we knew we'd have to go rooting around other departments' infrastructure to see what was going on. Over the course of the next few days the group targeted one of the more helpless QA engineers and began to help him with everything technical he needed. He had the most amazing, fully hand-held onboarding the department had ever seen and we, in little bits and pieces, began to pierce the veil of secrecy that was the QA team's process.

One day, just before lunch, one of the senior developers involved in our investigation hit the mother lode. The QA engineer had paired with them on adding testing to "the script" for a new feature the developer had written, and suddenly the developer had a full understanding of the script and its architecture.

It was an Excel spreadsheet.

It was a massive, colour coded, interlinked Excel spreadsheet. Each row was a separate test path or page journey. Some rows were 40 fields of references to other rows that combined to form one complete journey. Every time we did a release to staging they'd load up the Excel document from the network share and arrow-key their way through row upon row of explicit instructions. Seeing it in use was like watching an insane cross between Snake and Minesweeper. Some of the cells were links to screen grabs of expected outputs and page layouts. Some had a red background to show steps that had recently failed. It was a horrific moment of shared confusion. A team of nearly forty testers had ended up building this monstrosity and running it for months. It was like opening up a Word doc and having Cthulhu glare back at you. So we did the only thing we could think of: went to lunch and mostly sat in stunned silence.

And I almost forgot the best part of the story. The Excel spreadsheet? It was named "The_Script.old.xls".




Development Diaries and Today I Learned Repositories

Sun, 26 Feb 2017 15:14:34 GMT

One of the difficulties in technically mentoring juniors you don't see on a near-daily basis is ensuring the right level of challenge and learning. It's surprisingly easy for someone to get blocked on a project or keep themselves too deep in their comfort zone and essentially halt their progress for extended periods of time. An approach I use to help avoid this stagnation is the keeping of a "Development Diary".

A development diary, which I've heard called by many other names, is simple in concept and can be just as easy to implement. It's the commitment to write down something you've learned in your role each and every day. Over time it becomes a collection of small wins and achievements and shows that even little learnings have a big cumulative impact. The daily aspect isn't essential, and I've had people on more "business as usual" focused teams reduce the frequency to as low as once a week, but at a more junior point in your career it should be easier to find new things to take note of than in the awkward part in the middle where you're doing the same thing for the fifth company. One of the best diaries I've had the pleasure of reading was by a non-native English speaker, and nested amongst the usual technical content was the occasional gem: an explanation of an idiom they'd heard and tried to work out before looking it up.

The best technical implementation of a development diary I've seen recently was a simple git repo of directories. A single post, written in markdown, was added each day. It served as a great little git learning project for the junior (pull request reviews, branches and merging in to master) and provided an excellent single location to look back over and see how much they'd learned over the last five months, while providing them with starting material for their more formal quarterly reviews.

Watching this diary grow as a mentor provides some helpful insights into how the person is doing. You can see what they consider interesting and whether their focus is narrow or ranges over entire parts of the work, and if nothing is added for longer periods of time it can provide an early warning. I've found that not having anything to add for longer periods is a red flag and highlights that something is happening that requires more attention. It could be that the person is blocked at work, has something else on their mind or isn't being challenged. None of which are ideal for anything beyond short periods of time.

As an aside, while I've had the most success using this on a one-to-one basis with juniors, I've tried it a few times across an entire team when doing a discovery phase for a larger piece of work. I've found it a lot harder to achieve consistent buy-in in this environment, especially when external contractors and companies are involved, as it requires a fair amount of trust and honesty between all involved for it to be useful. When used in this way the willing involvement of the more senior technical people, and how the juniors regard them, seems to be the best indication of whether it'll have a positive impact or not. In a more nurturing and sharing team the seniors are willing to show what they didn't know without risk of losing credibility and the juniors are open to showing their progress without being judged harshly. In a less functional team the juniors are very hesitant to risk embarrassing themselves and the seniors hate the idea of showing weakness or gaps in their skills.
Something I've had a lot less success with is nearly the reverse of a "Today I Learned": a place to collect the things you didn't understand while working. I find it hard to determine how much material to record in this form and wa[...]



Terraform Version Restrictions

Sat, 26 Nov 2016 19:42:10 GMT

One of my favourite forthcoming Terraform 0.8 features is the ability to restrict which versions of terraform a configuration file can be run by. Terraform is a rapidly moving project that constantly introduces new functionality and providers, and unless you're careful, read the change logs, and ensure everyone is running the same minor version (or you run terraform from a central point like Jenkins), you can easily find yourself facing large screens of errors from using a resource that's in terraform master but not in the version you're running locally.

The new terraform configuration block allows you to avoid these kinds of issues by explicitly declaring which versions your code requires -

    $ cat resources/my_resources.tf

    terraform {
        required_version = "> 0.8.3"
        # or specify a lower and upper bound
        # required_version = "> 0.7.0, < 0.8.0"
    }

    $ ./terraform-8 plan resources

    The currently running version of Terraform doesn't meet the
    version requirements explicitly specified by the configuration.
    Please use the required version or update the configuration.
    Note that version requirements are usually set for a reason, so
    we recommend verifying with whoever set the version requirements
    prior to making any manual changes.

    Module: root
    Required version: > 0.8.3
    Current version: 0.8.0

While it’s not a shiny new piece of functionality I think this change will be greatly welcomed by terraform module authors. It shows a further step in maturity of both the tool and its emerging ecosystem and I can’t wait for it to become widely adopted.




Removing 'magic numbers' and times from your Puppet manifests

Thu, 06 Oct 2016 11:17:00 +0000

In a large Puppet code base you'll eventually end up with a scattering of time-based 'magic numbers' such as cache expiry numbers, zone file TTLs and recurring job schedules. You'll typically find these dealt with in one of a few ways. The easiest is to ignore it and leave a hopefully guessable literal value (such as 3600). The other path often taken is the dreaded comment, tightly coupled to the value and easily missed, that starts off as 86400 # seconds in a day and over time becomes 3600 # seconds in a day.

The time_units puppet function is a proof of concept, written for a code base that suffers heavily from this, that makes these kinds of numbers more explicit and self-documenting. Once you’ve installed the module from the puppet-forge:

puppet module install deanwilson-time_units

and restarted your puppetmaster you can use it in your code with calls like these:

time_units(15, 'minutes', 'seconds') # returns 900

You can also make calls using a slightly ‘prettier’ 4 argument version.

time_units(2, 'hours', 'in', 'minutes') # returns 120
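To show how this reads in context, here's a small, hypothetical profile class using the function to replace a bare 14400; the class, file and setting names are made up:

class profile::app_cache {
  # 4 hours expressed in seconds (14400), but now self-documenting
  $cache_ttl = time_units(4, 'hours', 'in', 'seconds')

  file { '/etc/app/cache.conf':
    ensure  => file,
    content => "ttl = ${cache_ttl}\n",
  }
}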

Is the added complexity of a function worth it when peer code review can ensure your code and comments are changed together? It all depends on what your manifests look like, how you review changes and how heavily you use hiera to bring values in. Hopefully you’ll never need this, but in the cases where you’re working on legacy spaghetti it’s sometimes nice to have little wins and cleanups.




Puppet Lint Plugins - 2.0 Upgrade and new repo

Sun, 21 Aug 2016 17:47:00 +0000

After the recent puppet-lint 2.0 release and the success of our puppet-lint 2.0 upgrade at work it felt like the right moment to claw some time back and update my own (11!) puppet-lint plugins to allow them to run on either puppet-lint 1 or 2. I’ve now completed this and pushed new versions of the gems to rubygems so if you’ve been waiting for version 2 compatible gems please feel free to test away.

Now that I've realised exactly how many plugins I've ended up with, I've created a new GitHub repo, unixdaemon-puppet-lint-plugins, that will serve as a nicer discovery point for all of my plugins and a basic introduction to what they do. It's quite bare bones at the moment but it's already a nicer approach than clicking around my GitHub profile looking for matching repo names.




Puppet Lint 2.0 Upgrade

Thu, 04 Aug 2016 11:47:00 +0000

With the recent puppet-lint 2.0 release it seemed a good time to bump the version we use at $WORK and see what'd changed. In theory it was as simple as changing the version in our Gemfile and everything should have continued as normal, but in practice it was a little more work than that, and in this post I'm going to explain what we found.

Firstly let's cover a lovely, free, bonus. On our test codebase puppet-lint 1.0.1 took about 25 seconds to run on average. Immediately after the upgrade to 2.0.0 our run times dropped to around 19 seconds, with no changes required to our code. While it might seem like a tiny amount of time, considering how often the tests get run we've probably already recouped the time spent performing the upgrade.

In terms of lint warnings the first, and easiest to fix, complaint was the change from --no-80chars-check to --no-140chars-check. While most places already disable the line length check, the Puppet style guide has become a little more accepting recently and now allows lines up to 140 characters. We have some longer lines, such as embedded ssh public keys, that hit this limit, so we migrated from disabling the 80 character check to disabling the 140 one. This also required us to move from the old config file, .puppet-lintrc, to the newer .puppet-lint.rc. That was a few minutes of work so shouldn't be a blocker for anyone.

The second source of lint warnings from the upgrade came as a bit of a surprise. It seems that fat arrow (=>) alignment wasn't being correctly checked on attributes that ended with a semi-colon. We had code that looked like this:

file { '/tmp/fake':
  content => 'Hello',
  mode => '0644';
}

That ran fine under puppet-lint 1.0.1 but raised issues under 2.0.0. Fixing it was easy enough:

file { '/tmp/fake':
  content => 'Hello',
  mode    => '0644';
}

and then the awkward, unneeded semi-colon was replaced with a sensible comma to make future diffs a bit nicer too.

file { '/tmp/fake':
  content => 'Hello',
  mode    => '0644',
}

We fixed up nearly all the code violations with no manual intervention. We always work on branches so it's safe enough to run bundle exec puppet-lint --fix . over the entire code base, let it change what it wanted, and then read through the diffs. While this completed 99% of the fixes it did raise one interesting --fix edge case / bug that I have on the TODO list to investigate:

- warning => "@${minimum_request_rate * 1.2}",
+ warning => "@${minimum_request_rate} * 1.2",

The fix code is a little aggressive in its protection of variable names and in this case changes the functionality significantly by replacing a multiplication with a variable and a literal string. There is something to be said that this'd be better as an inline_template, but for now changing it back was simple enough and the checks are happy with it.

In closing, technically the upgrade is worth doing for the performance improvements and stricter linting. From a community side it's nice to see more people involved and to have versioned releases coming out rather than pinning to individual git hashes. A big 'thank you' is deserved by all the people involved. If you want to see exactly what was done you can see the full puppet-lint 2.0 upgrade in our public puppet code. [...]



Specialising validate_re with wrapper functions in Puppet

Tue, 28 Jun 2016 20:10:00 +0000

Once your puppet code base reaches a certain size you'll often have a number of validate_ functions testing parameters and configuration values for compliance with local rules and requirements. These invocations often look like this:

validate_re($private_gpg_key_fingerprint, '^[[:alnum:]]{40}$', 'Must supply full GPG fingerprint')

Once you've spent a minute or two reading that you'll probably be able to understand it; but wouldn't it be nice to not have to care about the exact details and instead focus on what you're actually testing? An approach I've been experimenting with on one larger code base is to specialise those kinds of calls using a wrapper function. Instead of the detailed call above we can now do validate_gpg_key_fingerprint($finger_print). Which of those is easier to read? Which is simpler for people new to puppet?

Implementing this kind of wrapper is much easier than you'd expect. All you need is a small stub function that wraps an existing one, in our case validate_re, and supplies some sensible defaults and local validation. You can see an example below.

cat modules/wrapper_functions/lib/puppet/parser/functions/validate_gpg_key_fingerprint.rb

module Puppet::Parser::Functions
  newfunction(:validate_gpg_key_fingerprint, :doc => <<-'ENDHEREDOC') do |args|
    A simple wrapper function to show specialisation of 'validate_re' from the
    stdlib and how it can make manifests more domain specific and easier to read.
    ENDHEREDOC

    unless (args.length >= 1 && args.length <= 2)
      message = 'wrong arguments - [error message]'
      raise ArgumentError, "validate_gpg_key_fingerprint(): #{message}"
    end

    fingerprint = args[0]

    # here we set the local rules and sensible defaults
    message = args[1] || 'Must supply full GPG fingerprint'
    regex   = '^[[:alnum:]]{40}$'

    # here we run the original function
    function_validate_re( [ fingerprint, regex, message ] )
  end
end

This is easy to test in isolation, provides a nice place to encompass more complicated validations while presenting a simple usage case, and requires only a small amount of shim coding.

If this approach interests you it's also quite easy to achieve a similar benefit with custom wrappers around the is_ functions in stdlib, wrappers that have a greater understanding of what you're testing than the basic, but common, cases stdlib provides. For example you can wrap is_ip_address and only return true for addresses in your own valid ranges. The most obvious downside to this approach, other than the custom coding required, is that if used often enough it's easy to convince people that puppet has a number of basic validate_ and is_ functions that don't actually exist anywhere outside of your repos.

Although these kinds of changes are not going to revolutionise your code base they are nice, gradual improvements. If you have a number of users who don't work with puppet on a regular basis, changing the vocabulary of your functions to align more closely with your local domain terms can be a nice way to ease them into reading, and eventually contributing to, the code. [...]
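As a sketch of that is_ip_address idea, something like the following could work; the function name and address ranges here are purely illustrative:

module Puppet::Parser::Functions
  newfunction(:is_internal_ip_address, :type => :rvalue, :doc => <<-'ENDHEREDOC') do |args|
    A hypothetical wrapper around stdlib's is_ip_address that only returns
    true for addresses inside locally approved ranges.
    ENDHEREDOC

    require 'ipaddr'

    raise ArgumentError, 'is_internal_ip_address(): requires an address' if args.empty?

    address = args[0]

    # reuse the stdlib check before applying our local rules
    if function_is_ip_address([address])
      internal_ranges = ['10.0.0.0/8', '192.168.0.0/16'] # illustrative ranges only
      internal_ranges.any? { |range| IPAddr.new(range).include?(IPAddr.new(address)) }
    else
      false
    end
  end
end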



CloudFormation Linting with cfn-nag

Mon, 02 May 2016 17:46:00 +0000

Over the last 3 years I've done a lot of CloudFormation work and while it's an easy enough technology to get to grips with, the mass of JSON can become a bit of a blur when you're doing code reviews. It's always nice to get a second pair of eyes, especially an unflagging, automated set that has insight into some of the easily overlooked security issues you can accidentally add to your templates. cfn-nag is a ruby gem that attempts to sift through your code and present guidelines on a number of frequently misused, and omitted, resource properties.

gem install cfn-nag

Once the gem and its dependencies finish installing you can list all the rules it currently validates against.

$ cfn_nag_rules
...
IAM policy should not apply directly to users. Should be on group
...

I found reading through the rules to be quite a nice context refresher. While there are a few I don't agree with, there are also some I wouldn't have thought to single out in code review, so it's well worth having a read through the possible anti-patterns. Let's check our code with cfn-nag.

cfn_nag --input-json-path .                      # all .json files in the directory
cfn_nag --input-json-path templates/buckets.json # single file check

The default output from these runs looks like:

./templates/buckets.json
------------------------------------------------------------
| WARN
|
| Resources: ["AssetsBucketPolicy"]
|
| It appears that the S3 Bucket Policy allows s3:PutObject without server-side encryption

Failures count: 0
Warnings count: 1

./templates/elb.json
-------------
| WARN
|
| Resources: ["ELB"]
|
| Elastic Load Balancer should have access logging configured

Failures count: 0
Warnings count: 1

If you'd like to reprocess the issues in another part of your tooling / pipelining then the json output formatter might be more helpful.

cfn_nag --input-json-path . --output-format json

{
  "type": "WARN",
  "message": "Elastic Load Balancer should have access logging configured",
  "logical_resource_ids": [
    "ELB"
  ],
  "violating_code": null
}

While the provided rules are useful it's always a good idea to understand how easy a linting tool makes adding your own checks. In the case of cfn-nag there are two types of rules: some use JSON and jq, and the others are pure ruby code. Let's add a simple pure ruby rule to ensure all our security groups have descriptions. At the moment this requires you to drop code directly into the gem's contents, but I imagine this will be fixed in the future. First we'll create our own rule.

# first we find where the gem installs its custom rules
$ gem contents cfn-nag | grep custom_rules
./.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/cfn-nag-0.0.19/lib/custom_rules

Then we'll add a new rule to that directory:

touch $full_path/lib/custom_rules/security_group_missing_description.rb

Our custom check looks like this -

class SecurityGroupMissingDescription
  def rule_text
    'Security group does not have a description'
  end

  def audit(cfn_model)
    logical_resource_ids = []

    cfn_model.security_groups.each do |security_group|
      unless security_group.group_description
        logical_resource_ids << security_group.logical_resource_id
      end
    end

    if logical_resource_ids.size > 0
      Violation.new(type: Violation::FAILING_VIOLATION,
                    message: rule_text,
                    logical_resource_ids: logical_resource_ids)
    else
      nil
    end
  end
end

The code above was heavily 'borrowed[...]



Terraform Modules - My Sharing Wishlist

Sun, 01 May 2016 17:15:40 +0000

I've been writing a few Terraform modules recently with the aim of sharing them among a few different teams, and there are a couple of things missing that I think would make reusable modules much more powerful.

The first and more generic issue is the inability to use more complex data structures. After you've spent a while using Terraform with AWS resources you'll develop the urge to just create a hash of tags and use it nearly everywhere, ideally with the ability to override a key / value or two when actually using the hash. If your teams are using tags, and you really should be, it's very hard to write a reusable module if the tag names in use by each team are not identical. Because you can only (currently) pass strings around, and you're unable to use a variable as a tag name, you're stuck with requiring everyone to use exactly the same tag names or not providing any at all. There's no middle ground available.

tags {
  "${var.foo}" = "Baz" # creates a Tag called literally '${var.foo}'
}

My second current pain point, and the one I'm more likely to have missed a solution to, is the ability to conditionally add or remove resource attributes. The most recent time this has bitten me was when trying to generalise a module that uses Elastic Load Balancers. Sometimes you'll want an ELB with a cert and sometimes you won't, and using the current module system there's no way to handle this case. If I was to do the same kind of thing in CloudFormation I'd use the AWS::NoValue pseudo parameter.

"DBSnapshotIdentifier" : {
  "Fn::If" : [
    "UseDBSnapshot",
    { "Ref" : "DBSnapshotName" },
    { "Ref" : "AWS::NoValue" }
  ]
}

If DBSnapshotName has a value the DBSnapshotIdentifier property is present and set to that value. If it's not defined then the property is not set on the resource.

As an aside, after chatting with @andrewoutloud, it's probably worth noting that you can make entire resources optional by using a count and setting it to 0 when you don't want the resource to be included. While this is handy and worth having in your Terraform toolkit it doesn't cover my use case.

variable "include_rds" {
  default     = 0
  description = "Should we include a aws_db_instance? Set to 1 to include it"
}

resource "aws_db_instance" "default" {
  count = "${var.include_rds}" # this serves as an if

  # ... snip ...
}

I'm sure these annoyances will be ironed out in time, but it's worth considering them and how they'll impact the reusability of any modules you'd like to write or third party code you'd want to import. At the moment it's a hard choice between rewriting everything for my own use and getting all the things I need, or vendoring everything in and maintaining a branch with things like my own tagging scheme and required properties. [...]