Today, we’ll explore what Continuous Integration means for Infrastructure Code and how to do it.

Avg Reading Time: 5 minutes

In this post, infrastructure code refers to source files containing code, configuration, and other assets used to create and manage resources on platforms like AWS, vSphere, and Docker that provide an API to manage those resources. We will focus on the CI process for a reusable library or module of infrastructure code that is independent of any specific infrastructure deployment.

We write this code with tools like Terraform, CloudFormation, Packer, Docker, and more to manage infrastructure and applications deployed on those platforms. It’s very important that the infrastructure code behaves as intended, because the impact of defects typically ranges from “small outage” to “large data loss or exposure.”

Recall that:

The goal of a continuous integration process is to provide rapid feedback on changes to the development team. When a change is introduced into the codebase, the team needs to know whether it is probably good and reasonable to deploy for further testing.

The general CI process for traditional application code is:

  1. merge changes to the main source branch from a short-lived task branch
  2. compile or transpile code to releasable form
  3. run unit tests
  4. package into a releasable artifact
  5. publish artifact to an artifact repository

You can accomplish this with infrastructure code with a bit of translation.

Translating CI to Infra

Let’s clarify and adjust this process for the realities of infrastructure code using an example, the QualiMente tf_s3_bucket Terraform module. This is a generic module that provides an S3 bucket configured with safe defaults and tagging. The CI process runs on CircleCI:

1. merge changes to the main source branch from a short-lived task branch

No material change. Infra code development is usually a bit slower because the write-test-refactor loop involves creating and updating infrastructure resources, which usually take from 10 seconds to 10 minutes to create (sadly, sometimes more). I think this is all the more reason to discover problems earlier than deploy time. More on that later, when we get to testing.

2. compile or transpile code to releasable form

Infrastructure code is usually expressed in an uncompiled language like:

  • YAML: CloudFormation
  • JSON: Packer, CloudFormation (maybe don’t)
  • HCL: Terraform
  • Python: Pulumi

In general, this means the code is already in releasable form. However, you can usually add a static code analysis step. For Terraform, terraform validate checks that the configuration is syntactically valid and internally consistent, and linters can go further and check that the code complies with some best practices.

3. run unit tests

The testing step is where things usually get confusing. Some people will argue that you can’t ‘unit’ test infrastructure code because to exercise the code you need to instantiate resources using the infrastructure provider’s API and the ‘mocking’ facilities are lacking in the general case. And yes, I agree that we’re not going to be ‘unit testing’ according to some traditional definitions.

But don’t lose sight of our goal to get rapid feedback on whether a change is probably good and reasonable to deploy for further testing.

In the infrastructure code world, the smallest unit we have available to test and distribute is the infrastructure module. So let’s keep that module independent from our deployments so that we can test it in isolation and reuse it via composition within many deployments.

The tf_s3_bucket module uses the kitchen testing framework and the kitchen-terraform extensions to test Terraform modules.

First, a test fixture is defined that instantiates the generic infra module in various configurations. Here’s an excerpt from this example:

module "it_minimal" {
  source = "../../../" //minimal integration test

  logical_name = "${var.logical_name}-${random_id.testing_suffix.hex}"
  region       = "${var.region}"

  logging_target_bucket = "${aws_s3_bucket.log_bucket.id}"

  org   = "${var.org}"
  owner = "${var.owner}"
  env   = "${var.env}"
  app   = "${var.app}"

  kms_master_key_id = "${aws_kms_alias.test.target_key_id}"
}

When the tests run, kitchen-terraform will actually instantiate an S3 bucket and upload a file, as the fixture specifies.

Next, kitchen-terraform will run an InSpec test, written in Ruby, that verifies the bucket was created as expected:
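The excerpt refers to a few supporting resources that it does not show: random_id.testing_suffix, aws_s3_bucket.log_bucket, and aws_kms_alias.test. Here is a minimal sketch of what they might look like. The resource names come from the excerpt, but every argument value is an assumption rather than the module's actual fixture, and the syntax follows the interpolation style the excerpt uses.

```hcl
# Hypothetical supporting resources for the it_minimal fixture.
# Resource names match the references in the excerpt; all values are assumptions.

# Random suffix so that concurrent test runs do not collide on bucket names.
resource "random_id" "testing_suffix" {
  byte_length = 4
}

# Target bucket for the access logs of the bucket under test.
# Log delivery permissions (ACL or bucket policy) are omitted from this sketch.
resource "aws_s3_bucket" "log_bucket" {
  bucket        = "${var.logical_name}-logs-${random_id.testing_suffix.hex}"
  force_destroy = true # let the test teardown delete a non-empty bucket
}

# KMS key and alias passed to the module through kms_master_key_id.
resource "aws_kms_key" "test" {
  description             = "Test key for ${var.logical_name}"
  deletion_window_in_days = 7
}

resource "aws_kms_alias" "test" {
  name          = "alias/${var.logical_name}-${random_id.testing_suffix.hex}"
  target_key_id = "${aws_kms_key.test.key_id}"
}
```

Keeping these resources in the fixture rather than in the module means the module under test stays free of test-only concerns.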

control 's3' do

  describe "s3 bucket #{actual_s3_id}" do
    subject { s3_bucket(actual_s3_id) }

    it { should exist }
    it { should have_versioning_enabled }
    it { should_not have_mfa_delete_enabled }

    it { should have_logging_enabled(target_prefix: "log/s3/#{actual_s3_id}/") }

    it { should have_tag('Environment').value(expect_env) }
    it { should have_tag('Owner').value(expect_owner) }
    it { should have_tag('Application').value(expect_app) }
    it { should have_tag('ManagedBy').value('Terraform') }

    it { should have_object('an/object/key') }
  end
end

This test verifies that the bucket was created with versioning enabled, access will be logged to the correct place, tags are configured, and an object was actually uploaded to it.

Once tests succeed, kitchen-terraform will tear down the test resources.

I’ll be honest, this step isn’t always fun because it takes longer to provision infrastructure resources and some errors can be a bit inscrutable. However, I can assure you it’s a great feeling as both a change author and reviewer to have a safety net of 100+ tests when refactoring or enhancing a module that manages IAM, a VPC network, database, and other Very Important Infrastructure. Tests build confidence.

4. package into a releasable artifact

Packaging infrastructure code may be a bit different. If you focus on the task of packaging the code into an artifact that can be consumed in a downstream deployment process, though, the solutions should become clear.

We might tag the source repository the infrastructure module lives in (Terraform) or build a zip file with the code.
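For the tagging approach, a downstream deployment can pin the module to a specific tag through the ref argument of a Git module source. The sketch below assumes a hypothetical repository URL and a hypothetical v1.0.0 tag, and it passes the inputs shown in the test fixture through deployment variables of the same names.

```hcl
# Hypothetical deployment configuration that consumes a tagged release.
# The repository URL and the v1.0.0 tag are placeholders.
module "app_bucket" {
  # ?ref= pins the module to a Git tag, so the deployment only changes
  # when the version is bumped deliberately.
  source = "git::https://git.example.com/infra/tf_s3_bucket.git?ref=v1.0.0"

  logical_name = "${var.logical_name}"
  region       = "${var.region}"

  logging_target_bucket = "${var.logging_target_bucket}"
  kms_master_key_id     = "${var.kms_master_key_id}"

  org   = "${var.org}"
  owner = "${var.owner}"
  env   = "${var.env}"
  app   = "${var.app}"
}
```

Pinning to a tag rather than a branch keeps a deployment reproducible: the code that passed CI under that tag is the code that gets applied.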

5. publish artifact to an artifact repository

Once you’ve packaged (or tagged) your releasable artifact, you can publish it to an artifact repository. Since infrastructure code is usually written in uncompiled languages and imported via a Git or HTTP URL, this usually means publishing to a central Git repository, a widely available HTTP service such as Amazon S3, or an internal artifact repository like Artifactory or Nexus. As always, you must apply appropriate access controls to this code.
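If the module is published as a zip file instead of a Git tag, Terraform can fetch it from an S3 or HTTP module source. The fragment below only illustrates the two source forms. The bucket name, host names, and archive names are all placeholders, and module inputs are left out, so it is not a complete configuration.

```hcl
# Both source addresses are placeholders; module inputs are omitted for brevity.

# Zip archive stored in Amazon S3, fetched with the caller's AWS credentials.
module "bucket_from_s3" {
  source = "s3::https://s3-eu-west-1.amazonaws.com/example-infra-modules/tf_s3_bucket-1.0.0.zip"
}

# Zip archive served over HTTPS, for example by an internal artifact repository.
module "bucket_from_http" {
  source = "https://artifacts.example.com/infra-modules/tf_s3_bucket-1.0.0.zip"
}
```

Putting the version in the archive name gives consumers the same explicit pinning that a Git tag provides.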

Build Times

Yes, the CI process for infrastructure modules should generally take less than 10 (or 15) minutes.

Here are some typical CI process runtimes for QualiMente infrastructure modules:

  • tf_s3_bucket: less than 2 minutes, 25 tests
  • tf_vpc: less than 4 minutes, 102 tests
  • tf_rds_sql_server: 25 minutes, 25 tests; dominated by the unavoidable 20+ minutes of AWS RDS provisioning, but a known-safe, HIPAA-compliant configuration is worth it!

Wrap-up

I hope this has helped clarify how continuous integration practices can be applied to infrastructure as code. As always, I’d love to hear your questions, comments, and the challenges you’ve encountered while trying to adopt infrastructure as code.

Stephen

#NoDrama