CI/CD for Infrastructure as Code with Terraform and Atlantis

In this post, we’ll go over a complete workflow for continuous integration (CI) and continuous delivery (CD) for infrastructure as code (IaC) with just two tools: Terraform and Atlantis.

This post originally appeared in the 2nd Watch Company Blog.

What is Terraform?

So what is Terraform? According to the Terraform website:

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.

In practice, this means that Terraform allows us to declare what we want our infrastructure to look like (in any cloud provider), and will automatically determine the changes necessary to make it so. Because of its simple syntax and cross-cloud compatibility, it’s 2nd Watch’s choice for infrastructure as code.
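
To make the declarative idea concrete, here is a minimal sketch in Python. It is a toy comparison of a desired state against the current state, not Terraform's actual planning algorithm, and the resource addresses and attributes are hypothetical examples.

```python
# Toy illustration only: this is NOT how Terraform computes a plan internally.
# The resource addresses and attributes below are hypothetical.

def plan_changes(desired, actual):
    """Compare declared resources with existing ones and list the actions needed."""
    actions = []
    for address in sorted(desired.keys() | actual.keys()):
        if address not in actual:
            actions.append(("create", address))
        elif address not in desired:
            actions.append(("delete", address))
        elif desired[address] != actual[address]:
            actions.append(("update", address))
        # Resources that already match the declaration need no action.
    return actions


# What we declare in code.
desired = {
    "aws_instance.web": {"instance_type": "t3.small"},
    "aws_s3_bucket.logs": {"acl": "private"},
}
# What currently exists.
actual = {
    "aws_instance.web": {"instance_type": "t3.micro"},
    "aws_security_group.old": {"name": "legacy"},
}

for action, address in plan_changes(desired, actual):
    print(f"{action}: {address}")
```

We only describe the end result; the tool works out which resources to create, update, or delete to get there.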

Pain You May Be Experiencing Working With Terraform

Once you have multiple collaborators (individuals, teams, etc.) working on a Terraform codebase, some common problems are likely to emerge:

  1. Enforcing peer review becomes difficult. In any codebase, we want our code to be peer reviewed to improve quality, in accordance with The Second Way of DevOps: Feedback. Peer review matters even more in infrastructure as code codebases. Infrastructure as code is a powerful tool, but it is double-edged: we are clearly more productive for using it, but that increased productivity also means that a simple typo could cause a major issue with our production infrastructure.

    In order to minimize the potential for bad code to be deployed, we want to require peer review on all proposed changes to a codebase (e.g. GitHub Pull Requests with at least one reviewer required). Terraform’s open source offering has no facility to enforce this rule.

  2. Terraform plan output is not easily integrated in code reviews. In all code reviews, we must examine the source code to ensure that our standards are followed, that the code is readable, that it’s reasonably optimized, etc. In this aspect, reviewing Terraform code is like reviewing any other code. However, Terraform code has the unique requirement that we must also examine the effect the code change will have upon our infrastructure (i.e. we must also review the output of a terraform plan command).

    When multiple feature branches are in review at the same time, it becomes critical that the terraform plan output we reviewed is exactly what will be executed when we run terraform apply. If the state of infrastructure changes between a run of terraform plan and a run of terraform apply, the effect of this difference in state could range from inconvenient (the apply fails) to catastrophic (a significant production outage). Terraform itself offers locking capabilities, but its open source product does not provide an easy way to integrate locking into a peer review process.

  3. Too many sets of privileged credentials. Highly privileged credentials are often required to perform Terraform actions, and the greater the number of principals you have with privileged access, the larger your attack surface becomes. Therefore, from a security standpoint, we’d like to have fewer sets of admin credentials which can potentially be compromised.
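
For the second problem above, plain Terraform can at least tie an apply to a specific plan by saving the plan to a file and applying that file. The sketch below only builds and prints the two commands; the file name pr.tfplan is a hypothetical choice, and the sketch says nothing about how any particular tool wires this into a review process.

```python
# Sketch under assumptions: builds the Terraform CLI commands for a
# "save the plan, then apply exactly that plan" flow. Nothing is executed.

PLAN_FILE = "pr.tfplan"  # hypothetical file name


def plan_command(plan_file=PLAN_FILE):
    # -out writes the plan to a file so a later apply can use exactly this plan.
    return ["terraform", "plan", "-input=false", f"-out={plan_file}"]


def apply_command(plan_file=PLAN_FILE):
    # Passing a saved plan file to apply executes that plan rather than
    # computing a new one.
    return ["terraform", "apply", "-input=false", plan_file]


if __name__ == "__main__":
    print(" ".join(plan_command()))
    print(" ".join(apply_command()))
```

Each list can be passed to subprocess.run in the Terraform working directory. This covers the "what was reviewed is what gets applied" half of the problem, but it does not by itself stop two reviewers from planning against the same infrastructure at the same time.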

What is Atlantis?

And what is Atlantis? Atlantis is an open source tool that allows safe collaboration on Terraform projects by making sure that proposed changes are reviewed and that the proposed change is the actual change which will be executed on our infrastructure. Atlantis is compatible (at the time of writing) with GitHub and GitLab, so if you’re not using either of these Git hosting systems, you won’t be able to use Atlantis.

How Atlantis Works With Terraform

Atlantis is deployed as a single binary executable with no system-wide dependencies. An operator adds a GitHub or GitLab token for a repository containing Terraform code. The Atlantis installation process then adds hooks to the repository which allow communication with the Atlantis server during the Pull Request process.

You can run Atlantis in a container or a small virtual machine. The only requirement is that the Atlantis instance can communicate with both your version control system (e.g. GitHub) and the infrastructure you’re changing (e.g. AWS). Once Atlantis is configured for a repository, the typical workflow is:

  1. A developer creates a feature branch in git, makes some changes, and creates a Pull Request (GitHub) or Merge Request (GitLab).
  2. The developer enters atlantis plan in a PR comment.
  3. Via the installed webhooks, Atlantis runs terraform plan locally. If there are no other Pull Requests in progress for the same project or workspace, Atlantis adds the resulting plan as a comment to the Pull Request.
    • If there are other Pull Requests in progress, the command fails because we can’t ensure that the plan will still be valid when it is applied.
  4. The developer ensures the plan looks good and adds reviewers to the Pull Request.
  5. Once the PR has been approved, the developer enters atlantis apply in a PR comment. This will trigger Atlantis to run terraform apply and the changes will be deployed to your infrastructure.
    • The command will fail if the Pull Request has not been approved.

The following sequence diagram illustrates the steps described above:

Atlantis sequence diagram
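
The same sequence can be sketched as a toy model in Python. This is not Atlantis's implementation: the class, its method names, and the point at which the lock is released (here, when the Pull Request is closed) are assumptions made purely for illustration.

```python
# Toy model of the plan / approve / apply workflow described above.
# NOT Atlantis's real code; all names here are hypothetical.

class ToyAtlantis:
    def __init__(self):
        self.locks = {}        # (project, workspace) -> PR number holding the lock
        self.approved = set()  # PR numbers that have an approving review

    def plan(self, pr, project, workspace="default"):
        key = (project, workspace)
        holder = self.locks.get(key)
        if holder is not None and holder != pr:
            # Another PR already planned here, so this plan could go stale.
            return f"plan failed: locked by PR #{holder}"
        self.locks[key] = pr
        return f"plan ok: PR #{pr} holds the lock"

    def approve(self, pr):
        self.approved.add(pr)

    def apply(self, pr, project, workspace="default"):
        if self.locks.get((project, workspace)) != pr:
            return "apply failed: run plan first"
        if pr not in self.approved:
            return "apply failed: PR not approved"
        return f"apply ok: PR #{pr}"

    def close(self, pr):
        # Assumption: locks are released when the PR is merged or closed.
        self.locks = {k: v for k, v in self.locks.items() if v != pr}


server = ToyAtlantis()
print(server.plan(1, "network"))   # PR 1 takes the lock
print(server.plan(2, "network"))   # PR 2 is blocked by PR 1
print(server.apply(1, "network"))  # blocked until a reviewer approves
server.approve(1)
print(server.apply(1, "network"))  # now allowed
server.close(1)
print(server.plan(2, "network"))   # PR 2 can plan once the lock is free
```

The model shows the two gates in the workflow: a lock that keeps a second Pull Request from planning against the same project and workspace, and an approval check before apply.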

We can see how our pain points in Terraform collaboration are addressed by Atlantis:

  1. In order to enforce code review, we can launch Atlantis with the --require-approval flag: https://github.com/runatlantis/atlantis#approvals
  2. In order to ensure that our terraform plan accurately reflects the change to our infrastructure that will be made when we run terraform apply, Atlantis performs locking on a project or workspace basis: https://github.com/runatlantis/atlantis#locking
  3. In order to avoid creating multiple sets of privileged credentials, we can (e.g. in AWS) deploy Atlantis to run on an EC2 instance with a privileged IAM role in its instance profile. In this way, all of our Terraform commands run through a single set of privileged credentials, which obviates the need to distribute multiple sets: https://github.com/runatlantis/atlantis#aws-credentials

Conclusion

So we can see that with minimal additional infrastructure, we can establish a safe and reliable CI/CD pipeline for our infrastructure as code, enabling us to get more done safely!
