Terraform Repository Structure

When IT organizations adopt infrastructure as code (IaC), the benefits in productivity, quality, and ability to function at scale are manifold. However, the first few steps on the journey to full automation and immutable infrastructure bliss can be a major disruption to a more traditional IT operations team’s established ways of working. One of the common problems faced in adopting infrastructure as code is how to structure the files within a repository in a consistent, intuitive, and scalable manner. Even IT operations teams whose members have development skills will still face this anxiety-inducing challenge, simply because adopting IaC involves new tools whose conventions differ somewhat from more familiar languages and frameworks.

In this blog post, we’ll go over how we structure our IaC repositories at 2nd Watch with a particular focus on Terraform, an open-source tool by HashiCorp for provisioning infrastructure across multiple cloud providers with a single interface.

First Things First: README.md and .gitignore

The first task in any new repository is to create a README file. Many Git repositories (especially on GitHub) have adopted Markdown as a de facto standard format for README files. A good README file will include the following information:

Overview: A brief description of the infrastructure the repo builds. A high-level diagram is often an effective method of expressing this information. 2nd Watch uses LucidChart for general diagrams (exported to PNG or a similar format) and mscgen_js for sequence diagrams.
Pre-requisites: Installation instructions (or links thereto) for any software that must be installed before building or changing the code.
Building The Code: What commands to run in order to build the infrastructure and/or run the tests when applicable. 2nd Watch uses Make to provide a single tool with a consistent interface for building all codebases, regardless of language or toolset. If using Make in Windows environments, Windows Subsystem for Linux is recommended on Windows 10 to avoid having to write two sets of commands in Makefiles: one for Bash and one for PowerShell.

It’s important that you do not neglect this basic documentation for 2 reasons (even if you think you’re the only one who will work on the codebase):

The obvious: Writing this critical information down in an easily viewable place makes it easier for other members of your organization to onboard onto your project and will prevent the need for a panicked knowledge transfer when projects change hands.
The not-so-obvious: The act of writing a description of the design clarifies your intent to yourself and will result in a cleaner design and a more coherent repository.

All repositories should also include a .gitignore file with the appropriate settings for Terraform. GitHub’s default Terraform .gitignore is a decent starting point, but in most cases you will not want to ignore .tfvars files because they often contain environment-specific parameters that allow for greater code reuse as we will see later.

Terraform Roots and Multiple Environments

A Terraform root is the unit of work for a single terraform apply command. We group our infrastructure into multiple Terraform roots in order to limit our “blast radius” (the amount of damage a single errant terraform apply can cause).

Repositories with multiple roots should contain a roots/ directory with a subdirectory for each root (e.g. VPC, one per application), each containing a main.tf file as the primary entry point.
Note that the roots/ directory is optional for repositories that only contain a single root, e.g. infrastructure for an application team which includes only a few resources which should be deployed in concert. In this case, modules/ may be placed in the same directory as main.tf.
Roots which are deployed into multiple environments should include an env/ subdirectory at the same level as main.tf. Each environment corresponds to a .tfvars file under env/ named after the environment, e.g. staging.tfvars. Each .tfvars file contains the parameters appropriate for its environment, e.g. EC2 instance sizes.

Here’s what our roots directory might look like for a sample repository with a VPC, 2 application stacks, and 3 environments (QA, Staging, and Production):

.
└── roots
    ├── application1
    │   ├── env
    │   │   ├── production.tfvars
    │   │   ├── qa.tfvars
    │   │   └── staging.tfvars
    │   └── main.tf
    ├── application2
    │   ├── env
    │   │   ├── production.tfvars
    │   │   ├── qa.tfvars
    │   │   └── staging.tfvars
    │   └── main.tf
    └── vpc
        ├── env
        │   ├── production.tfvars
        │   ├── qa.tfvars
        │   └── staging.tfvars
        └── main.tf
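
To make the env/ convention concrete, here is a minimal sketch of how a root's variable declarations and one of its .tfvars files might fit together. The variable names (environment, instance_type) and their values are assumptions for illustration rather than part of any real 2nd Watch repository, and the syntax assumes Terraform 0.12 or later.

```hcl
# roots/application1/main.tf (excerpt)
# Hypothetical variables: the names are illustrative only.
variable "environment" {
  description = "Name of the environment this root is deployed into"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance size for the application servers"
  type        = string
}
```

```hcl
# roots/application1/env/staging.tfvars
# Hypothetical values for the staging environment.
environment   = "staging"
instance_type = "t3.small"
```

Run from roots/application1, a command such as terraform apply -var-file=env/staging.tfvars would then supply the staging parameters, so the same main.tf can serve every environment.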

Terraform modules

Terraform modules are self-contained packages of Terraform configurations that are managed as a group. Modules are used to create reusable components, improve organization, and treat pieces of infrastructure as a black box. In short, they are the Terraform equivalent of functions or reusable code libraries.

Terraform modules come in 2 flavors:

Internal modules, whose source code is consumed by roots that live in the same repository as the module.
External modules, whose source code is consumed by roots in multiple repositories. The source code for external modules lives in its own repository, separate from any consumers and separate from other modules to ensure we can version the module correctly.

In this post, we’ll only be covering internal modules.

Each internal module should be placed within a subdirectory under modules/.
Module subdirectories/repositories should follow the standard module structure per the Terraform docs.
External modules should always be pinned at a version: a git revision or a version number. This practice allows for reliable and repeatable builds. Failing to pin module versions may cause a module to be updated between builds, breaking the build without any obvious changes in our code. Even worse, failing to pin our module versions might cause a plan to be generated with changes we did not anticipate.
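
As a sketch of what pinning can look like, the two module blocks below pin a Git-hosted module to a tag and a registry-hosted module to an exact version. The repository URL, the registry address, the module names, and the version numbers are all hypothetical placeholders.

```hcl
# Hypothetical external module hosted in Git, pinned to a tag
# through the ref query parameter on the source address.
module "network" {
  source = "git::https://example.com/infrastructure/terraform-module-network.git?ref=v1.2.0"

  # Module inputs omitted for brevity.
}

# Hypothetical module from a Terraform registry, pinned to an exact
# version with the version argument (valid for registry sources only).
module "network_from_registry" {
  source  = "example-org/network/aws"
  version = "1.2.0"

  # Module inputs omitted for brevity.
}
```

With either form, moving to a newer module release becomes an explicit change in our own code that shows up in review, rather than something that happens silently between builds.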

Here’s what our modules directory might look like:

.
└── modules
    ├── ec2
    │   ├── main.tf
    │   ├── outputs.tf
    │   └── variables.tf
    └── vpc
        ├── main.tf
        ├── modules
        │   ├── routing.tf
        │   └── subnets.tf
        ├── outputs.tf
        └── variables.tf
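
The standard module structure separates a module's inputs (variables.tf), resources (main.tf), and outputs (outputs.tf). A minimal sketch of what modules/ec2 might contain follows. The variable names, the output name, and the choice of a single aws_instance resource are assumptions for illustration, written for Terraform 0.12 or later.

```hcl
# modules/ec2/variables.tf
# Hypothetical inputs for the module.
variable "ami_id" {
  description = "ID of the machine image to launch"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance size"
  type        = string
}

variable "subnet_id" {
  description = "ID of the subnet to launch the instance into"
  type        = string
}

# modules/ec2/main.tf
resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type
  subnet_id     = var.subnet_id
}

# modules/ec2/outputs.tf
# Expose only what callers need, so the module can be treated as a black box.
output "instance_id" {
  description = "ID of the EC2 instance created by this module"
  value       = aws_instance.this.id
}
```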

Terraform and Other Tools

Terraform is often used alongside other automation tools within the same repository. Some frequent collaborators include Ansible for configuration management and Packer for compiling identical machine images across multiple virtualization platforms or cloud providers. When using Terraform in conjunction with other tools within the same repo, 2nd Watch creates a directory per tool at the root of the repo:

sample-repo
├── README.md
├── .gitignore
├── ansible
├── packer
└── terraform
    ├── modules
    └── roots

Putting it all together

The following sample illustrates a Terraform repository structure that combines all of the concepts outlined above:

$ tree sample-repo/terraform

terraform
├── GNUmakefile
├── README.md
├── modules
│   ├── ec2
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── vpc
│       ├── main.tf
│       ├── modules
│       │   ├── routing.tf
│       │   └── subnets.tf
│       ├── outputs.tf
│       └── variables.tf
└── roots
    ├── application1
    │   ├── env
    │   │   ├── production.tfvars
    │   │   ├── qa.tfvars
    │   │   └── staging.tfvars
    │   └── main.tf
    ├── application2
    │   ├── env
    │   │   ├── production.tfvars
    │   │   ├── qa.tfvars
    │   │   └── staging.tfvars
    │   └── main.tf
    └── vpc
        ├── env
        │   ├── production.tfvars
        │   ├── qa.tfvars
        │   └── staging.tfvars
        └── main.tf
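
To show how a root and an internal module connect in this layout, here is a hedged sketch of what roots/application1/main.tf could contain. It assumes that modules/ec2 declares inputs named ami_id, instance_type, and subnet_id. Those names, the region variable, and the module label are hypothetical, and the syntax assumes Terraform 0.12 or later.

```hcl
# roots/application1/main.tf (sketch)
# Values for these variables would come from env/<environment>.tfvars.
variable "region" {
  type = string
}

variable "ami_id" {
  type = string
}

variable "instance_type" {
  type = string
}

variable "subnet_id" {
  type = string
}

provider "aws" {
  region = var.region
}

# Internal module, referenced by its path relative to this root:
# roots/application1 -> ../../modules/ec2
module "app_server" {
  source = "../../modules/ec2"

  ami_id        = var.ami_id
  instance_type = var.instance_type
  subnet_id     = var.subnet_id
}
```

Because the module is addressed by relative path, the root and the module always change together in the same commit, which is why internal modules do not need the version pinning that external modules do.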

Conclusion

There’s no single repository format that’s optimal, but we’ve found that this standard works for the majority of our use cases in our extensive use of Terraform on dozens of projects. That said, if you find a tweak that works better for your organization, go for it! The structure described in this post will give you a solid and battle-tested starting point to keep your Terraform code organized so your team can stay productive!

Additional resources

The Terraform Book by James Turnbull provides an excellent introduction to Terraform all the way through repository structure and collaboration techniques.
The HashiCorp AWS VPC Module is one of the most popular modules in the Terraform Registry and is an excellent example of a well-written Terraform module.
The source code for James Nugent’s HashiDays NYC 2017 talk is an exemplary Terraform repository. Although it’s based on an older version of Terraform (before providers were broken out from the main Terraform executable), the code structure, formatting, and use of Makefiles are still current.