This post explains how to achieve rolling deploys on AWS using the HashiCorp stack. It is based on a strategy proposed by Paul Hinze. Normally when I deploy our applications to AWS, I first ‘bake’ the code into an AWS AMI using Packer and then feed that image into an auto scaling group. Previously I relied on a lot of custom code and bash scripts to do this, but Terraform now makes the process much easier.

My example creates a basic one-tier architecture inside an AWS VPC. I have based it on the eu-west-1 region, but it should be straightforward to switch to another one. The resources are spread across all three availability zones for high availability.
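The region and zones are typically declared as Terraform variables. A sketch of what those declarations might look like, assuming the zones are stored as a comma-separated string (the auto scaling group later splits this value with `split(",", var.availability_zones)`):

```hcl
variable "region" {
  default = "eu-west-1"
}

# comma-separated so it can be split into a list where needed
variable "availability_zones" {
  default = "eu-west-1a,eu-west-1b,eu-west-1c"
}
```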

Before we get started you will need to have the following tools installed:

  • Packer
  • Terraform
  • Git
  • Make

You should also have the following environment variables set:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
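For example (the values shown are placeholders; substitute your own IAM credentials):

```shell
# Placeholder AWS credentials -- replace with your own IAM keys.
export AWS_ACCESS_KEY_ID="AKIAXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/EXAMPLEKEY"
```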

A basic knowledge of Packer and Terraform will also come in handy.

Creating the AWS resources

Start by cloning my example GitHub repository:

$ git clone https://github.com/robmorgan/terraform-rolling-deploys.git

Next, copy your public SSH key to the ssh_keys directory and update the path in key-pairs.tf.
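The key-pairs.tf file will reference that key roughly like this (the resource and file names here are illustrative, not copied from the repository):

```hcl
# register the public key with AWS so instances can be reached over SSH
resource "aws_key_pair" "deployer" {
  key_name   = "deployer-key"
  public_key = "${file("ssh_keys/id_rsa.pub")}"
}
```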

The first step is to bake an Amazon Machine Image or AMI. We do this using Packer and a helper script I’ve created:

$ make bake

Packer should boot an instance on AWS, provision it, then create a new image based on it.
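The Packer template bundled in the repository probably resembles a minimal amazon-ebs builder like the sketch below (the AMI name, source AMI, and provisioning script path are placeholders, not the repository's actual values):

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "eu-west-1",
    "source_ami": "ami-XXXXXXXX",
    "instance_type": "c3.large",
    "ssh_username": "ubuntu",
    "ami_name": "app-server-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "script": "scripts/provision.sh"
  }]
}
```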

Using the AMI ID Packer provided, run the Terraform plan step:

$ make plan AMI="ami-XXXYYYZZ"

Once we are confident, we can use the apply step to create the AWS resources:

$ make apply AMI="ami-XXXYYYZZ"
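The Makefile targets are thin wrappers around Terraform. A sketch of what `make plan` and `make apply` likely pass through (the `ami` variable name is an assumption based on the launch configuration's use of `${var.ami}`):

```make
plan:
	terraform plan -var "ami=$(AMI)"

apply:
	terraform apply -var "ami=$(AMI)"
```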

Rolling Deploys

Let’s bake a fresh AMI so we can demonstrate a rolling deploy. Run the bake command again:

$ make bake

Packer will output a new AMI ID. Let's use the plan command to ensure Terraform will only create a new auto scaling group:

$ make plan AMI="ami-XXXYYYZZ"

Once we are satisfied, we can use the apply command again to initiate the rolling deploy:

$ make apply AMI="ami-XXXYYYZZ"

If you are watching the AWS console you will notice that Terraform creates a new launch configuration and auto scaling group, but re-uses the existing elastic load balancer. The fresh auto scaling group in turn boots two instances using the new AMI. Once these instances are passing the ELB health checks, Terraform removes the old resources. If there is a problem with the new instances, Terraform will mark the new ASG as ‘tainted’ and leave the old instances in service.

Here is the code that enables this:

resource "aws_autoscaling_group" "asg_app" {
  lifecycle { create_before_destroy = true }

  # spread the app instances across the availability zones
  availability_zones = ["${split(",", var.availability_zones)}"]

  # interpolate the LC into the ASG name so it always forces an update
  name = "asg-app - ${aws_launch_configuration.lc_app.name}"
  max_size = 5
  min_size = 2
  wait_for_elb_capacity = 2
  desired_capacity = 2
  health_check_grace_period = 300
  health_check_type = "ELB"
  launch_configuration = "${aws_launch_configuration.lc_app.id}"
  load_balancers = ["${aws_elb.elb_app.id}"]
  vpc_zone_identifier = ["${aws_subnet.private_az1.id}", "${aws_subnet.private_az2.id}", "${aws_subnet.private_az3.id}"]
}

resource "aws_launch_configuration" "lc_app" {
  # ensure the new launch configuration exists before the old one is destroyed
  lifecycle { create_before_destroy = true }

  image_id = "${var.ami}"
  instance_type = "c3.large"

  # our security groups to allow HTTP and SSH access
  security_groups = ["${aws_security_group.default.id}", "${aws_security_group.app.id}"]

  user_data = "${file("user_data/app-server.sh")}"
}

As Paul mentions, the key parts are:

  • Both the auto scaling group and launch configuration have create_before_destroy set.
  • The launch configuration omits the name attribute which allows Terraform to auto-generate it, preventing collisions.
  • The ASG interpolates the LC name into its name so any changes force a replacement of the ASG.
  • We set the wait_for_elb_capacity attribute on the auto scaling group, so Terraform does not prematurely terminate the current auto scaling group.

Together, these attributes ensure the elastic load balancer serving traffic to the internet always has at least two healthy instances in service.

Cleaning Up

Finally, when you are finished, you can run the destroy command to clean up the AWS resources used in this example:

$ make destroy

Moving Forwards

My example provides a good starting point for building out your own architecture using the HashiCorp tools. You can tweak the code to add your own application during the bake phase, or add other AWS resources such as an RDS database. If you want to dig deeper, you can help me refactor and harden the example - just fork the repository. Also worth investigating are other AWS deployment strategies such as blue/green. If you have any questions, please get in touch on Twitter: @rjm.

Rob Morgan

CTO based in Berlin, Germany. Creator of #Phinx, Australian, Startups, Technology, Travel.
