How We Run Infrastructure at Zapier

I think I am long overdue for writing a post about how we run infrastructure at Zapier. While I admit our setup might be far from perfect, I feel it works well and has served us as we scale the business out. I am also proud of how much we have been able to do with such a small team!

Philosophy

Since the beginning, we’ve been big on maintaining a philosophy of “Immutable Infrastructure” and treating our instances as cattle, not pets. Switching from stand-alone EC2 instances to nothing but auto-scaling groups that can scale up and down based on load was a major win. We never modify servers directly; instead, we rebuild server images and rotate our auto-scaling groups. This approach limits blast radius, provides safety, and allows us to run multiple versions of our stack side by side in case we need to roll back.
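To make the rotate-and-roll-back idea concrete, here is a toy model in Python. It is only a sketch and does not call AWS: the `Fleet` class, its method names, and the image IDs are all hypothetical, and it assumes each image version gets its own auto-scaling group.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Fleet:
    """Toy model: every image version gets its own auto-scaling group."""

    # Hypothetical bookkeeping: image id -> desired capacity of its group.
    groups: Dict[str, int] = field(default_factory=dict)
    # Image id whose group currently takes traffic.
    live: Optional[str] = None

    def rotate(self, new_image: str, capacity: int) -> Optional[str]:
        """Bring up a group for new_image and make it live.

        Running servers are never modified. The previously live group
        keeps running and its image id is returned as a rollback target.
        """
        self.groups[new_image] = capacity
        previous, self.live = self.live, new_image
        return previous

    def roll_back(self, image: str) -> None:
        """Point traffic back at a group that is still running."""
        if image not in self.groups:
            raise ValueError(f"no running group for {image}")
        self.live = image

    def retire(self, image: str) -> None:
        """Tear down an old group once it is no longer needed."""
        if image == self.live:
            raise ValueError("refusing to retire the live group")
        self.groups.pop(image, None)


fleet = Fleet()
fleet.rotate("ami-v1", capacity=3)
previous = fleet.rotate("ami-v2", capacity=3)  # v1 and v2 now run side by side
fleet.roll_back(previous)                      # v2 misbehaves, go back to v1
```

The point of the model is that a rollback is just a switch back to a group that never stopped running, which is what keeps the blast radius small.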

We do violate this a little bit with regard to our deployment process, but I won’t get into it too much here. Instead, let’s focus on the stack, which I think is a powerful solution that can be adopted anywhere using completely open-source tools!

The Stack

The stack is pretty straightforward and consists of Packer, Ansible, and Terraform. We use these three tools to provision practically all of our infrastructure on AWS. For those who are unfamiliar, here is what each of these tools does:

Ansible: Ansible is an open-source automation tool that enables you to configure and manage your infrastructure as code. It uses a simple YAML syntax to define the desired state of your infrastructure, and it can be used to automate tasks such as configuration management, application deployment, and more.
Packer: Packer is a tool for creating machine images for multiple platforms from a single source configuration. You can define server images with a simple JSON file and target multiple platforms. We use this for AMIs.
Terraform: Terraform is an infrastructure-as-code tool that enables you to create, manage, and version your infrastructure in a declarative way. It supports multiple cloud providers, including AWS, and enables you to define your infrastructure as code using a simple, easy-to-read syntax called HCL.

Now, let’s look at how these tools can be used together to create a powerful DevOps pipeline:

Use Packer to create machine images: The first step is to use Packer to create machine images for your application. These images should be pre-configured with all the necessary software and dependencies required to run your application.
During the AMI build phase, we run Ansible playbooks to install software and apply any configuration that is static, meaning it remains completely unchanged when servers are launched.
Use Terraform to provision infrastructure: Once you have the machine images, you can use Terraform to provision the infrastructure required to run your application. This can include creating EC2 auto-scaling groups, configuring security groups, setting up load balancers, and more.
Use Ansible to configure instances: With the infrastructure provisioned, you can use Ansible to configure the instances to ensure they are in the desired state. Ansible can be used to install and configure software, set up users and permissions, and more.
Use Ansible and Terraform to deploy your application: Finally, as new servers come up, we run Ansible playbooks for what we call the “configure” phase of the process. Some config files need to be generated based on the runtime environment. A good example is setting Elasticsearch configuration based on the memory available or the EBS volumes attached to the instance.
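The Elasticsearch case from the “configure” phase can be sketched in a few lines of Python. This is not our actual playbook logic, just an illustration under stated assumptions: the heap is set to half of the instance’s memory and capped at 31 GiB, following the common guidance to stay below the JVM’s compressed-pointer threshold. The function names and the 31 GiB cap are choices made for this example.

```python
import os
from typing import List

GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumption for this sketch: cap the heap just under ~32 GiB so the JVM
# can keep using compressed object pointers.
MAX_HEAP_BYTES = 31 * GIB


def detect_total_memory_bytes() -> int:
    """Read physical memory at boot time (works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def elasticsearch_heap_bytes(total_memory_bytes: int) -> int:
    """Half of RAM for the heap, leaving the rest for the filesystem cache."""
    if total_memory_bytes <= 0:
        raise ValueError("total memory must be positive")
    return min(total_memory_bytes // 2, MAX_HEAP_BYTES)


def jvm_heap_options(total_memory_bytes: int) -> List[str]:
    """Render matching -Xms/-Xmx flags so the heap never resizes."""
    heap_mib = elasticsearch_heap_bytes(total_memory_bytes) // MIB
    return [f"-Xms{heap_mib}m", f"-Xmx{heap_mib}m"]


small = jvm_heap_options(16 * GIB)    # 16 GiB instance: 8 GiB heap
large = jvm_heap_options(128 * GIB)   # 128 GiB instance: capped at 31 GiB
```

A value like this cannot be baked into the AMI because the same image may launch on instance types with different amounts of memory, which is why it belongs in the launch-time phase rather than the build phase.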

By combining AWS, Ansible, Packer, and Terraform, we have created a powerful DevOps pipeline that lets us provision, configure, and deploy our infrastructure and applications with ease. Iterating on new deployments is quick and easy, which allows us to rapidly prototype and release new server configurations.
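For readers who want to wire the build and provision steps together, here is a rough sketch of how a wrapper script might assemble the two CLI calls. It only builds the argument lists and does not run anything. The template name `web.json` and the variable names `version` and `ami_id` are made up for the example, and how the new AMI ID gets from Packer’s output into Terraform is left out.

```python
from typing import Dict, List


def packer_build_command(template: str, variables: Dict[str, str]) -> List[str]:
    """Build the argv for `packer build`, passing each variable with -var."""
    cmd = ["packer", "build"]
    for key, value in sorted(variables.items()):
        cmd += ["-var", f"{key}={value}"]
    cmd.append(template)  # the template path goes last
    return cmd


def terraform_apply_command(variables: Dict[str, str],
                            auto_approve: bool = False) -> List[str]:
    """Build the argv for `terraform apply` with the given input variables."""
    cmd = ["terraform", "apply"]
    if auto_approve:
        # Skips the interactive confirmation; only sensible in automation.
        cmd.append("-auto-approve")
    for key, value in sorted(variables.items()):
        cmd += ["-var", f"{key}={value}"]
    return cmd


# Hypothetical values for illustration only.
build_cmd = packer_build_command("web.json", {"version": "42"})
apply_cmd = terraform_apply_command({"ami_id": "ami-123"}, auto_approve=True)
```

Each list can be handed to `subprocess.run(cmd, check=True)` as is. Keeping the commands as lists rather than shell strings avoids quoting problems when a variable value contains spaces.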