We all remember a time when setting up infrastructure meant racking servers, running cables and managing a web of networking by hand. Indeed, this is still very common and quite necessary for many organizations. Thankfully, though, for those of us who are lucky enough to work in a cloud environment, there’s a newer – some might say better – way. At Veracode, we’ve recently had the opportunity to explore and innovate in this space with our next-generation, cloud-native continuous delivery platform.
Cloud computing and infrastructure-as-code
Cue the choir. The software world is so full of pseudo-technical terms and buzzwords that everything sounds like the next big thing. In this case, though, we are talking about some genuinely cool stuff. It started with virtualization. Then came cloud computing. Next we had virtual infrastructure.
Now, the previously unimaginable infrastructure-as-code is at our disposal. This means that we can reduce a rat’s nest of servers, cables and configuration to plain text. We can then feed that textual definition to an interpreter capable of spitting out a functional virtual network that meets our exacting specifications.
Lots of technologies have cropped up to perform these functions. In a space this complicated, no player is going to be perfect; there will be pros and cons to any technology decision. In the case of our CD infrastructure project, we find ourselves using AWS (Amazon Web Services) as our cloud provider. For our infrastructure management, we’ve chosen Terraform.
A quick aside for context: the components that make up the CD infrastructure in question include:
- GitLab for SCM and CI orchestration
- An evolving collection of artifact repositories, some third-party, some home-grown, for enabling inter-process handoffs of technology-specific packages
- All the hidden bits that make the pieces work together
It’s not hundreds of interdependent services, and it’s not just one or two. It’s proven to be a healthy level of complexity for exploring these concepts. Now, back to your regularly scheduled blog post.
There’s a key principle in the world I’m describing, and it may seem counter-intuitive: you want as little “static” infrastructure as possible. You want to be able to describe, and repeatedly reproduce, your infrastructure in as complete a form as possible. In more explicit terms, start with the VPC and work your way up from there. Why? This offers you the most flexibility and keeps your provisioning process isolated from, and resilient to, changes in the surrounding landscape – it makes your infrastructure portable. It also means that you can infer most, if not all, of the information needed to configure your applications directly from the infrastructure definition itself. Fewer external inputs mean fewer opportunities for human error and, really, isn’t that the point here?
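To make that a little more concrete, here’s a minimal sketch of what inferring configuration from the infrastructure itself can look like. It assumes a hypothetical Terraform stack exposing outputs named gitlab_private_ip and artifact_repo_url; those output names are invented for illustration, though `terraform output -json` is a real Terraform command.

```python
# A minimal sketch: derive application configuration from Terraform
# outputs instead of hand-maintained config files. Output names here
# are hypothetical placeholders.
import json
import subprocess

def terraform_outputs(state_dir="."):
    """Return all Terraform outputs as a plain {name: value} dict."""
    raw = subprocess.check_output(
        ["terraform", "output", "-json"], cwd=state_dir
    )
    return {name: attrs["value"] for name, attrs in json.loads(raw).items()}

if __name__ == "__main__":
    outputs = terraform_outputs()
    # Everything the apps need to find each other comes from one place.
    print("GitLab address:", outputs["gitlab_private_ip"])
    print("Artifact repo:", outputs["artifact_repo_url"])
```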
Applications are people too!
What is infrastructure, really? Is it just the network of computing and networking resources I described above? It could be, but it doesn’t have to be. Applications can be included too. In fact, if you treat application deployment as a first-class citizen in this context and bundle it in with your infrastructure definition, it enables some pretty fancy stuff.
Let’s unpack that a bit further. From an application perspective, what are the steps required to go from zero to running?
- Install the software
- Configure the software to run in your infrastructure
- Bootstrap runtime data as needed
Well, those are all solved problems. Using a combination of Puppet for (secure) configuration management and Fabric for local and remote task execution, you can easily describe all of the machinations necessary to reach your target state. What’s more, you can supply all the fragile, fiddly networking configuration using outputs from Terraform. With a method like this, you end up with a self-contained definition for an entire runtime environment, glued together with complementary services. Depending on how far you want to take it, you can even wrap the whole thing in one high-level execution unit: your very own “easy” button!
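As an illustration, here’s a hedged sketch of what that glue might look like using Fabric (1.x-style API). The host address, package name, manifest path, and task names are hypothetical stand-ins, not our actual tooling:

```python
# A sketch of "zero to running" glue using Fabric task execution.
# Host and paths are hypothetical; in practice they would be fed in
# from Terraform outputs rather than hard-coded.
from fabric.api import env, sudo, task

env.hosts = ["10.0.1.10"]  # e.g. a Terraform output for the GitLab host
env.user = "deploy"

@task
def install():
    """Step 1: install the software."""
    sudo("yum install -y gitlab-ce")

@task
def configure():
    """Step 2: configure it for this infrastructure via Puppet."""
    sudo("puppet apply /etc/puppet/manifests/gitlab.pp")

@task
def bootstrap():
    """Step 3: seed whatever runtime data the service needs."""
    sudo("gitlab-rake gitlab:setup force=yes")

@task
def zero_to_running():
    """Run all three steps in order."""
    install()
    configure()
    bootstrap()
```

Once the Terraform stack is up, something like `fab zero_to_running` takes the environment the rest of the way: one high-level execution unit.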
Code is only as good as the tests that exercise it
As any modern developer knows, you need automated tests to be sure that your code is going to work when it gets deployed. Moreover, many shops use this as a gate in their definition of done. Under such a system, you can’t deliver code without tests to go with it. This seems like a sensible policy. Can we apply it to infrastructure? Of course we can!
Obviously, the units under test are at a different scale than in your typical software application, but most of the same concepts still apply. Most importantly, post-deployment functional testing is a perfect fit. After your infrastructure is standing, and your applications are configured and running, you can run tests against them using whatever entry points are available to you.
Now, there are numerous frameworks available for building up, organizing, and executing test suites at whatever level of complexity you need. For functional testing of the new CD infrastructure, we have chosen py.test as our framework. With it, we have full control over the granularity of what we are testing and when, which makes reporting on the tests very easy. It also makes using the tests as a debugging tool a palatable option.
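For flavor, here’s a hedged sketch of what a post-deployment functional test might look like with py.test; the endpoint, fixture, and paths are invented for illustration:

```python
# A sketch of post-deployment functional tests with py.test.
# The URL fixture is hypothetical; in practice it could be derived
# from Terraform outputs for the freshly built environment.
import pytest
import requests

@pytest.fixture(scope="session")
def gitlab_url():
    return "https://gitlab.example.internal"  # stand-in address

def test_web_ui_is_up(gitlab_url):
    resp = requests.get(gitlab_url + "/users/sign_in", timeout=30)
    assert resp.status_code == 200

def test_clone_requires_auth(gitlab_url):
    # An unauthenticated clone attempt should be challenged, proving
    # the service is alive and enforcing access control.
    resp = requests.get(
        gitlab_url + "/some-group/some-project.git/info/refs"
        "?service=git-upload-pack",
        timeout=30,
    )
    assert resp.status_code == 401
```

Because each test is just a function, you can run the whole suite or a single test (for example, `py.test -k test_web_ui_is_up`), which is part of what makes the suite useful as a debugging tool.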
Now we take the leap
Finally, we’ve come to the cool part. Ever heard of “build once, deploy anywhere”? In a nutshell, the ideal in software delivery is to build an artifact, test that artifact, and pass it through promotion environments until it reaches production. It’s clean, auditable, and light on variables. Using the groundwork we’ve laid above, we can apply this concept to delivering infrastructure.
Now, let’s define a few terms. Once you’ve provisioned the infrastructure, deployed and configured the applications, and successfully run the tests, think of that as an artifact. It isn’t exactly “portable,” nor is it contained in any obvious “packaging,” as most artifacts are, but bear with me. Next, how can you get that artifact into production? In this paradigm, “deployment” isn’t a terrific word. Our analogous term is “promotion.” To understand promotion, though, we need to do a little work on “production.”
What is production, really? In the datacenter context, it’s a specific collection of servers where your customers can access your applications. Here we’re going to flip the script. Think of production as the configuration that allows your applications to reach your persistent data. Now, promoting to production simply means applying that configuration to your artifact.
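To make “applying that configuration” concrete, here’s a hedged sketch of one possible promotion step: repointing a stable DNS name at the load balancer of the newly validated environment. The zone ID and names are placeholders, though the Route53 UPSERT call shown is a real boto3 API.

```python
# A sketch of "promotion" as configuration: repoint the production
# DNS name at the load balancer of the newly validated environment.
# Zone ID and names are hypothetical placeholders.
import boto3

def promote(zone_id, record_name, new_env_lb_dns):
    route53 = boto3.client("route53")
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Comment": "Promote freshly tested environment to production",
            "Changes": [{
                "Action": "UPSERT",  # create or update the record in place
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": 60,  # short TTL keeps cutover (and rollback) fast
                    "ResourceRecords": [{"Value": new_env_lb_dns}],
                },
            }],
        },
    )

# e.g. promote("Z123EXAMPLE", "ci.example.com",
#              "new-env-lb.us-east-1.elb.amazonaws.com")
```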
Remember how we said you want your infrastructure definition to be as self-contained as possible? Well, this is the bit that has to be separate. At some point, there is a set of data that has to live from version to version of your infrastructure so that your customers can continue to use it. The trick is to make that set as small as possible. In our case, it consists of:
- S3 buckets
- IAM policies
- KMS encryption keys
- Route53 rules
Even our production database is freshly built for every new environment. This method comes with a whole host of benefits, including:
- With every new environment, all compute resources are created afresh, limiting our exposure to things “going stale”
- With an emphasis on simplicity and repeatability of the promotion process, we can “roll forward” in the face of unforeseen errors
- Our disaster recovery process is the same as our promotion process, so we are constantly testing it and have a high degree of confidence that it will work in the worst case
- Because each iteration of the infrastructure is isolated to its own VPC, one version could look very different from the one that came before it with very few external consequences
With accomplishments like that under our belt, what’s left? Dare I say continuous deployment of mission-critical infrastructure? We’re not there yet, but we’ve set ourselves up for success, for sure.