Monday 1:40 p.m.–2:25 p.m.
Usable Ops: How to make web infrastructure management easier.
- Audience level:
- Best Practices & Patterns
As developer tools grow more powerful, so do the systems we’re able to build. However, with great power comes great...complexity, and the systems we build today are more complex than ever before. This talk is about reducing the complexity of your web infrastructure and making it easier for the developers on your team to learn, use, and manage it.
In the past few decades, web applications have become the backbone of much of the technology industry. Hosting tools like Amazon Web Services allow technology companies to build web applications at an unprecedented scale. While these tools give developers more power, they also let people build ever more complicated web applications that take many developers to build, deploy, monitor, and maintain. This talk focuses on something the engineering community rarely discusses: infrastructure usability. Usable infrastructure reduces developer errors, encourages correct behavior, and lets engineers move quickly without having to manage a complex infrastructure that requires specialized training.

**What is usability?**

In the 1950s, Elbert Botts invented something to make roads safer: raised, reflective dots that let drivers see the edges of lanes even at night and in low visibility. Today, these Botts’ Dots are a staple of our road infrastructure, making driving far safer than it was before, but we rarely think about them. We take for granted that we can see where one lane ends and another begins.

Roads are a fairly apt metaphor for web infrastructure: in the last 50 years, both the complexity of roads and the number of people driving on them have increased dramatically. Driving needs to be reasonably learnable, and roads have to be safe for the average driver to use. When roads are confusing, accidents happen. Everyone here can probably think of an intersection, a merge, or another stretch of road where accidents are common because something is wrong with its usability. Usability is the ease of use and learnability of a human-made object [Wikipedia - https://en.wikipedia.org/wiki/Usability].
Usability is all about making “the right thing easy and the wrong thing hard.” The idea is this: web infrastructure should be easy to learn and easy to use, which reduces human error and increases productivity.

**Why is usability important for web infrastructure?**

Usability matters for everything, but we rarely apply it to web infrastructure or internal tools. As we build increasingly complex web applications, it’s not good enough to have one person or one team act as the gatekeeper of the application’s infrastructure. That creates an unnecessary bottleneck for workflow tasks like deploying and rolling back code, managing system upgrades, changing system installations, and handling server issues. When I first started programming, I just assumed that the “mythical DevOps team” was how things had to be. However, there is no reason that understanding, using, and building infrastructure automation tools should be so separate from application development. Since most developers understand the basics of how the web works, they should be able to easily understand and use the web infrastructure they work on. Developers often find that even the language used to discuss DevOps is rather complex; a more usable infrastructure removes another layer of complexity that gets in the way of clarity and visibility across teams.

**How do you build usable web infrastructure?**

The big question is how to build usable infrastructure. First, it’s worth talking a little about Jakob Nielsen’s usability heuristics, which are still probably the 10 best recommendations for building great software. Then come recommendations for automation tools you can build or use that have great usability:

1. One-click deploy.
   * If deploying is a big deal, then rolling back is likely a bigger deal. Deploying is something that happens on a (hopefully) continuous basis.
     However, there will inevitably be problems with deploys, and either a bug fix will need to be deployed as quickly as possible or the code will need to be rolled back.
   * Both are better with the click of a button. First, everyone knows how to click a button. Excellent. Second, a single button lets you focus on making sure the engineer knows exactly what is being deployed, along with any other relevant information.
2. Visibility of system status: where do servers and services live?
   * Whenever a new engineer starts at a company, there is the inevitable “here’s our system and code architecture” talk, usually given at a whiteboard. Instead, automate the processes around interacting with your servers and services, so that you have a dynamic diagram of all the different parts of your codebase. Link it up to your GitHub repositories, and engineers no longer have to memorize where code, services, and servers live; they can discover them (yeah, that’s another of Nielsen’s heuristics: “Recognition rather than recall”). Add all of the graphs and charts you collect about performance and uptime, and everyone has visibility into the whole system.
   * With a better foundational understanding of the infrastructure, new developers can contribute to making the system even more usable in two ways: 1) there is a lower barrier to productive conversations about the current system, which usually take the form of questions; 2) new engineers can offer suggestions based on their knowledge of and experience with infrastructures they have worked with before.
3. Changing service installations uses the same workflow as programming (Docker).
   * Finally, use tools that fit into the developer workflow. Currently, there’s a strong standard around using Git (or Mercurial, etc.) for version control.
     Developers make changes in a local environment, open a pull request when the code is ready, merge it into the main branch after it’s reviewed, and from there deploy it to test and production environments. In other words, engineers edit the system by making code changes and then pushing them somewhere through version control. Changing system installations, however, is usually a completely separate task with no connection to the codebase of the service the engineer is actually working on. Engineers have to go to the person in charge of managing system installations, make sure that person adds the package to the install list and configures it properly, and then test it in a separate workflow.
   * Enter tools like Docker. Containers are cool for a lot of reasons, but the biggest for me is developer usability. The Dockerfile lives in the root of the code repo and holds the configuration details for the container. This means an engineer can edit the configuration of the service IN THE SAME WAY THEY EDIT CODE. No special workflow. No bothering a random DevOps person to add the installation. They can then test the container the same way they normally test feature changes to make sure the installations work correctly. This also encourages engineers to be more conscious of deployment conditions as they develop software.
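The one-click deploy from item 1 can be sketched as a single action behind a button: it tells the engineer exactly what is about to ship, records the release, and makes rollback the same one-step action. This is a minimal, hypothetical sketch; `Release`, `ReleaseHistory`, `deploy`, and `rollback` are illustrative names rather than any particular tool's API, and the actual push to servers is left as a stub.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    """A deployable version of the app (hypothetical model)."""
    commit: str
    changes: List[str]  # human-readable summary of what this release contains


@dataclass
class ReleaseHistory:
    """Tracks deployed releases so that rolling back is as easy as deploying."""
    releases: List[Release] = field(default_factory=list)

    @property
    def current(self) -> Optional[Release]:
        return self.releases[-1] if self.releases else None

    def push_to_servers(self, release: Release) -> None:
        # Stub (assumption): a real system would call its deploy tooling here.
        pass

    def deploy(self, release: Release) -> str:
        """One click: summarize exactly what is shipping, then ship it."""
        previous = self.current.commit if self.current else "nothing"
        summary = f"Deploying {release.commit} (replacing {previous}):\n" + "\n".join(
            f"  - {change}" for change in release.changes
        )
        self.push_to_servers(release)
        self.releases.append(release)
        return summary

    def rollback(self) -> str:
        """One click: go back to the previously deployed release."""
        if len(self.releases) < 2:
            raise RuntimeError("No earlier release to roll back to")
        bad = self.releases.pop()
        self.push_to_servers(self.releases[-1])
        return f"Rolled back {bad.commit}; now running {self.releases[-1].commit}"


if __name__ == "__main__":
    history = ReleaseHistory()
    history.deploy(Release("a1b2c3", ["Add signup page"]))
    print(history.deploy(Release("d4e5f6", ["Fix login redirect", "Bump requests"])))
    print(history.rollback())  # Rolled back d4e5f6; now running a1b2c3
```

Returning the summary from `deploy` is the usability point: whatever UI sits in front of the button can show it before and after the click.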
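The dynamic architecture diagram from item 2 can start as a small registry that maps each service to its repository and servers and renders that map as a Graphviz DOT graph, so engineers look services up instead of memorizing them. This is a hypothetical sketch: `Service`, `ServiceRegistry`, the service names, the `github.com/example/...` URLs, and the hostnames are all made up, and in practice the registry would be filled in by your automation rather than by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Service:
    name: str
    repo_url: str                                         # where the code lives
    servers: List[str]                                    # where it runs
    depends_on: List[str] = field(default_factory=list)   # services it calls


class ServiceRegistry:
    """Supports recognition rather than recall: discover services on demand."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, service: Service) -> None:
        self._services[service.name] = service

    def where_is(self, name: str) -> str:
        """Answer 'where does this service live?' in one call."""
        s = self._services[name]
        return f"{s.name}: code at {s.repo_url}, running on {', '.join(s.servers)}"

    def to_dot(self) -> str:
        """Render the registry as a Graphviz DOT diagram (node URL links to the repo)."""
        lines = ["digraph services {"]
        for s in sorted(self._services.values(), key=lambda svc: svc.name):
            label = f"{s.name}\\n{', '.join(s.servers)}"  # \n is a line break in DOT labels
            lines.append(f'  "{s.name}" [label="{label}", URL="{s.repo_url}"];')
            for dep in s.depends_on:
                lines.append(f'  "{s.name}" -> "{dep}";')
        lines.append("}")
        return "\n".join(lines)


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register(Service("web", "https://github.com/example/web", ["web-1", "web-2"], ["api"]))
    registry.register(Service("api", "https://github.com/example/api", ["api-1"]))
    print(registry.where_is("api"))
    print(registry.to_dot())
```

Piping the DOT output through Graphviz (for example `dot -Tsvg`) gives a diagram whose nodes link to their repositories; performance and uptime graphs could be attached the same way.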
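For item 3, the point is that container changes go through the same edit, test, and review loop as code. A minimal sketch, assuming the Docker CLI is installed and the Dockerfile sits at the repo root, of a hypothetical helper an engineer could run locally before opening a pull request: it builds the image from the repo's own Dockerfile and then runs the test suite inside that image. The image tag and the `pytest` test command are assumptions (the image must have pytest installed); `dry_run=True` only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List


def container_test_commands(repo_root: Path, tag: str, test_cmd: List[str]) -> List[List[str]]:
    """Commands to build the image from the repo's Dockerfile, then test inside it."""
    dockerfile = repo_root / "Dockerfile"
    return [
        ["docker", "build", "-t", tag, "-f", str(dockerfile), str(repo_root)],
        ["docker", "run", "--rm", tag, *test_cmd],  # --rm cleans up the test container
    ]


def run(commands: List[List[str]], dry_run: bool = False) -> None:
    """Run each command in order, echoing it first."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, just like a failing test
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = container_test_commands(Path("."), "myservice:dev", ["pytest"])
    run(cmds, dry_run=True)  # drop dry_run to actually build and test
```

Because the helper reads the Dockerfile from the same repository as the code, changing an installation and testing it is one commit and one command, with no separate handoff.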