"Web Scale" Global Server Load Balancing

Alan Wang, Alex Laslavic, Doug Porter

Audience level:
Intermediate
Category:
Other

Description

Want to learn how Facebook scales its load balancing infrastructure to support more than a billion users? We will reveal the technologies and methods we use to route and balance Facebook's traffic. This talk will focus on Facebook's DNS load balancer and software load balancer, and how we use these systems to improve user performance, manage capacity, and increase reliability.

Abstract

Facebook is used by people all over the world, and our Traffic team is responsible for balancing the traffic they generate and making our network as fast as possible. The team has built several systems for managing and balancing our site traffic, including a DNS load balancer and a software load balancer capable of handling several protocols.

Our DNS load balancer has two major components: a central GLB decision engine, written in Python, that makes all the traffic balancing decisions and generates DNS maps; and an existing open source DNS server written in C (tinydns) that serves the actual DNS traffic, directing users to clusters based on a lookup table loaded from the DNS map.
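To make the lookup-table idea concrete, here is a minimal sketch of how a DNS map might be consulted. It assumes the map keys on the network of the querying resolver and picks the most specific matching prefix. The map contents, the cluster names, and the `build_table` and `lookup` helpers are hypothetical illustrations, not the actual DNS map format or tinydns behavior.

```python
import ipaddress

# Hypothetical DNS map: resolver network -> cluster name.
# The prefixes are documentation ranges and the names are made up.
DNS_MAP = {
    "198.51.100.0/24": "cluster-a",
    "203.0.113.0/24": "cluster-b",
    "0.0.0.0/0": "cluster-default",  # catch-all for unmapped resolvers
}


def build_table(dns_map):
    """Parse the map once and order it so the most specific prefix comes first."""
    table = [(ipaddress.ip_network(net), cluster) for net, cluster in dns_map.items()]
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def lookup(table, resolver_ip):
    """Return the cluster for the first (most specific) network containing the IP."""
    addr = ipaddress.ip_address(resolver_ip)
    for network, cluster in table:
        if addr in network:
            return cluster
    return None


table = build_table(DNS_MAP)
print(lookup(table, "198.51.100.7"))
print(lookup(table, "192.0.2.1"))
```

A linear scan keeps the sketch short; a server answering real query volumes would more plausibly use a precompiled structure such as a trie or a constant database.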

Our Python decision engine is named Cartographer. It gathers information on internet topology, user latency, user bandwidth, and compute cluster load, availability, and performance, then combines that data to determine the best cluster to direct each ISP's users to at that moment. Cartographer also receives a continuous stream of updates from its monitoring channels and automatically pushes new DNS maps to the DNS server whenever it needs to adjust cluster load or react to network problems. (It can react both to a gross interruption of service caused by a problem with Facebook's network or clusters and to localized outages affecting users in a given country or on a given ISP.)
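As a rough illustration of this kind of decision, the sketch below assigns each ISP to its lowest-latency healthy cluster that still has spare capacity. This is a simple greedy heuristic chosen for illustration only; it is not Cartographer's algorithm, and the function name, data shapes, and numbers are all assumptions.

```python
def build_assignment(isp_load, clusters, latency_ms):
    """Greedy sketch: map each ISP to the nearest healthy cluster with room.

    isp_load:   {isp: expected load}                      (hypothetical units)
    clusters:   {name: {"capacity": number, "healthy": bool}}
    latency_ms: {(isp, cluster): measured latency in ms}
    """
    # Only healthy clusters are candidates; track their remaining capacity.
    remaining = {name: c["capacity"] for name, c in clusters.items() if c["healthy"]}
    assignment = {}
    # Place the largest ISPs first so they get first pick of nearby capacity.
    for isp, load in sorted(isp_load.items(), key=lambda kv: kv[1], reverse=True):
        candidates = sorted(
            (c for c in remaining if (isp, c) in latency_ms),
            key=lambda c: latency_ms[(isp, c)],
        )
        assignment[isp] = None  # stays None if no cluster can absorb the load
        for cluster in candidates:
            if remaining[cluster] >= load:
                assignment[isp] = cluster
                remaining[cluster] -= load
                break
    return assignment


isp_load = {"isp1": 60, "isp2": 50}
clusters = {
    "east": {"capacity": 100, "healthy": True},
    "west": {"capacity": 120, "healthy": True},
}
latency_ms = {
    ("isp1", "east"): 10, ("isp1", "west"): 40,
    ("isp2", "east"): 15, ("isp2", "west"): 30,
}
print(build_assignment(isp_load, clusters, latency_ms))
```

Rerunning the function with `clusters["east"]["healthy"]` set to `False` moves both ISPs to `west`, which mirrors the idea of pushing a new map when a cluster has a problem.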

We will talk about the structure of Cartographer and explain some of its core algorithms for programmatically balancing traffic. As it handles traffic routing decisions for more than a billion users on Facebook, it is a great example of a small Python application having large impact.