Sunday 1:10 p.m.–1:40 p.m.

Distributed Coordination with Python

Ben Bangert

Audience level: Intermediate
Category: Distributed Computing

Description

Processes in a cluster often need controlled access to shared resources, a way to track which processes are available, and a way to share state. Unfortunately, most tools in this category are oriented around Java. In this talk I cover how to use Python to interact with Apache Zookeeper, a fault-tolerant, consistent data store, to write coordinated, fault-tolerant distributed applications.

Abstract

This talk covers why Apache Zookeeper is a good fit for coordinating processes in a distributed environment, prior attempts at a Python client and the current state-of-the-art Python client library, how merging several Python client libraries into one unified effort has paid off, the features available to Python processes, and how to gracefully handle failures in a set of distributed processes.

Why Apache Zookeeper?

  • Comparison to other distributed coordination servers (Google's Chubby lock service, Doozer)
  • Fault tolerance of the cluster
  • Consistent atomic operations
  • Performance for read/write operations
  • Ease of building higher-level coordination routines (locks, leader election, shared configuration)

The Basics of Zookeeper

  • How Zookeeper operates
  • The data model (tree of ZNodes, ZNode versioning)
  • Watches (push-based notification of data changes)
  • Ephemeral data and client sessions
  • Consistency guarantees
  • Edge cases to watch out for
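
To make the data-model bullets above concrete, here is a minimal in-memory sketch. It is a toy, not a Zookeeper client: `ToyZnodeTree`, `BadVersionError`, and every method name are hypothetical and only imitate the behaviours listed above (versioned writes, one-shot watches, and ephemeral nodes tied to a session). It assumes, as in Zookeeper, that a version of -1 means "write unconditionally".

```python
class BadVersionError(Exception):
    """Raised when a conditional write names a stale version."""


class ToyZnodeTree:
    """In-memory stand-in for a znode tree (hypothetical, not a real client)."""

    def __init__(self):
        # path -> [data, version, owning session id or None]
        self._nodes = {"/": [b"", 0, None]}
        # path -> list of one-shot callbacks
        self._watches = {}

    def create(self, path, data=b"", ephemeral_owner=None):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError("missing parent: " + parent)
        if path in self._nodes:
            raise KeyError("node exists: " + path)
        self._nodes[path] = [data, 0, ephemeral_owner]

    def get(self, path, watch=None):
        data, version, _owner = self._nodes[path]
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return data, version

    def set(self, path, data, version=-1):
        node = self._nodes[path]
        # Conditional write: refuse if someone else changed the node first.
        if version != -1 and version != node[1]:
            raise BadVersionError(path)
        node[0] = data
        node[1] += 1
        self._fire(path, "changed")
        return node[1]

    def close_session(self, session_id):
        # Ephemeral nodes disappear together with the session that made them.
        owned = [p for p, n in self._nodes.items() if n[2] == session_id]
        for path in owned:
            del self._nodes[path]
            self._fire(path, "deleted")

    def exists(self, path):
        return path in self._nodes

    def _fire(self, path, event):
        # Watches are one-shot: pop them before calling.
        for callback in self._watches.pop(path, []):
            callback(event, path)


tree = ToyZnodeTree()
tree.create("/config", b"db=primary")

events = []
data, version = tree.get("/config", watch=lambda e, p: events.append((e, p)))
tree.set("/config", b"db=replica", version=version)  # fires the watch once
tree.set("/config", b"db=other")                     # watch already consumed

try:
    tree.set("/config", b"db=stale", version=version)  # version is now stale
    stale_rejected = False
except BadVersionError:
    stale_rejected = True

tree.create("/workers")
tree.create("/workers/w1", ephemeral_owner="session-1")
tree.close_session("session-1")
```

The one-shot watch is the detail that most often surprises people: a client that wants to keep observing a node has to register a new watch each time it reads.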

Past Python Zookeeper Clients

  • Based on buggy C library
  • Used a buggy Python C binding
  • No champion to maintain it
  • Seg-faulting Python is not cool

A Newly Merged Python Client

  • Eight Python clients built around the same time
  • All of them buggy in different ways
  • Unify them all, squash bugs and edge cases!
  • Rewritten in pure Python, speaking Zookeeper's protocol directly (PyPy!)
  • Works with gevent 0.13 / 1.0 for async processes

Common Tasks with Python and Zookeeper

  • Distributed locks
  • Configuration management (Flagging clients of a database fail-over)
  • Party membership and Leader Elections
  • Barriers and Double Barriers to enforce process coordination
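
Locks and leader election from the list above are commonly built the same way: each contender creates an ephemeral sequential node under a shared parent, the lowest sequence number wins, and everyone else watches only the node directly ahead of it. The sketch below shows just that decision step on plain lists of child names, so it runs without a server. The helper names and the `lock-` prefix are hypothetical.

```python
def sequence_number(name):
    """Extract the numeric suffix from a name such as 'lock-0000000002'."""
    return int(name.rsplit("-", 1)[1])


def lock_status(children, mine):
    """Return (holds_lock, node_to_watch) for the contender named `mine`.

    `children` is the list of contender node names under the lock's parent.
    The lowest sequence number holds the lock; every other contender should
    watch only its immediate predecessor, which avoids waking every waiter
    when the lock is released.
    """
    ordered = sorted(children, key=sequence_number)
    index = ordered.index(mine)
    if index == 0:
        return True, None
    return False, ordered[index - 1]


children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
first = lock_status(children, "lock-0000000001")
third = lock_status(children, "lock-0000000003")

# The holder's session ends, so its ephemeral node vanishes.
remaining = [c for c in children if c != "lock-0000000001"]
second_after_release = lock_status(remaining, "lock-0000000002")
```

In a real client the list of children would come from the server, and the predecessor returned here is the node on which the waiter would set its watch.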

Handling Failure in a Distributed World

  • Always assume a process will die
  • How Zookeeper handles network partitions
  • How your process should behave when it is disconnected or partitioned from Zookeeper
  • Dealing with locks when Zookeeper or a lock holder fails
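
One way to reason about the last two bullets is a small state machine on the client side. The sketch below assumes a three-state connection model (connected, suspended, lost). The `ConnState` and `LockGuard` names are hypothetical and not taken from any particular client library. The rule it encodes: while the connection is suspended the process cannot know whether its session is still alive, so it should stop touching the protected resource, and once the session is lost its ephemeral lock node is gone and the lock must be re-acquired.

```python
from enum import Enum


class ConnState(Enum):
    CONNECTED = "connected"
    SUSPENDED = "suspended"  # connection dropped, session may still be alive
    LOST = "lost"            # session expired, ephemeral nodes are gone


class LockGuard:
    """Tracks whether it is currently safe to act as the lock holder."""

    def __init__(self):
        self.state = ConnState.LOST
        self.holds_lock = False

    def acquired(self):
        if self.state is not ConnState.CONNECTED:
            raise RuntimeError("cannot acquire a lock while disconnected")
        self.holds_lock = True

    def on_state_change(self, new_state):
        self.state = new_state
        if new_state is ConnState.LOST:
            # The server has removed our ephemeral lock node.
            self.holds_lock = False

    def may_touch_resource(self):
        # Only act when we both hold the lock and can confirm the session.
        return self.holds_lock and self.state is ConnState.CONNECTED


guard = LockGuard()
guard.on_state_change(ConnState.CONNECTED)
guard.acquired()
while_connected = guard.may_touch_resource()

guard.on_state_change(ConnState.SUSPENDED)
while_suspended = guard.may_touch_resource()

guard.on_state_change(ConnState.CONNECTED)  # reconnected within the session
after_reconnect = guard.may_touch_resource()

guard.on_state_change(ConnState.LOST)
guard.on_state_change(ConnState.CONNECTED)  # new session, lock not re-acquired
after_expiry = guard.may_touch_resource()
```

Pausing during the suspended state costs some availability, but it avoids the case where two processes both believe they hold the same lock.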

End

  • Q/A
  • Additional resources