PyCon 2011 Atlanta

March 9th–17th

Creating Complex Data Pipelines in the Cloud: The App Engine Pipeline API

Experienced / Talk
March 11th 10:25 a.m. – 10:55 a.m.
This talk covers App Engine's new Pipeline API, which connects complex, time-consuming work (including MapReduces and human actions) in a distributed system. The API turns Python into an asynchronous language for describing data dependencies in a novel way. We will discuss how the API works, how it achieves parallelism, and how to reuse its design and code outside of App Engine.

Abstract

Notably, App Engine uses the Pipeline API to connect the pieces of our MapReduce system. The API's key use case is executing many MapReduces and offline processes in parallel to form a single data pipeline. This lets developers easily "join" data from disparate sources, which is one of the most difficult things to achieve in a distributed processing system. I'll show some specific examples of when you would want to "fan in" and how to achieve that with the Pipeline API.
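
To make the "fan-in" idea concrete, here is a minimal sketch in plain Python. It does not use the Pipeline API itself: it assumes a thread pool from the standard library stands in for the distributed workers, and the function names (count_words, merge_counts, run_pipeline) are hypothetical, chosen only for illustration. Several independent stages run in parallel, and a final stage waits for all of them before joining their outputs.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """One independent stage: count the words in a single data source."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def merge_counts(partials):
    """The fan-in stage: join the partial results into one total."""
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total


def run_pipeline(sources):
    """Fan out one stage per source, then fan in once all have finished."""
    with ThreadPoolExecutor() as pool:
        # Fan-out: each source is processed independently and in parallel.
        futures = [pool.submit(count_words, lines) for lines in sources]
        # Fan-in: the join depends on every upstream result being ready.
        partials = [future.result() for future in futures]
    return merge_counts(partials)


if __name__ == "__main__":
    sources = [["a b a"], ["b c"], ["a c c"]]
    print(run_pipeline(sources))
```

The point of the sketch is the data dependency: merge_counts cannot start until every count_words call has produced a result. How the Pipeline API expresses and schedules that dependency across machines is the subject of the talk and is not shown here.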