PyCon 2011 Atlanta

March 9th–17th

Distributed and Cloud computing with Python

Novice / Tutorial
March 9th 9 a.m. – 12:20 p.m.
This tutorial will teach various ways to distribute Python-based computation across a cloud or cluster. Tools covered include Pyro, Sun GridEngine, Google AppEngine, PiCloud, and Hadoop.

Abstract

Detailed description:

Do you have computational problems that take hours, if not days, to solve? You can often distribute your work over a cluster or cloud of computers to solve the problem in only minutes.
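Distribution pays off when a problem splits into independent tasks that can run at the same time. The sketch below shows that pattern on a single machine using Python 3's `concurrent.futures` module, which is not part of Python 2.6. `expensive_task` and `parallel_map` are hypothetical names. Cluster and cloud tools like the ones in this tutorial extend the same map-over-independent-tasks idea from local processes to many machines.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def expensive_task(n):
    """Hypothetical stand-in for a slow, independent computation."""
    return sum(i * i for i in range(n))


def parallel_map(func, items, workers=4, executor_cls=ProcessPoolExecutor):
    """Apply func to every item using a pool of workers, preserving order.

    Each item is processed independently, so the work can be split across
    processes. Pass executor_cls=ThreadPoolExecutor for I/O-bound tasks,
    such as waiting on jobs submitted to a remote cluster.
    """
    items = list(items)
    # Hand items to workers in batches to cut per-task overhead
    # (ProcessPoolExecutor honors chunksize; ThreadPoolExecutor ignores it).
    chunksize = max(1, len(items) // (workers * 4))
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(func, items, chunksize=chunksize))


if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn worker processes.
    print(parallel_map(expensive_task, range(5)))  # prints [0, 0, 1, 5, 14]
```

Because results come back in input order, the caller never has to reassemble them. Only the pool setup changes when work moves from local processes to remote workers.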

This tutorial will teach various ways to distribute Python-based computation. Tools covered include Pyro, Sun GridEngine, Google AppEngine, PiCloud, Hadoop, and Elastic MapReduce.

Attendees should bring a laptop with Python 2.6 or later (2.x) installed, as the tutorial is example-based.

Format: Class

Audience:

Intermediate-level Python programmers. No familiarity with distributed computing is assumed, but programmers should be very comfortable reading Python code. Familiarity with scientific programming (e.g., NumPy, SciPy) helps but is not required.

Class size: ideally 20, up to 30

Outline:

  • Introduction to distributed computing
    • Types of parallelizable problems
  • Low-level primitives
    • Pyro
  • Job processing on own cluster
    • Oracle (Sun) Grid Engine
  • Cloud Computing Solutions
    • Google AppEngine
    • PiCloud
  • MapReduce for large data
    • Hadoop (using Dumbo for Python)
    • Elastic MapReduce (overview only)
  • Benchmarks showing how different problems perform on each system
  • Conclusion
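The outline's low-level primitives section covers Pyro (Python Remote Objects), which lets a client call methods on an object that lives in another process or on another machine. Pyro is a third-party library, so this sketch keeps things self-contained by swapping in the standard library's XML-RPC modules. It shows the same remote-call idea in Python 3. `Worker` and `start_worker_server` are hypothetical names, and Pyro's own API is different.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class Worker:
    """Hypothetical compute object; its public methods become remotely callable."""

    def square_sum(self, n):
        return sum(i * i for i in range(n))


def start_worker_server(host="127.0.0.1", port=0):
    """Serve a Worker over XML-RPC in a background thread.

    port=0 asks the OS for any free port; the actual port is available
    from server.server_address.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(Worker())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_worker_server()
    host, port = server.server_address
    # The client calls the method as if it were local; the call travels over HTTP.
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        print(proxy.square_sum(10))  # prints 285
    server.shutdown()
    server.server_close()
```

In a real deployment, the server would run on a separate compute node, and the client would connect using that node's hostname.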

Examples used in presentation include:

  • Parallelizing Support Vector Machine training (using libsvm's Python wrapper) across a hundred nodes
  • Determining features in brain waves using NumPy and distributed computing
  • Using NumPy, SciPy, and many computers to analyze data from human cells
  • The classic MapReduce distributed grep
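Distributed grep is the textbook MapReduce example. The map step emits only the lines that match a pattern, and the reduce step passes them through unchanged. The sketch below is a single-process version in Python 3 that simulates the map, shuffle/sort, and reduce phases in memory. `grep_mapper`, `identity_reducer`, and `run_mapreduce` are hypothetical names. On a real cluster, Hadoop (for example via Dumbo) would spread the map tasks across nodes and perform the shuffle.

```python
import re
from itertools import groupby
from operator import itemgetter


def grep_mapper(pattern):
    """Build a map function that emits (key, line) only for matching lines."""
    regex = re.compile(pattern)

    def mapper(key, line):
        # key identifies the line, e.g. (filename, line number)
        if regex.search(line):
            yield key, line

    return mapper


def identity_reducer(key, values):
    """Grep needs no aggregation: pass every matching line through."""
    for value in values:
        yield key, value


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle/sort, and reduce over (key, value) records in memory."""
    # Map phase: each record is handled independently (parallel on a cluster).
    intermediate = [kv for key, value in records for kv in mapper(key, value)]
    # Shuffle/sort phase: bring all pairs with the same key together.
    intermediate.sort(key=itemgetter(0))
    output = []
    # Reduce phase: one reducer call per distinct key.
    for key, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


if __name__ == "__main__":
    lines = [
        (("log1.txt", 1), "disk error on node 3"),
        (("log1.txt", 2), "job finished"),
        (("log2.txt", 1), "network error"),
    ]
    for (fname, lineno), text in run_mapreduce(lines, grep_mapper(r"error"), identity_reducer):
        print(f"{fname}:{lineno}: {text}")
```

Because the mapper looks at one line at a time, the input can be split across as many nodes as are available. The identity reducer only needs to collect the results.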