Graph Processing in Python

Audience level:
Big Data
March 9th 10:50 a.m. – 11:30 a.m.


Graphs are everywhere - from your distributed source code control to Twitter analytics. This session presents a set of three problems and shows how they can be decomposed into operations on graphs, and then demonstrates solutions using the various graph libraries available for (or accessible to) Python.


Graphs are a fundamental computer science data type, and they show up in all sorts of models in all sorts of places. So when you have a graph, what can you do with it, particularly if it is really big?

Thirty minutes isn't a lot of time to cover graph processing as a topic, so there won't be much discussion of graph theory in general or of graph terminology. Instead, this talk is inspired by Raymond Hettinger's "mastering team play": a series of exercises, each showing how a problem is lowered into a graph representation and then solved through graph processing. There will also be a little compare-and-contrast between the available graph libraries to show how they differ. Each problem will be given 8-10 minutes.

Problem 1: Python's (legal) history

Python has developed over time under a number of organizations, each with its own license. What portions of Python's codebase are under each license?

  • The CVS/SVN/HG trees as graphs modeling change in time
  • Identifying and labeling node types
  • Graphing and reporting on results
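
As a rough illustration of the first two bullets, the sketch below treats a revision history as a directed graph whose edges point from each revision to its parents, labels each node with the organization that owned the code at the time, and totals lines added per label. The revision names and line counts are invented toy data, and counting lines added (rather than lines that survive to the head) is a simplifying assumption, not the method used in the talk.

```python
from collections import Counter, deque

# Hypothetical toy history: revision -> (parents, owning organization, lines added).
# The organization names follow Python's license history; the numbers are made up.
HISTORY = {
    "r1": ((), "CWI", 120),
    "r2": (("r1",), "CWI", 30),
    "r3": (("r2",), "CNRI", 45),
    "r4": (("r2",), "CNRI", 10),        # a branch off r2
    "r5": (("r3", "r4"), "BeOpen", 5),  # a merge of r3 and r4
    "r6": (("r5",), "PSF", 60),
}

def ancestors(history, head):
    """Return head plus every revision reachable by following parent edges."""
    seen = {head}
    queue = deque([head])
    while queue:
        rev = queue.popleft()
        for parent in history[rev][0]:
            if parent not in seen:  # merges can reach a revision twice
                seen.add(parent)
                queue.append(parent)
    return seen

def lines_by_org(history, head):
    """Total the lines added under each organization label in head's history."""
    totals = Counter()
    for rev in ancestors(history, head):
        _, org, added = history[rev]
        totals[org] += added
    return totals

report = lines_by_org(HISTORY, "r6")
for org, lines in report.most_common():
    print(f"{org:8} {lines}")
```

The breadth-first walk visits each revision once even when branches merge, which is the main reason to model the history as a graph rather than a flat log.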

Problem 2: Development Cliques

Linux is famously developed with "lieutenants" in charge of different subsystems of the kernel. Python doesn't have lieutenants... or does it? Put another way, if you have a patch, who should you submit it to?

  • Mailing list connections as a graph
  • Analysis of connections, cliques, and centrality
  • Graphing and reporting on results

Problem 3: Let's get social

Your employer has decided that its website should be turned into a social network - you know, because there aren't enough of those.

  • Bootstrapping a graph by looking at pairwise analysis of products
  • How to suggest who people "might know"?
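
A minimal sketch of both bullets, under one possible reading of "pairwise analysis of products": link any two customers who bought at least one product in common, then suggest people who are not yet linked to a customer, ranked by how many neighbors they share. The customers, products, and the common-neighbor ranking are all illustrative assumptions rather than the approach presented in the session.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: customer -> set of products bought.
PURCHASES = {
    "ann": {"tent", "stove"},
    "ben": {"tent", "lamp"},
    "cat": {"lamp", "boots"},
    "dan": {"boots"},
    "eve": {"stove", "lamp"},
}

def bootstrap_graph(purchases):
    """Link every pair of customers who share at least one product."""
    graph = {person: set() for person in purchases}
    for a, b in combinations(sorted(purchases), 2):
        if purchases[a] & purchases[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def might_know(graph, person):
    """Rank non-neighbors of person by number of shared neighbors."""
    direct = graph.get(person, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph[friend]:
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

graph = bootstrap_graph(PURCHASES)
suggestions_for_ann = might_know(graph, "ann")
suggestions_for_dan = might_know(graph, "dan")
```

Comparing every pair of customers is quadratic in the number of customers, so on a large site the pairwise step would more likely be driven from a product-to-buyers index.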