Friday 2:35 p.m.–3:05 p.m.

Visualizing GitHub, Part I: Data to Information

Dana Bauer, Idan Gazit

Audience level:


A treasure trove of data is captured daily by GitHub. What stories can that data tell us about how we think, work, and interact? How would one go about finding and telling those stories? This two-part talk is a soup-to-nuts tour of practical data visualization with Python and web technologies, covering both the extraction and the display of data, using a familiar dataset as the running example.


The craft of software development has never been neatly cataloged in one centralized location. Some projects have long been developed in the open, and some have even exposed their development history, but they were islands: disjoint and not easily connected.

The connections between multiple developers and multiple projects are the glue that binds us together into larger developer communities. They are our mirror, and for the first time we can take a look at ourselves with the aid of the GitHub API, our favorite dynamic programming language, and standards-based web technologies.

GitHub provides the perfect case study in the practice of extracting and presenting meaning from data. Come watch us explain how we tell stories with a familiar dataset: the tools, the techniques, and the thinking behind our anthropological journey into the largest coding metacommunity.

This talk is being presented in two parts. In Part I, Dana covers the first half of the data visualization process: acquiring the data, cleaning it up, and working with it to tease out interesting stories. In Part II, Idan covers the presentational aspects: what to display, how best to display it, and how to handle interaction. Both talks provide a mix of principles and practical advice, illustrated by the “making of” process behind our attempt to visualize GitHub.