PyCon 2016 in Portland, OR

Tuesday 2:35 p.m.–3:05 p.m.

IPython Notebook in Data Intensive Communities: Accelerating the process of Discovery

Frances Haugen, Patrick Phelps

Audience level:
Novice
Category:
Best Practices & Patterns

Description

How does IPython Notebook (also known as Jupyter Notebook) change how data-intensive teams work? This talk focuses on how the Search and Ads teams at Yelp adopted IPython Notebook and how it changed the way analysis is undertaken and communicated. Tradeoffs between ease of use and the production of reusable code will also be discussed, along with the security implications of adoption in an enterprise context.

Abstract

IPython Notebook (also known as Jupyter Notebook) is an interactive computational environment in which you can combine code execution, rich text, mathematics, plots and rich media. This talk is co-presented by a Data Product Manager from Yelp Search (Frances Haugen) and a lead Data Scientist from Yelp Ads (Patrick Phelps), who present two different perspectives on how IPython spread among our teams, what factors drove the adoption, and what role IPython plays in how we and our teammates do our jobs.
Much of what Data Scientists and Data Engineers do is hidden by the tools we use. The end result may be presented as a beautiful visualization, but the raw nuts and bolts of what was necessary to produce that seemingly effortless outcome are almost always offstage. For software engineers who work on data-intensive products or projects, day-to-day responsibilities may often bear a strong resemblance to those of researchers or data scientists: teasing out patterns from messy flows of information.
One powerful tool that has significantly more adoption in academia than in industry is IPython Notebook. IPython Notebook provides an opportunity to combine the two faces of data science to make routine (and novel) analysis faster and more accurate. Placing sequential steps of analysis in the same context as the output helps team members check each other’s work and boosts collaboration.
There are many organizational parallels between the challenges of running an academic lab and those of running a team of data-intensive software engineers. Mixed-ability groups with diverse backgrounds must be assumed to be the norm, giving rise to challenges like how to spread best practices in a scalable way, how to confirm experimental or analytical results, or even how best to discuss preliminary findings. IPython Notebook was originally developed to address many of these challenges, with an eye towards making findings as reproducible as possible, a goal any team of data-focused people shares!
Yelp has slowly adopted IPython Notebook as one of our foundational tools across our Data Mining teams in both Search and Ads. This talk focuses on how this came to be, what we’ve gained (and lost) as a result of this adoption, and some ways you might want to use IPython on your own teams. We will also discuss the tradeoffs between ease of use and the production of reusable code. Security implications and potential organizational objections will also be covered.