PyCon 2016 in Portland, OR

Sunday 1:20 p.m.–4:40 p.m.

The Fellowship of the Data

Sev Leonard

Category:
Best Practices & Patterns


Python is an excellent tool for gathering, organizing, and storing data. This tutorial covers efficient and scalable methods for performing these activities as part of an automated data pipeline, enabling attendees to build the One Data Pipeline To Rule Them All.


## The Fellowship of the Data

Using a variety of Python libraries, object-oriented design principles, and some magic, attendees will learn how to build a well-organized data store from several types of web data. Attendees will learn how to pipeline this process and organize code efficiently so that changes in data source, shape, or organization can be easily incorporated. Sample data sources will be provided for this tutorial.

Methods presented in this tutorial are based on the development of the backend data store for [][1], a resource for finding nearby camping opportunities that consolidates multiple sources of recreation information.

This tutorial is in Python 3.

***Libraries referenced in this tutorial***: BeautifulSoup, robobrowser, Pandas, sqlalchemy

***Prerequisite knowledge***: Attendees should have entry-level knowledge of web scraping, querying RESTful APIs, and SQL. Familiarity with Pandas is recommended but not required. If you are interested, please consider attending; the tutorial will be notebook-based, so you can follow along even if some of the material is outside your domain.

***Materials & laptop requirements***: Attendees will be provided with Jupyter notebooks for the tutorial. Laptops will need Python 3.3, jupyter 4.0.6, and internet access. The following libraries will need to be installed for the course (a script will be provided ahead of time to help you do this at home before the conference):

- sqlalchemy 1.0.8
- robobrowser 0.5.3
- bs4 4.4.0
- pandas 0.16.2
- requests 2.7.0

[1]:
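To give a flavor of the first pipeline stage, here is a minimal sketch (not taken from the tutorial materials) of parsing scraped HTML into a tidy pandas DataFrame with BeautifulSoup. The sample markup, the `parse_campgrounds` helper, and the field names are invented for illustration:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in for HTML fetched with requests or robobrowser.
html = """
<ul class="campgrounds">
  <li><span class="name">Oxbow Park</span><span class="sites">67</span></li>
  <li><span class="name">Trillium Lake</span><span class="sites">57</span></li>
</ul>
"""

def parse_campgrounds(markup):
    """Extract campground records from an HTML listing into a DataFrame."""
    soup = BeautifulSoup(markup, "html.parser")
    records = []
    for item in soup.select("ul.campgrounds li"):
        records.append({
            "name": item.select_one(".name").get_text(strip=True),
            "sites": int(item.select_one(".sites").get_text(strip=True)),
        })
    return pd.DataFrame(records)

df = parse_campgrounds(html)
print(df)
```

Keeping the parsing logic in a single function like this makes it easy to swap in a new data source later without touching the rest of the pipeline, which is the kind of organization the tutorial emphasizes.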
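At the storage end of the pipeline, pandas and sqlalchemy work together directly. The sketch below assumes an in-memory SQLite database standing in for the real data store; the table and column names are illustrative, not the tutorial's:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for a persistent backend database.
engine = create_engine("sqlite:///:memory:")

df = pd.DataFrame({
    "name": ["Oxbow Park", "Trillium Lake"],
    "sites": [67, 57],
})

# Write the records to SQL; replace the table on each pipeline run.
df.to_sql("campgrounds", engine, index=False, if_exists="replace")

# Read the data back to confirm the round trip.
stored = pd.read_sql("SELECT name, sites FROM campgrounds", engine)
print(stored)
```

Swapping the `create_engine` URL is all it takes to point the same code at PostgreSQL or another backend, which is one reason sqlalchemy suits a pipeline whose storage layer may change.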

Student Handout

No handouts have been provided yet for this tutorial