Presentation: Pandas From The Ground Up

Wednesday 9 a.m.–12:20 p.m.

Brandon Rhodes

Audience level: Intermediate
Category: Python Libraries

Description

The typical Pandas user learns one dataframe method at a time, slowly scraping features together through trial and error until they can solve the task in front of them. In this tutorial you will re-learn how to think about dataframes from the ground up, and discover how to select intelligently from their abilities so that you solve your data processing problems through direct and deliberately chosen steps.

Abstract

The novice or occasional user of the Pandas dataframe too often finds themselves lost in a forest of possible next moves. They try calling `groupby()` but the result is an opaque object instead of a reorganized dataframe. They attempt a `pivot()` but cannot understand why the result is different from what they expected. The examples for `stack()` and `unstack()` look very nearly like the operations they want to perform, but they can never remember which is which without the documentation.

This tutorial will help students rebuild their mental model of the Pandas dataframe from the ground up, starting with the structure of the dataframe and its indexes and then progressing through a complete tour of the operations that dataframe methods offer. Symmetries and contrasts will regularly be drawn between the methods and operations to make them as easy as possible to remember, and to help relate the vertical and horizontal indexes along the edges of the dataframe.

The tutorial will show Pandas in use both in plain Python files and in the IPython Notebook. Students will be encouraged to use Anaconda or another distribution that gives them both Python and all the standard science and numeric tools up front, without requiring further installation steps.

At each stage in the tutorial, students will be given a short lecture and demonstration, allowed to ask questions, and then presented with a series of short dojo-like exercises that build their knowledge of each maneuver by progressing from simple to fairly complex data manipulations. As each feature is learned, students will then be challenged to use it in combination with features learned earlier in the tutorial, a mechanism that should improve retention of the complete set of possible method calls.

The exercises will be hand-written and tailored to share three features. First, the examples will try to re-use dataframes whenever possible, so that students quickly become familiar with the data layout in each one of them and can more easily focus on the next maneuver they are learning. Second, the example dataframes will each be rather short (about one or two screens full of data) so that the student has a more complete picture of “where their data is going” with each operation than would be possible with a larger data set. Finally, each example will be semantically rich and display common-sense relationships among the values present: there will be none of the vast arrays of randomly generated numbers that seem so common in scientific Python tutorials, but that give the learner’s mind nothing familiar to grasp as they stare at the output and wonder what relation the output numbers have to the inputs.
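
As a rough illustration of the stumbling blocks named in the abstract, the sketch below shows `groupby()` returning a grouping object rather than a dataframe, a `pivot()` into a wide layout, and `stack()` and `unstack()` undoing one another. The `sales` dataframe and its column names are invented for this sketch and are not taken from the tutorial's exercises.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "city": ["Boston", "Boston", "Denver", "Denver"],
    "year": [2013, 2014, 2013, 2014],
    "units": [10, 12, 7, 9],
})

# groupby() by itself returns a grouping object, not a reorganized dataframe.
grouped = sales.groupby("city")
print(type(grouped).__name__)  # DataFrameGroupBy

# Applying an aggregation is what produces an actual result.
totals = grouped["units"].sum()

# pivot() reshapes the rows into one row per city and one column per year.
wide = sales.pivot(index="city", columns="year", values="units")

# stack() moves the column labels into the row index (wide to tall);
# unstack() moves the innermost row-index level back out to the columns.
tall = wide.stack()
round_trip = tall.unstack()
print(round_trip)
```

Keeping the data this small makes it possible to follow each value from `sales` through `wide`, `tall`, and back again by eye.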

Student Handout

No handouts have been provided yet for this tutorial.