Thursday 1:20 p.m.–4:40 p.m. in Room 22

Using pandas for Better (and Worse) Data Science

Kevin Markham


The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, proper data science requires careful coding, and pandas will not stop you from creating misleading plots, drawing incorrect conclusions, ignoring relevant data, including misleading data, or executing incorrect calculations. In this tutorial, you'll perform a variety of data science tasks on a handful of real-world datasets using pandas. With each task, you'll learn how to avoid either a pandas pitfall or a data science pitfall. By the end of the tutorial, you'll be more confident that you're using pandas for good rather than evil! Participants should have a working knowledge of pandas and an interest in data science, but are not required to have any experience with the data science workflow. Datasets will be provided by the instructor.

Student Handout

No handouts have been provided yet for this tutorial