Thursday 9 a.m.–12:20 p.m.

Python for Social Scientists

Renee Chu

Audience level:


Many provocative social questions can be answered with data, and datasets are more available than ever. Start working with them here. First we'll download and visualize one dataset from the World Bank Indicators page together, using Matplotlib. Then you'll have time on your own to pick another dataset from any online source and plot it. At the end, every person or pair will share what they found.


A) Greeting and Orientation - 15 min
* Survey of programming levels in the audience.
* Survey of educational/research experience.
* Confirm installation prerequisites:
  * Python 2.7 or higher
  * Matplotlib
  * A text editor
* Break into pairs to be partners during the workshop. This helps keep anyone from falling behind over small questions.
Time Total: 0:15

B) Group Work: Import World Bank Data - 45 min
* We will write a method to pull in CSV data from the World Bank Indicators page.
* We'll be using "Ratio of girls to boys in primary and secondary education".
* Output the data to the console (i.e., as a Python dict).
Time Total: 1:00

C) Group Work: Graph the Data, Time Series - 45 min
* As a group, we will find and read through pre-existing code to borrow from, specifically an example of a bar chart with two sets of data, "Men" and "Women". Then we'll alter it to meet our needs and create 2 different charts:
  * Display "Ratio of girls to boys in primary and secondary education" across years, grouped by country, for 2-4 chosen countries.
  * 2 axes: one axis will display "Ratio of girls to boys in primary education" across countries over time. The other will display another indicator from the World Bank library.
* Share the pattern you found with the rest of your table.
Time Total: 1:45

D) On Your Own: Answer Your Own Question - 45 min
* With your partner, pick another dataset you find interesting. Pose a question that can be answered with that data.
* You don't have to limit yourself to World Bank datasets; any source that outputs .csv or .xls can work. Some resources:
  * SF city data:
  * DC city data:
  * Daily Treasury Statements from 1998 to 2013: (TODO crawl websites and convert .txt to .csv if there is time, or see if results are in the GitHub for
  * US Dept of Education: (NOTE many currently unavailable due to shutdown)
  * US Dept of Defense open data sets (NOTE currently not available due to shutdown)
  * Science Hack Day listings:
* Using the tools we've learned so far, display the data in a way that answers your question. The teacher (and hopefully TAs) will circulate to answer questions.
Time Total: 2:30

-- 15 min break, may be mixed in earlier with the on-your-own time --
Time Total: 2:45

E) Present to the Group - 15 min
* Each pair in the tutorial will present the question they asked and what they found to the group.
Time Total: 3:00
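
As a rough preview of parts B and C, here is a minimal sketch of reading an indicator CSV into a Python dict and drawing a grouped bar chart from it. It is not the code the tutorial will use. The column layout (a "Country Name" column followed by one column per year), the function names read_indicator and plot_grouped_bars, and every number in the sample are assumptions made up for illustration; a real World Bank download may need extra header lines skipped first. The sketch targets Python 3.

```python
import csv
import io

# Made-up sample in an assumed World Bank-like layout: one row per country,
# one column per year. The values are placeholders, not real data.
SAMPLE_CSV = """Country Name,Country Code,2009,2010,2011
Country A,AAA,95.1,96.0,
Country B,BBB,88.4,,90.2
"""


def read_indicator(csv_file):
    """Return {country: {year: value}} from a file-like object.

    Blank cells (years with no reported value) are skipped.
    """
    data = {}
    for row in csv.DictReader(csv_file):
        data[row["Country Name"]] = {
            int(col): float(cell)
            for col, cell in row.items()
            if col and col.isdigit() and cell and cell.strip()
        }
    return data


def plot_grouped_bars(data, countries, years):
    """Draw one group of bars per country, with one bar per year."""
    # Imported here so the parsing half runs even without Matplotlib installed.
    import matplotlib.pyplot as plt

    width = 0.8 / len(years)  # share 80% of each group's slot among the years
    fig, ax = plt.subplots()
    for i, year in enumerate(years):
        positions = [group + i * width for group in range(len(countries))]
        # Missing years are drawn as zero-height bars.
        heights = [data[country].get(year, 0) for country in countries]
        ax.bar(positions, heights, width, label=str(year))

    # Center each tick label under its group of bars.
    centers = [group + width * (len(years) - 1) / 2 for group in range(len(countries))]
    ax.set_xticks(centers)
    ax.set_xticklabels(countries)
    ax.set_ylabel("Indicator value (placeholder data)")
    ax.legend()
    return fig


# Part B: output the parsed data to the console as a dict.
data = read_indicator(io.StringIO(SAMPLE_CSV))
print(data)
```

For part C, calling plot_grouped_bars(data, ["Country A", "Country B"], [2009, 2010, 2011]) returns a Matplotlib figure that can be saved with fig.savefig("ratio.png"). To read a downloaded file instead of the inline sample, pass an open file object to read_indicator.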

Student Handout

No handouts have been provided yet for this tutorial