Python for data lovers: explore it, analyze it, map it

Type: Talk
Audience level: Intermediate
Category: Science
March 10th 2:55 p.m. – 3:40 p.m.

Description

Exploring and analyzing data can be daunting and time-consuming, even for data lovers. Python can make the process fun and exciting. We will present data analysis techniques, along with Python tools that help you explore and map data. Our talk includes examples that show how Python libraries such as csvkit, matplotlib, SciPy, NetworkX and PySAL can help you dig into and make sense of your data.

Abstract

Learn about powerful Python libraries for analyzing many types of data, including spatial data, through the following illustrated examples.

Example 1: Explore data

Problem: I have a large voter data file in CSV format. I want to examine it, check the column headings and data types, and do some basic stats, but I don’t want to pull it into Excel or Access. What are my options?
Solution: csvkit. With it, I can explore my data, chop it up, sort it, summarize it, and prepare it for import into PostGIS.
Bonus: Developers and journalists have been working hard to add functionality to csvkit. You can contribute!
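
To give a flavor of this kind of exploration, here is a minimal sketch that uses only Python's standard library; it is not csvkit itself, which provides command-line tools such as csvcut and csvstat for these tasks. The sample rows, the column names, and the helper functions guess_type and summarize are hypothetical, made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a large voter file.
SAMPLE = """voter_id,county,age,registered
1,Cook,34,2008-10-01
2,Lake,51,2004-03-15
3,Cook,27,2011-06-30
"""

def guess_type(values):
    """Guess 'int', 'float', or 'text' for a column of strings."""
    if not values:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "text"

def summarize(handle):
    """Report the guessed type and basic stats for each column."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames:
        # Skip blank cells so they do not affect the type guess.
        values = [row[column] for row in rows if row[column] != ""]
        kind = guess_type(values)
        info = {"type": kind, "count": len(values)}
        if kind in ("int", "float"):
            numbers = [float(value) for value in values]
            info["min"] = min(numbers)
            info["max"] = max(numbers)
            info["mean"] = sum(numbers) / len(numbers)
        elif kind == "text":
            info["most_common"] = Counter(values).most_common(1)[0]
        summary[column] = info
    return summary

summary = summarize(io.StringIO(SAMPLE))
for column, info in summary.items():
    print(column, info)
```

For a real file, pass an open file object instead of the in-memory sample. Note that this sketch reads every row into memory, which may not suit very large files.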

Example 2: Analyze data

Problem: I have a bunch of data points from Twitter. How do I make sense of what I have in front of me, and where do I start?
Solutions: matplotlib, networkx
Bonus: Learn how Python libraries plug into one another.
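
One possible starting point, sketched here with the standard library rather than NetworkX or matplotlib, is to treat mentions as a directed graph and count how often each account is mentioned. The tweets, the user names, and the helpers mention_graph and in_degree are hypothetical; a graph library would offer the same idea with many more measures built in.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tweets as (author, text) pairs.
TWEETS = [
    ("ana", "Great talk by @ben and @cho"),
    ("ben", "Thanks @ana!"),
    ("cho", "@ben nice slides"),
    ("dee", "Agree with @ben"),
]

MENTION = re.compile(r"@(\w+)")

def mention_graph(tweets):
    """Map each author to the set of accounts they mention."""
    graph = defaultdict(set)
    for author, text in tweets:
        for target in MENTION.findall(text):
            graph[author].add(target.lower())
    return graph

def in_degree(graph):
    """Count how many distinct authors mention each account."""
    counts = Counter()
    for targets in graph.values():
        counts.update(targets)
    return counts

graph = mention_graph(TWEETS)
mentions = in_degree(graph)
for account, count in mentions.most_common():
    print(account, count)
```

Accounts with a high count are candidates for a closer look, and the same adjacency structure could be handed to a plotting or graph library for visual inspection.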

Example 3: Map data

Problem: I have a year’s worth of crime incidents for a large city. I want to explore global and local patterns in the data and identify clusters.
Solutions: PySAL (NumPy, SciPy)
Bonus: We’ll look at the full ESDA (Exploratory Spatial Data Analysis) module in PySAL, and we’ll briefly touch on a selection of the rest of PySAL’s functionality.
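
As a rough illustration of what a global pattern measure looks like, the sketch below computes global Moran's I, a common spatial autocorrelation statistic, in plain Python with binary neighbor weights. It does not use PySAL's API, and the incident counts, the neighbor structure CHAIN, and the function morans_i are assumptions made up for this example.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary (0/1) weights.

    values: one number per area, indexed from 0.
    neighbors: dict mapping each area index to its adjacent area indices.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [value - mean for value in values]
    # Sum of deviation products over every ordered neighbor pair (i, j).
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    # With binary weights, the total weight is the number of ordered pairs.
    total_weight = sum(len(js) for js in neighbors.values())
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)

# Hypothetical: four areas along a street, each adjacent to the next.
CHAIN = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([10, 8, 2, 1], CHAIN)     # high counts sit together
alternating = morans_i([10, 1, 10, 1], CHAIN)  # high and low alternate
print(round(clustered, 3), round(alternating, 3))
```

A positive value suggests that similar counts sit next to each other, and a negative value suggests that neighbors tend to differ. Judging significance, and finding local clusters, needs more than this sketch provides.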

To wrap up the talk, we'll give some tips on using PostGIS and GeoDjango to go from data analysis and mapping to building a web application.