PyCon Pittsburgh. April 15-23, 2020.

Talk: Analyzing 200 billion GPS Points with Python on the Cheap

Presented by:

Dharhas Pothina, Kim Pevey, Tyler Potts

Description

There are many blog posts out there that teach you how to create a map or analyze your Fitbit data using Python, but what do you do when, instead of a couple of thousand GPS points, you have 200 billion? Explore how you can scale up your analysis in the cloud without having to learn lots of new skills. As a bonus, learn how to visualize datasets of an unusual size (i.e., very big).

We will explore different strategies using open-source tools and discuss the pros and cons of each approach. We will also tell you which one we picked and why. We will cover Dask and Ibis on both GPUs and CPUs, as well as strategies for data storage.

Note: While we focus specifically on geospatial data, the talk will be relevant for anyone who is familiar with Pandas and needs to scale to larger datasets.