Thursday 1:20 p.m.–4:40 p.m.

How to start web scraping

Jackie Kazil, Sisi Wei

Sometimes data does not come in the format we would like, and we need other mechanisms to collect it. This tutorial, taught from the perspective of a data journalist and a data scientist, will give you an overview of how some folks have used web scraping for data collection, how to get started, where to find data, and what the ethics behind it are.


This talk is part coding, part lessons learned. We will do a hands-on walk-through of a couple of web scrapers. During the walk-through, we will stop periodically to discuss how the web works and how web pages are constructed. These stopping points will help break down how to get the content that we are looking for. Besides looking at how websites are put together, we will also discuss the ethics of scraping. What is legal? How can you be a friendly scraper, so that the administrator of the website you are scraping won’t try to shut you down? Lastly, we will cover some cases where folks used scraping techniques and the projects that came out of them. We will also share some datasets that could be scraped for inspiration, and how to get started with those examples.
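
As a rough illustration of what being a "friendly scraper" can look like in practice, here is a minimal Python sketch using only the standard library. It is not taken from the tutorial materials. The robots.txt content, the user agent name, the example.com URLs, and the helper names (build_rules, polite_delay, crawl) are all hypothetical, and the fetch function is passed in by the caller so the sketch itself never touches the network.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Hypothetical user agent; a real scraper should identify itself honestly.
USER_AGENT = "friendly-scraper-example"


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(parser, user_agent=USER_AGENT, default=1.0):
    """Return the site's requested crawl delay, or a default pause in seconds."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl(parser, urls, fetch, sleep=time.sleep, user_agent=USER_AGENT):
    """Fetch only URLs that robots.txt allows, pausing after each request.

    `fetch` is any callable that takes a URL and returns its content.
    """
    pause = polite_delay(parser, user_agent)
    results = {}
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # the site asked crawlers to stay out of this path
        results[url] = fetch(url)
        sleep(pause)  # spread requests out so the server is not hammered
    return results
```

In a real scraper, fetch might wrap a call such as urllib.request.urlopen, and the rules would be loaded from the site's own robots.txt rather than an inlined string. Following robots.txt is a courtesy convention, not a legal safe harbor, so it complements rather than replaces the legal questions raised above.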

