PyCon Pittsburgh. April 15-23, 2020.

Tutorial: Better Machine Learning through Effective Preprocessing with scikit-learn

Presented by:

Kevin Markham

Description

You have some experience building Machine Learning models, but only with artificially clean training data. How do you make the leap to building models using dirty, real-world data?

In this tutorial, you’ll learn how to correctly and efficiently prepare complex datasets for Machine Learning using scikit-learn. You’ll practice handling missing values, text data, categorical data, and data that needs standardization. Most importantly, you’ll learn how to build a coherent, reusable pipeline of preprocessing steps that starts with a pandas DataFrame and ends with a trained scikit-learn model.
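As a rough illustration of the kind of pipeline described above, here is a minimal sketch using scikit-learn's `make_column_transformer` and `make_pipeline`. It is not taken from the tutorial materials. The DataFrame, its column names ("age", "income", "city", "review") and its values are invented. The specific choices (mean imputation, a constant "missing" label for categories, one-hot encoding, `CountVectorizer` for text, and `LogisticRegression` as the model) are assumptions made for the example. It also assumes a reasonably recent scikit-learn (0.22 or later, where transformers are passed as `(transformer, columns)` tuples).

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical "dirty" training data: column names and values are invented.
X = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38],                 # numeric, one missing value
    "income": [40.0, 55.5, 61.0, np.nan, 80.2, 47.3],    # numeric, different scale
    "city": ["Austin", "Boston", np.nan, "Austin", "Denver", "Boston"],  # categorical, one missing
    "review": ["great service", "slow and rude", "great value",
               "rude staff", "great staff", "slow service"],              # free text
})
y = [1, 0, 1, 0, 1, 0]

# Numeric columns: fill missing values with the column mean, then standardize.
numeric = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())

# Categorical column: fill missing values with a constant label, then one-hot encode.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
categorical = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)

preprocessor = make_column_transformer(
    (numeric, ["age", "income"]),
    (categorical, ["city"]),
    # A single column name (not a list) passes 1-D text, which CountVectorizer expects.
    (CountVectorizer(), "review"),
)

# One estimator that goes from a raw DataFrame to predictions.
pipe = make_pipeline(preprocessor, LogisticRegression())
pipe.fit(X, y)

# New rows may contain missing values and categories never seen during training.
X_new = pd.DataFrame({
    "age": [np.nan, 29],
    "income": [52.0, np.nan],
    "city": ["Chicago", np.nan],
    "review": ["great staff", "rude and slow"],
})
predictions = pipe.predict(X_new)
print(predictions)
```

Because every preprocessing step lives inside the pipeline, the values learned during `fit` (column means, scaling parameters, category lists, vocabulary) are reused unchanged on new data. This is also what lets the whole object be passed to tools such as `cross_val_score` without leaking information from validation folds.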