Wednesday 9 a.m.–12:20 p.m.

Exploring Machine Learning with Scikit-learn

Jake Vanderplas, Olivier Grisel

Audience level:


This tutorial offers an introduction to the core concepts of machine learning and shows how to apply them in Python using scikit-learn. We will use the scikit-learn API to introduce and explore the basic categories of machine learning problems, related topics such as feature selection and model validation, and the application of these tools to real-world data sets.


Machine learning is the branch of computer science concerned with the development of algorithms that can learn from previously seen data in order to make predictions about future data. It has become an important part of work in a variety of applications: from optimization of web searches, to financial forecasts, to studies of the nature of the Universe. This tutorial will provide a hands-on introduction to the central concepts of machine learning and the scikit-learn package. Beginning from the broad categories of *supervised* and *unsupervised* learning problems, we will dive into the fundamental areas of *classification*, *regression*, *clustering*, and *dimensionality reduction*. In each section, we will introduce aspects of the scikit-learn API and explore practical examples of some of the most popular and useful methods from the machine learning literature. The strength of scikit-learn lies in its clean, uniform, and well-documented interface to efficient implementations of a large number of the most important machine learning algorithms. By the end of this tutorial, participants will have a basic practical background in machine learning and the use of scikit-learn, and will be well poised to apply these tools in many areas, whether for work, for research, for Kaggle-style competitions, or for their own pet projects.
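
As a rough illustration of the uniform interface described above, the following sketch applies one supervised estimator (classification) and two unsupervised ones (clustering and dimensionality reduction) to the same data. The choice of the iris data set, the specific estimators, and all parameter values are assumptions made for this example and are not taken from the tutorial materials; the imports assume a recent scikit-learn release.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative data set (an assumption): 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Supervised learning (classification): hold out a test set, fit on the
# training portion, and measure accuracy on the unseen portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Unsupervised learning (clustering): no labels are passed to the estimator.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Unsupervised learning (dimensionality reduction): project 4 features to 2.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(f"held-out accuracy: {accuracy:.2f}")
print(f"cluster labels shape: {cluster_labels.shape}")
print(f"reduced data shape: {X_2d.shape}")
```

Every estimator here is constructed with its parameters and then trained through `fit` (or a `fit_*` shortcut), so swapping in a different algorithm usually means changing only the constructor line.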

Student Handout

No handouts have been provided yet for this tutorial