
Thursday 9 a.m.–12:20 p.m.

An Introduction to scikit-learn: Machine Learning in Python

Jake Vanderplas

Audience level:


This tutorial will offer an introduction to the scikit-learn package and to the central concepts of Machine Learning. We will introduce the basic categories of learning problems, and explore practical examples based on real-world data, from handwriting analysis to facial recognition to automated classification of astronomical images.


Machine learning is a discipline concerned with algorithms that find patterns in data and make predictions from it. It is nearly ubiquitous today, with applications ranging from web search to financial forecasting to studies of the nature of the Universe.

This tutorial will provide a hands-on introduction to the basic concepts of machine learning and the use of scikit-learn to perform learning tasks. Scikit-learn is an actively developed Python package containing implementations of many of the most popular and powerful machine learning methods in use today. It offers a consistent interface to these methods, and this tutorial will focus on familiarizing students with that interface so that they can continue to explore these tools and concepts on their own. The examples and exercises will draw heavily from the fields of astronomy and astrophysics, where machine learning has been applied to greatly increase our understanding of the Universe.
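As a rough illustration of what a consistent interface means in practice, the sketch below trains two unrelated classifiers on scikit-learn's bundled handwritten digits dataset using the same `fit`, `predict`, and `score` calls. It is not taken from the tutorial materials: the choice of models and parameters is an assumption made for the example, and it assumes a recent scikit-learn release, where `train_test_split` lives in `sklearn.model_selection` (older releases, including those listed for this tutorial, placed it elsewhere).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Load the bundled 8x8 handwritten digit images (features) and their labels.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Two very different algorithms, chosen here only for illustration.
models = [KNeighborsClassifier(n_neighbors=5), SVC(gamma=0.001)]

scores = {}
for model in models:
    model.fit(X_train, y_train)            # learn from the training set
    predictions = model.predict(X_test)    # predict labels for unseen data
    # score() returns the mean accuracy on the held-out test set.
    scores[type(model).__name__] = model.score(X_test, y_test)

for name, accuracy in sorted(scores.items()):
    print("%s: %.3f" % (name, accuracy))
```

Because both estimators expose the same methods, swapping in another classifier only requires changing the entries in `models`.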

The tutorial will consist of a mix of lectures, interactive examples, and hands-on exercises. By the end, participants will be poised to apply a wide variety of machine learning tools from scikit-learn to a broad range of applications.

To get the most out of this tutorial, participants should have some familiarity with manipulating arrays using NumPy and visualizing data using matplotlib. Much of the material will be presented in the form of IPython notebooks, and familiarity with this interface will be beneficial. Participants should plan to bring a laptop with the following installed: Python 2.6 or 2.7, NumPy 1.6+, SciPy 0.9+, matplotlib 1.0+, scikit-learn 0.11+, and IPython 0.12+. The IPython notebook is important for the interactive exercises: if it is correctly installed, running 'ipython notebook' on the command line should open a web browser at the IPython dashboard. For troubleshooting information see

Update: See updated tutorial preparation instructions at An Introduction to scikit-learn: Machine Learning in Python
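
One possible way to check an installation against the minimum versions listed above is a short script like the following. This is only a sketch and not part of the official preparation instructions: the helper names `version_tuple` and `check_requirements` are hypothetical, it relies on each package exposing a `__version__` attribute, and it needs `importlib`, which is available from Python 2.7 onward.

```python
import importlib
import re

# Minimum versions as listed in the tutorial description.
MINIMUM_VERSIONS = {
    "numpy": (1, 6),
    "scipy": (0, 9),
    "matplotlib": (1, 0),
    "sklearn": (0, 11),
    "IPython": (0, 12),
}


def version_tuple(version_string):
    """Turn a string such as '1.6.2' or '0.12.dev' into a tuple of leading ints."""
    numbers = []
    for part in version_string.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev'
        numbers.append(int(match.group()))
    return tuple(numbers)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        installed = getattr(module, "__version__", "")
        found = version_tuple(installed)
        # Pad short versions such as '1' so they compare as (1, 0).
        found += (0,) * max(0, len(minimum) - len(found))
        status = "ok" if found >= minimum else "too old"
        report[name] = "%s (%s)" % (installed or "unknown version", status)
    return report


for name, status in sorted(check_requirements().items()):
    print("%-12s %s" % (name, status))
```

The script only reports on the libraries; the Python version itself and the 'ipython notebook' command still need to be checked by hand.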