Introduction to Interactive Predictive Analytics in Python with scikit-learn

Audience level:
Big Data
March 8th 1:20 p.m. – 4:40 p.m.


The goal of this tutorial is to give the attendee a first experience of machine learning tools applied to practical software engineering tasks such as language detection of tweets, topic classification of web pages, sentiment analysis of customer products reviews and facial recognition in pictures from the web or from your own webcam.


The demand for software engineers with Data Analytics and Machine Learning skills is rapidly growing and Python / Numpy is one of the best environments for quickly prototyping scalable data-centric applications or interactively exploring your data especially thanks to tools such as IPython and Matplotlib.

scikit-learn is a very active open source project that implements a variety of state-of-the art machine learning algorithms. The goal of this project and tutorial is to take the algorithms out of the academic papers and make them work on a selection of real world tasks to unleash the value of your data.

We will focus on providing hints to perform the right data preprocessing steps and on how to select algorithms and parameters suitable for the task at hand. We will also introduce tools and methodologies to measure the performance of the trained models as objectively as possible.