These are the setup instructions to prepare for the tutorial:
We will use Python 2.7 as support for Python 3 is not yet 100% there... (working on it). Python 2.6 should also mostly work for the tutorial.
We will need the following packages:
Under Windows, the easiest way to install recent binary packages for all of this is probably to get them from Christoph Gohlke's Python Package binary archive.
Be careful to download the 32 bit packages if you have the 32 bit version of Python, and the 64 bit packages otherwise. We won't need more than 2GB of RAM, so both versions should work for the tutorial.
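If you are not sure which build of Python is installed, the following minimal sketch reports it using only the standard library. The helper name `python_bitness` is made up for this illustration.

```python
from __future__ import print_function

import platform
import struct
import sys


def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


bits = python_bitness()
print("Python %s (%d bit) on %s" % (platform.python_version(), bits, sys.platform))
```

Pick the binary packages that match the number printed here, not the bitness of the operating system.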
Launch a new IPython notebook session by typing the following in a console (without the leading $ sign):
$ ipython notebook
The web browser should open a new window or tab for the IPython user interface: click the "New Notebook" button, then try to import all the modules by typing:
import numpy
import scipy
import pylab
import sklearn
import IPython.parallel
import psutil
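As an optional complement, the same check can be scripted so that every module is tried even when an earlier one fails. This is only a sketch: the module names come from the import statements above, while the helper `check_imports` and the report format are assumptions made for this example.

```python
from __future__ import print_function

import importlib

# Module names taken from the import statements above.
REQUIRED_MODULES = [
    "numpy", "scipy", "pylab", "sklearn", "IPython.parallel", "psutil",
]


def check_imports(module_names):
    """Try to import each module; map its name to a short status string."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # usually ImportError, but stay tolerant
            report[name] = "FAILED: %r" % (exc,)
        else:
            # Not every module exposes __version__, hence the fallback.
            version = getattr(module, "__version__", "unknown")
            report[name] = "ok (version %s)" % version
    return report


report = check_imports(REQUIRED_MODULES)
for name in REQUIRED_MODULES:
    print("%-18s %s" % (name, report[name]))
```

The printed report lists each module together with its version or the error raised while importing it, which is convenient to copy into a message when asking for help.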
If you get any error message, please send me an email at email@example.com with [PyCon 2013 Tutorial] in the subject and:
Updated: download the dataset archive: datasets.zip (~100MB)
Updated: download the tutorial material archive from GitHub: parallel_ml_tutorial-master.zip and unzip it, or clone the repository with git:
git clone https://github.com/ogrisel/parallel_ml_tutorial.git
You can then put datasets.zip inside the parallel_ml_tutorial folder and run:
from there so as to unzip the datasets and make the data files ready.
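If you only want to check that the downloaded archive is intact, or to unpack it by hand, the standard library `zipfile` module is enough. The sketch below is not the tutorial's own preparation step: the helper `extract_archive` is hypothetical, and it assumes that datasets.zip sits in the current folder.

```python
from __future__ import print_function

import os
import zipfile


def extract_archive(zip_path, target_dir):
    """Extract every member of zip_path into target_dir; return member names."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(target_dir)
    return names


# Assumed layout: datasets.zip is in the current (parallel_ml_tutorial) folder.
if os.path.exists("datasets.zip"):
    extracted = extract_archive("datasets.zip", ".")
    print("Extracted %d entries" % len(extracted))
else:
    print("datasets.zip not found in %s" % os.getcwd())
```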
There will also be a set of USB keys with the material available during the tutorial itself but it's faster to download it before the session.
You can also have a look at the README of the parallel_ml_tutorial repo on GitHub.
scikit-learn uses the numpy array data structure extensively. If you are not familiar with it, you should have a look at the first chapters of this tutorial. You should also get familiar with the scipy sparse data structures such as CSR and COO matrices.
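As a quick refresher, here is a small sketch of the same data held as a dense numpy array, a COO matrix and a CSR matrix. The values are arbitrary and chosen only for illustration; rows play the role of samples and columns the role of features, which is the layout scikit-learn expects.

```python
from __future__ import print_function

import numpy as np
from scipy import sparse

# A small dense array: 3 samples (rows) x 4 features (columns), mostly zeros.
X = np.array([[0., 2., 0., 0.],
              [1., 0., 0., 3.],
              [0., 0., 0., 0.]])
print(X.shape, X.dtype)

# COO stores (row, column, value) triplets; convenient for construction.
X_coo = sparse.coo_matrix(X)
print(X_coo.row, X_coo.col, X_coo.data)

# CSR replaces the row array with row pointers (indptr), which makes
# row slicing and matrix-vector products efficient.
X_csr = X_coo.tocsr()
print(X_csr.indptr, X_csr.indices, X_csr.data)

# Only the non-zero entries are stored, and the conversion is lossless.
print(X_csr.nnz)
print((X_csr.toarray() == X).all())
```

COO is usually the easier format to build incrementally, while CSR is the one to convert to before doing arithmetic or passing data to an estimator.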
This tutorial targets people with prior experience with scikit-learn. If you are new to scikit-learn and have not registered for Jake's introductory tutorial at PyCon, it is strongly advised to follow the tutorials from the official documentation or from the SciPy Lecture Notes.