Please download the following archive with tutorial material and exercises:
To follow the tutorial on your laptop, you should install the sklearn-tutorial branch of my scikit-learn repo.
It contains new features, in particular a much improved and simplified API for text processing. Here are some instructions to get you up and running.
In order to do so you will need:
numpy and scipy can be tricky to build from source (you will need a Fortran compiler). It might be easier to use one of the following options:
On Linux Ubuntu / Debian most of this will be fetched by running:
sudo apt-get build-dep python-scikits-learn
sudo apt-get build-dep python-sklearn
Note: you can do the following in a virtualenv if you prefer (recommended but not necessary).
Fetch the source from my branch using git:
git clone https://github.com/ogrisel/scikit-learn.git
cd scikit-learn
git fetch origin sklearn-tutorial
git checkout -b sklearn-tutorial origin/sklearn-tutorial
Alternatively you can download and unzip the following zip archive.
Under Linux / OSX you should be able to build by running the following commands at the top of the source folder:
make inplace
pip install -e .
Then run the tests with (you will need nosetests):
All tests should pass. You can ignore the warnings.
You can check that the install went well by launching python and trying:
>>> from sklearn.svm import SVC
>>> SVC().fit([[0], [1]], [0, 1])
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=1.0,
  kernel='rbf', probability=False, scale_C=True, shrinking=True, tol=0.001)
Alternatively, if you want to build it yourself, please follow these build instructions.
To run the tests, type:
python setup.py build_ext -i
nosetests sklearn
If you have issues with the installation, please send me an email at email@example.com with detailed information about your platform and any error messages you get.
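A small script along the following lines is one way to gather the platform details for such a report. It is only a sketch: the function name `collect_platform_info` is made up for this example, and it assumes that numpy, scipy and sklearn expose a `__version__` attribute when they can be imported.

```python
import importlib
import platform
import sys


def collect_platform_info(packages=("numpy", "scipy", "sklearn")):
    """Return text lines describing the OS, Python and package versions.

    Hypothetical helper for this example, not part of scikit-learn.
    """
    lines = [
        "platform: " + platform.platform(),
        "python: " + sys.version.replace("\n", " "),
    ]
    for name in packages:
        try:
            module = importlib.import_module(name)
            version = getattr(module, "__version__", "unknown")
        except Exception as exc:
            # A broken build can fail in several ways, so report any error.
            version = "not importable (%r)" % (exc,)
        lines.append("%s: %s" % (name, version))
    return lines


# Print the report so that it can be pasted into an email.
print("\n".join(collect_platform_info()))
```

Pasting the printed lines together with the full error message should make an installation problem easier to diagnose.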
If you really cannot get my dev branch to build, then fall back to the latest stable release.
For Windows users there is also an unofficial build for win64 here.
If you get errors that look like the following when running the tests:
======================================================================
ERROR: Doctest: sklearn.datasets.base.load_sample_image
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose-1.1.2-py2.7.egg\nose\plugins\doctests.py", line 395, in tearDown
    delattr(builtin_mod, self._result_var)
AttributeError: _
you can safely ignore them: this is a bug in the test runner rather than in scikit-learn itself.
You can also ignore the following test failure when running
python -c "import sklearn; sklearn.test()" after having installed the Windows package:
AttributeError("'module' object has no attribute 'semi_supervised'",) != None
This is a packaging issue for a new module that won't be used during the tutorial.
scikit-learn uses the numpy array data structure extensively. If you are not familiar with it, you should have a look at the first chapters of this tutorial. You should also get familiar with the scipy sparse data structures such as CSR and COO matrices.
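As a quick, illustrative warm-up (not taken from the tutorial material), the sketch below builds a small dense numpy array and the same data as scipy COO and CSR sparse matrices. The variable names and the toy values are assumptions chosen for this example.

```python
import numpy as np
from scipy import sparse

# A dense 2D array: 3 samples (rows) by 4 features (columns).
X_dense = np.array([[0., 0., 2., 0.],
                    [1., 0., 0., 0.],
                    [0., 0., 0., 3.]])
print(X_dense.shape)

# COO ("coordinate") format stores one (row, column, value) triple
# per non-zero entry, which makes it convenient for construction.
rows = np.array([0, 1, 2])
cols = np.array([2, 0, 3])
values = np.array([2., 1., 3.])
X_coo = sparse.coo_matrix((values, (rows, cols)), shape=(3, 4))

# CSR ("compressed sparse row") format stores the non-zero values, their
# column indices and one pointer per row start, which makes row slicing
# and matrix-vector products efficient.
X_csr = X_coo.tocsr()
print(X_csr.data, X_csr.indices, X_csr.indptr)

# Both sparse representations hold the same values as the dense array.
print(np.array_equal(X_csr.toarray(), X_dense))
```

COO is usually the easier format to fill in, while CSR is the one to convert to before doing arithmetic or slicing rows.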