Wednesday 1:20 p.m.–4:40 p.m.

Data Wrangling for Kaggle Data Science Competitions – An etude

Krishna Sankar

Audience level:
Best Practices & Patterns


Let us mix Python analytics tools, add a dash of Machine Learning Algorithmics & work on Data Science Analytics competitions hosted by Kaggle. This tutorial introduces the intersection of Data, Inference & Machine Learning. It is structured progressively, so that attendees learn by hands-on wrangling with data to draw interesting inferences using scikit-learn (SciPy, NumPy) & pandas.


An introductory hands-on workshop on Data Science competitions, aimed at the Amateur Data Scientists among us. First we will quickly look at the classes of algorithms & their internals through simple competition problems & datasets. Next we will dig deeper into two Kaggle competitions: working through the datasets, choosing the right features, hands-on programming of the appropriate algorithms & submitting a few entries. While a three-hour tutorial cannot do full justice to all the Data Science Algorithms, the attendees will get a good idea of the top 5 algorithms and a chance to apply two or three of them to Kaggle competition data via guided hands-on programming. We will look at competitions like the Facebook recruiting challenge, the GE Flight optimization competition and the StumbleUpon Classification Challenge. _Note: 1. Depending on the state of the competitions during April, we might have to change which competitions we work on during the hands-on portion. Kaggle has kept older competitions open for submission only. There might be more interesting competitions in the April timeframe. 2. While there is not enough time for the participants to work through the different datasets, we will provide links to a hands-on tutorial which you can all do after the workshop._

Student Handout

No handouts have been provided yet for this tutorial