Thursday 1:30 p.m.–3 p.m.

Anaconda: Accelerating your Python Data Science code with Dask and Numba

Ian Stokes-Rees

Description

Anyone doing numerical computing with Python will have run into performance barriers. Using Anaconda is a great start to get a suite of extension packages where the underlying data structures and algorithms are written in C or Fortan. We'll briefly review the state of numerical computing in Python, look at some examples to help you remember why you should use NumPy based packages whenever possible, and focus on two options for acceleration: faster serial computing or parallelization. Continuum Analytics has developed two popular open source packages to address these issues: Numba, which provides an LLVM-based JIT that can be easily accessed just through a decorator; and Dask, which provides a distributed computing framework and some high quality data structures that are similar to a Pandas DataFrame or a NumPy NDarray. Participants should have the latest release of Anaconda installed and have some familiarity with Python in order to follow along interactively with the tutorial where we'll learn how to efficiently leverage Dask and Numba.