
Thursday 1:20 p.m.–4:40 p.m.

Applied Parallel Computing with Python

Minesh B Amin, Ian Ozsvald

Audience level:
Experienced
Category:
High Performance Computing

Description

In this tutorial we shall review three distinct approaches to parallel computing that can be used to solve problems in many domains, including machine learning, natural language processing, finance, and computer vision. The first two approaches are embarrassingly parallel in nature, while the third leverages fine-grain parallelism.

Abstract

Perhaps Gregory Pfister said it best in his book, In Search of Clusters. To paraphrase, there are three ways to do anything faster: work harder, work smarter, or get help. In computer-speak, this roughly translates to: increase processor speed, improve algorithms, or exploit parallelism. With processor speeds no longer doubling every eighteen months and little or no room left for improvements in serial algorithms, exploiting parallelism is the one frontier with the potential for delivering huge improvements in performance. In this tutorial we shall review three distinct approaches to parallel computing that can be used to solve problems in many domains, including machine learning, natural language processing, finance, and computer vision. The first two approaches are embarrassingly parallel in nature, while the third leverages fine-grain parallelism.

Goals

At the conclusion of the tutorial, the audience will possess not only a conceptual understanding of the what, why, and how of general-purpose parallelism, but also a much better appreciation of the tradeoffs involved in exploiting three specific forms of parallelism.

Prerequisites

All examples will leverage open source packages. However, given the rather large number of package dependencies and the size of the datasets, we will provide a small Ubuntu-based VirtualBox image, usable on Windows, OS X, and Linux hosts, with the prerequisites pre-installed. A link to the download location, as well as instructions for validating the setup, will be provided two weeks before the tutorial. We encourage attendees to try out the VirtualBox image as soon as possible.

Course Plan


[30 mins] Introduction to Parallel Computing

  • Background on the need for parallelism and review of the parallel landscape given the rise of multi-core architectures on cheap hardware

  • Basic terminology for parallelism that is not tied to the technology of the moment; i.e. terminology that can be used to better understand both existing and proposed parallel solutions.

  • Survey of distinct forms of parallelism: inter-node (i.e. across servers) and intra-node (i.e. within each server)

Applied Parallel Computing in Python

[30 mins] Basic setup

  • Parallel logging infrastructure for debugging, and review of cloud setup
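
As a rough illustration of what a parallel logging setup can look like, the sketch below routes log records from several worker processes through one queue to a single listener in the parent process. It uses only the standard library and is an assumption about the general idea, not the infrastructure used in the tutorial; the `worker` and `configure_worker` names are hypothetical.

```python
import logging
import logging.handlers
import multiprocessing


def configure_worker(queue):
    """Send every record logged in this process to the shared queue."""
    root = logging.getLogger()
    root.handlers = [logging.handlers.QueueHandler(queue)]
    root.setLevel(logging.DEBUG)


def worker(queue, task_id):
    """Hypothetical task: it only logs when it starts and finishes."""
    configure_worker(queue)
    log = logging.getLogger("worker")
    log.debug("task %d started", task_id)
    log.debug("task %d finished", task_id)


if __name__ == "__main__":
    log_queue = multiprocessing.Queue()

    # One listener in the parent writes all records, so lines from
    # different processes are not interleaved mid-message.
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(processName)s %(message)s")
    )
    listener = logging.handlers.QueueListener(log_queue, handler)
    listener.start()

    procs = [
        multiprocessing.Process(target=worker, args=(log_queue, i))
        for i in range(3)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()
```

Including the process name in the format string makes it easier to tell which worker produced a given line when debugging.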

[40 mins] Code review of "List of Tasks"

  • We will use Ian’s Mandelbrot example from previous tutorials to move from a single-threaded solver to one that uses the multiprocessing module, and then ask the students to adapt the code to use the ParallelPython module (possibly demonstrated across several machines if the network allows)
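
The move from a single-threaded solver to the multiprocessing module might look roughly like the sketch below. This is not Ian's code: the image bounds, the iteration cap, and the function names are assumptions chosen for illustration. Each image row is treated as one independent task in the list.

```python
from multiprocessing import Pool

MAX_ITER = 100  # assumed iteration cap, not taken from the tutorial material


def escape_count(c, max_iter=MAX_ITER):
    """Iterations before z = z*z + c leaves the radius-2 disc."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render_row(args):
    """Compute one image row; one row is one task in the list."""
    y, width, height = args
    im = -1.5 + 3.0 * y / (height - 1)
    return [
        escape_count(complex(-2.0 + 3.0 * x / (width - 1), im))
        for x in range(width)
    ]


def render_serial(width, height):
    """Single-threaded baseline: process the tasks one after another."""
    return [render_row((y, width, height)) for y in range(height)]


def render_parallel(width, height, processes=4):
    """Same task list, handed to a pool of worker processes."""
    tasks = [(y, width, height) for y in range(height)]
    with Pool(processes=processes) as pool:
        return pool.map(render_row, tasks)


if __name__ == "__main__":
    serial = render_serial(60, 40)
    parallel = render_parallel(60, 40)
    print("results match:", serial == parallel)
```

Because the rows do not depend on each other, the only change between the two versions is who iterates over the task list, which is what makes the problem embarrassingly parallel.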

[40 mins] Code review of MapReduce

  • We will review how to perform a word count across a large Twitter feed, and then ask the students to adapt the code to solve a second related problem.
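
The listing does not name the MapReduce framework used, so the following is only a standard-library sketch of the pattern, with hypothetical function names and a made-up sample in place of a Twitter feed. The map step counts words in each chunk of messages independently, and the reduce step merges the partial counts.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def map_chunk(lines):
    """Map step: count the words in one chunk of messages."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: fold one partial count into another."""
    left.update(right)
    return left


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def word_count(lines, chunk_size=2, processes=2):
    """Run the map step in a process pool, then reduce in the parent."""
    with Pool(processes=processes) as pool:
        partials = pool.map(map_chunk, list(chunked(lines, chunk_size)))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    # Invented sample messages standing in for a real feed.
    sample = [
        "parallel python is fun",
        "python makes parallel code readable",
        "word count is the hello world of mapreduce",
    ]
    print(word_count(sample).most_common(3))
```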

[30 mins] Code review of parallel hyper-parameter optimization

  • We will review a parallel, asynchronous implementation of hyper-parameter optimization for machine learning algorithms, and conclude by having the students run sample demos.
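
One simple way to picture parallel, asynchronous hyper-parameter optimization is a random search whose trials run in a process pool, with results handled as each one finishes. The sketch below assumes random search and a toy objective; `validation_loss` is a hypothetical stand-in for training and scoring a real model, and none of this is taken from the tutorial's own implementation.

```python
import random
from concurrent.futures import ProcessPoolExecutor, as_completed


def validation_loss(params):
    """Hypothetical stand-in for training a model and scoring it."""
    learning_rate, regularization = params
    return (learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2


def sample_params(rng):
    """Draw one candidate from assumed search ranges."""
    return (rng.uniform(0.0, 1.0), rng.uniform(0.0, 0.1))


def random_search(n_trials=32, workers=4, seed=0):
    """Evaluate candidates in parallel and keep the best one seen."""
    rng = random.Random(seed)
    candidates = [sample_params(rng) for _ in range(n_trials)]
    best_params, best_loss = None, float("inf")
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(validation_loss, p): p for p in candidates}
        # as_completed yields each trial as soon as it finishes, so a slow
        # trial does not hold up the bookkeeping for faster ones.
        for future in as_completed(futures):
            loss = future.result()
            if loss < best_loss:
                best_params, best_loss = futures[future], loss
    return best_params, best_loss


if __name__ == "__main__":
    params, loss = random_search()
    print("best params:", params, "loss:", loss)
```

A smarter optimizer could use the losses seen so far to choose the next candidate to submit; the sketch fixes all candidates up front to stay short.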

Notes


  • The VirtualBox image will include the source code for the exercises and examples, and links to additional material on the web. The slides (in PDF format) will be provided at the beginning of the tutorial.
  • Attendees are encouraged to join the OpenSpace after the tutorial to discuss their own problems and to further review the code and share experiences.

Update: See updated tutorial preparation instructions at Applied Parallel Computing with Python - Essential VirtualBox