PyCon 2016 in Portland, OR

Monday 2:35 p.m.–3:05 p.m.

Wrestling Python into LLVM Intermediate Representation

Anna Herlihy

Audience level: Intermediate
Category: Other

Description

The LLVM Project provides an intermediate representation (LLVM-IR) that can be compiled on many platforms. LLVM-IR is used by analytical frameworks to achieve language and platform independence. What if we could add Python to the long list of languages that can be translated to LLVM-IR? This talk will go through the steps of wrestling Python into LLVM-IR with a simple, static one-pass compiler.

Abstract

Motivation
==========

What is LLVM-IR?
----------------

The LLVM Compiler Infrastructure Project provides a convenient, transportable intermediate representation (LLVM-IR) which can be compiled and linked into multiple types of machine-dependent assembly code. What is so cool about LLVM-IR is that you can take any of the most popular languages and distill them into a totally transportable form that can then be sent out and run on all sorts of different machines. Once the code gets into IR it doesn’t matter what platform it was originally written on, and it doesn’t matter that Python can be super slow. It doesn’t matter if you have weird CPUs on your clusters; as long as they’re supported by LLVM you will be able to run your code.

What is Tupleware?
------------------

Tupleware is an analytical framework built at Brown University that allows users to compile functions into distributed programs that are automatically deployed. There are plenty of distributed platforms out there, like Hadoop, Spark, etc., but not many that are both language and platform agnostic. In order to be independent of both language and machine, Tupleware uses LLVM-IR.

Right now Tupleware does not support Python as a frontend language. Most engineers may prefer to write their machine learning algorithms in Python over C++, so the goal of this project is to make that possible. This talk will go through the steps of writing a comprehensive Python front-end for Tupleware, with a focus on the construction of a compiler from a limited subset of Python to LLVM-IR.

PyLLVM
======

What is PyLLVM?
---------------

This is the heart of the talk. PyLLVM is a simple, easy-to-extend, one-pass static compiler that takes in the subset of Python most likely to be used by Tupleware. PyLLVM is based on an existing project called [py2llvm](https://code.google.com/p/py2llvm) that was abandoned around 2011.

I’m going to go through some basic compiler design and talk about how some LLVM-IR features make our lives easier, and some much harder. We will cover types, scoping, memory management, and other implementation details. As I go through each feature of PyLLVM I’m going to talk about the various design decisions I made in the interest of simplicity, feasibility, and meeting our requirements. I’m going to explain some features that were implemented by the original author of py2llvm, and many that I did myself. I’ll get to share the ultimate favorite coding moment of my career.

Related Work, Benchmarking, and Future Work
===========================================

What makes PyLLVM great?
------------------------

Last, I’m going to compare PyLLVM to Numba, a specializing just-in-time Python-to-LLVM compiler from Continuum Analytics. I’ll briefly compare the implementation details of Numba and PyLLVM and talk about how their goals differ. I’ll show you some performance benchmarking of PyLLVM compared to Numba and Clang. To conclude, I will touch on what the future has in store for PyLLVM.
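
To make the idea of a one-pass, static Python-to-LLVM-IR compiler a little more concrete, here is a minimal sketch. It is not PyLLVM's or py2llvm's actual code: the names `TinyCompiler` and `compile_to_ir` are hypothetical, and it assumes that every value is an LLVM `double` and that each function body is a single `return` over `+`, `-`, `*`, and `/`. It walks the Python AST once and emits textual LLVM-IR as it goes.

```python
import ast

# Map Python binary operators to LLVM-IR floating-point instructions.
_BINOPS = {ast.Add: "fadd", ast.Sub: "fsub", ast.Mult: "fmul", ast.Div: "fdiv"}


class TinyCompiler(ast.NodeVisitor):
    """Hypothetical one-pass emitter; assumes every value is a `double`."""

    def __init__(self):
        self.lines = []    # IR text emitted so far
        self.counter = 0   # used to number temporaries

    def fresh(self):
        # Return a new, unique temporary name such as %t1, %t2, ...
        self.counter += 1
        return f"%t{self.counter}"

    def visit_FunctionDef(self, node):
        params = ", ".join(f"double %{a.arg}" for a in node.args.args)
        self.lines.append(f"define double @{node.name}({params}) {{")
        self.lines.append("entry:")
        for stmt in node.body:
            self.visit(stmt)
        self.lines.append("}")

    def visit_Return(self, node):
        self.lines.append(f"  ret double {self.visit(node.value)}")

    def visit_BinOp(self, node):
        # Emit code for both operands first, then the operation itself.
        left = self.visit(node.left)
        right = self.visit(node.right)
        tmp = self.fresh()
        op = _BINOPS[type(node.op)]
        self.lines.append(f"  {tmp} = {op} double {left}, {right}")
        return tmp

    def visit_Name(self, node):
        # Function parameters are already named IR values.
        return f"%{node.id}"

    def visit_Constant(self, node):
        if not isinstance(node.value, (int, float)):
            raise NotImplementedError("only numeric constants are supported")
        # Simplification: repr() of small values like 2.0 is a valid IR literal.
        return repr(float(node.value))

    def generic_visit(self, node):
        # Anything outside the tiny supported subset is rejected.
        raise NotImplementedError(type(node).__name__)


def compile_to_ir(source):
    """Compile a module containing simple numeric functions to IR text."""
    compiler = TinyCompiler()
    for node in ast.parse(source).body:
        compiler.visit(node)
    return "\n".join(compiler.lines)


if __name__ == "__main__":
    print(compile_to_ir("def axpy(a, x, y):\n    return a * x + y\n"))
```

For `axpy`, the sketch emits an `fmul` into `%t1`, an `fadd` into `%t2`, and then `ret double %t2`. A real compiler for a useful Python subset also has to handle types, scoping, and memory management, which this example leaves out.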