Stepping Through CPython

Larry Hastings

Type:: Talk
Audience level:: Experienced
Category:: Python Internals

March 9th 12:10 p.m. – 12:55 p.m.

Description

Ever wondered how CPython actually works internally? This talk will show you. We start with a simple Python program, then slowly step through CPython, showing in exhaustive detail what happens when it runs that program. Along the way we'll examine the design and implementation of various major CPython subsystems and see how they fit together. The audience should be conversant in C and Python.

Abstract

The goal of the talk is to familiarize the audience with CPython's internal structure well enough that a programmer versed in C and Python, but who has never worked on an interpreter, could comfortably dive in and start hacking on CPython.

The program examined will be simple but deliberately designed to exercise most of CPython's runtime behavior. This will include loading modules implemented in C and in Python, loading bytecode cached on disk, and executing a cross-section of bytecodes. (For example, I only need to examine one of the BINARY_* math opcodes; I don't need to walk through every single one.)
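The talk's own example program isn't reproduced here. As a hypothetical stand-in, the sketch below, assuming a modern Python 3 (3.4 or later, for `dis.get_instructions` and `importlib.util`), touches the same three behaviors: it distinguishes a module compiled into the interpreter in C (`sys`) from one loaded from a `.py` file (`json`), shows where the compiled bytecode for that file is cached, and lists the opcodes for a one-line function. The helper names `describe_module` and `add` are illustrative only. On CPython 3.2 itself, `imp.cache_from_source` plays the role of `importlib.util.cache_from_source`, and opcode names differ by version: `a + b` compiles to `BINARY_ADD` on 3.2 but to `BINARY_OP` on 3.11 and later.

```python
import dis
import importlib.util
import json  # pure-Python package (it may use an optional C accelerator internally)
import sys   # always compiled into the interpreter


def describe_module(mod):
    """Return "built-in" for modules compiled into the interpreter, else the source path."""
    if mod.__name__ in sys.builtin_module_names:
        return "built-in"
    return getattr(mod, "__file__", None)


def add(a, b):
    return a + b


if __name__ == "__main__":
    print("sys:", describe_module(sys))
    json_src = describe_module(json)
    print("json:", json_src)
    # Where the interpreter caches compiled bytecode for that source file (PEP 3147).
    print("json bytecode cache:", importlib.util.cache_from_source(json_src))
    # One line per bytecode instruction in add().
    for ins in dis.get_instructions(add):
        print(ins.opname, ins.argrepr)
```

Running the disassembly on an older and a newer interpreter side by side is a quick way to see how much the instruction set has shifted since 3.2.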

Areas I expect to examine:

* built-in modules, including ones that are automatically loaded before your program starts
* bytecode, including
    * the various implementations of the inner loop (switch statement, labels-as-values)
    * the peephole optimizer
    * the on-disk format
        * marshal
        * the magic version number
    * lnotab (mentioned, but I'll probably skip the gory details)
* the stack machine
    * unwinding the stack after an exception (and producing tracebacks)
    * contrasting CPython's approach with Stackless
* all the possible fields of PyObject, and an overview of the fields in PyTypeObject
* built-in types
    * the implementations of a few key internal types
        * list, dict, tuple, str, bytes, int, bool, None
        * though not to the level of detail that Hettinger or Rhodes did in past talks
    * interned values
* the GIL and reference counting
    * weakrefs
    * garbage collection
    * Py_TRASHCAN
* CPython's small-block and arena allocators
* the parser, though I don't want to spend a lot of time on it (runtime is where the fun is ;)
* internal utility functions like PyArg_Parse
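To make the on-disk format, marshal, and magic-number items concrete, here is a minimal sketch, assuming Python 3.7 or later (where `importlib.util.MAGIC_NUMBER` exists and the `.pyc` header is 16 bytes per PEP 552). It byte-compiles a tiny source file with `py_compile`, checks the magic number, and unmarshals the code object that follows the header. The helper names (`pyc_header_size`, `load_pyc`, `compile_and_load`, `Node`, `refcount_and_weakref_demo`) are hypothetical. On CPython 3.2 the header was 8 bytes (magic number plus source mtime) and the magic number came from `imp.get_magic()`; the older header sizes are kept in the helper only for reference. The last function touches the reference-counting and weakref items: a new strong reference raises `sys.getrefcount`, a weak reference does not, and once the last strong reference disappears CPython frees the object immediately, so the weakref then returns `None`. Exact counts differ between versions and builds, so treat them as illustrative.

```python
import importlib.util
import marshal
import os
import py_compile
import sys
import tempfile
import weakref


def pyc_header_size():
    """Size of the .pyc header that precedes the marshalled code object."""
    if sys.version_info >= (3, 7):
        return 16  # magic, flags, source mtime (or hash), source size (PEP 552)
    if sys.version_info >= (3, 3):
        return 12  # magic, source mtime, source size
    return 8       # magic, source mtime (e.g. CPython 3.2)


def load_pyc(path):
    """Check the magic number, then unmarshal the code object after the header."""
    with open(path, "rb") as f:
        data = f.read()
    # The magic number changes whenever the bytecode format changes.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ImportError("bad magic number in %r" % path)
    return marshal.loads(data[pyc_header_size():])


def compile_and_load(source):
    """Write source to a temporary .py file, byte-compile it, and load the code object."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.py")
        pyc = os.path.join(tmp, "demo.pyc")
        with open(src, "w") as f:
            f.write(source)
        py_compile.compile(src, cfile=pyc, doraise=True)
        return load_pyc(pyc)  # file is read fully before the directory is removed


class Node:
    """A plain class; its instances support weak references by default."""


def refcount_and_weakref_demo():
    obj = Node()
    before = sys.getrefcount(obj)     # includes the temporary reference from the call
    alias = obj                       # a second strong reference bumps the count
    after = sys.getrefcount(obj)
    ref = weakref.ref(obj)            # a weak reference does not bump the count
    with_weak = sys.getrefcount(obj)
    del obj, alias                    # count reaches zero; CPython frees the object now
    return before, after, with_weak, ref()


if __name__ == "__main__":
    ns = {}
    exec(compile_and_load("x = 6 * 7\n"), ns)
    print(ns["x"])  # 42
    print(refcount_and_weakref_demo())
```

Reading the header by hand like this mirrors what the import system does before trusting a cached file: a magic-number mismatch means the bytecode was produced by a different interpreter version and must be recompiled from source.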

I'll be giving the talk based on CPython 3.2.