PyCon Pittsburgh. April 15-23, 2020.

Talk: Python Performance: Past, Present and Future (PyPy, Cython, C API, subinterpreters, tracing GC)

Presented by:

Victor Stinner

Description

Many past optimization projects are now abandoned or stale for different reasons: Unladen Swallow, Pyston, Pyjion, Gilectomy, etc. Victor also experimented with register-based bytecode and FAT Python, which he did not finish. We will see what these projects achieved, but also try to understand why they were not completed. One common issue is backward compatibility, especially compatibility with C extensions.

Python now has a performance benchmark suite to track performance over time. There are mature solutions to optimize performance bottlenecks and work around the GIL limitation. PyPy is a drop-in replacement for CPython: it is much faster, highly compatible, and now handles C extensions more efficiently (PyPy cpyext). Cython is a good compromise between speed and development time: it uses a syntax close to Python but compiles to faster native code. multiprocessing makes it easy to scale an application across multiple CPUs, and it supports shared memory since Python 3.8. asyncio is another approach to maximize CPU utilization, using concurrency for I/O (e.g. network and database connections).
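As a concrete illustration of the shared memory support mentioned above, here is a minimal sketch (not from the talk) using the multiprocessing.shared_memory module added in Python 3.8: a child process writes into a block created by the parent, so the data itself is never copied between processes.

    from multiprocessing import Process, shared_memory

    def worker(name):
        # Attach to the existing block by name and modify it in place.
        shm = shared_memory.SharedMemory(name=name)
        shm.buf[0] = 42
        shm.close()

    if __name__ == "__main__":
        shm = shared_memory.SharedMemory(create=True, size=16)
        p = Process(target=worker, args=(shm.name,))
        p.start()
        p.join()
        print(shm.buf[0])  # 42, written by the child process
        shm.close()
        shm.unlink()       # release the shared memory block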

The pickle module has also been optimized in Python 3.8 (protocol 5) to reduce or even avoid memory copies. For scientific computation (e.g. with numpy), numba and pythran can emit efficient code using SIMD instructions and GPGPUs. There are also multiple ongoing experimental projects. For example, PEP 554 proposes to have multiple interpreter instances, called “subinterpreters”, per process and to run them in parallel: no single process-wide lock, but one lock per interpreter. The C API used by C extensions is also being reworked to hide implementation details and provide better forward compatibility. In the long term, it may unlock many new optimizations in CPython, and it may even allow using the same C extension binary for both CPython and PyPy.
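To make the pickle change concrete, here is a minimal sketch (a typical-usage assumption, not code from the talk) of protocol 5 out-of-band buffers (PEP 574): with a buffer_callback, a large payload is handed to the caller instead of being copied into the pickle byte stream, and loads() receives the same buffers back. The ZeroCopyByteArray helper is hypothetical.

    import pickle

    class ZeroCopyByteArray(bytearray):
        # Hypothetical helper: a bytearray that pickles its buffer out-of-band
        # with protocol 5, so dumps() does not copy the payload into the stream.
        def __reduce_ex__(self, protocol):
            if protocol >= 5:
                return type(self)._reconstruct, (pickle.PickleBuffer(self),)
            return type(self)._reconstruct, (bytearray(self),)

        @classmethod
        def _reconstruct(cls, obj):
            # Rebuild from the supplied buffer (PickleBuffer or bytes-like).
            with memoryview(obj) as m:
                return cls(m)

    data = ZeroCopyByteArray(b"x" * 10_000_000)
    buffers = []
    # buffer_callback receives the PickleBuffer; the payload stays out-of-band.
    payload = pickle.dumps(data, protocol=5, buffer_callback=buffers.append)
    restored = pickle.loads(payload, buffers=buffers)
    assert restored == data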