Saturday 4:15 p.m.–5 p.m.
Oh, Come On. Who Needs Bytearrays?
- Audience level:
- Python Core (language, stdlib, etc.)
Python string operations are profligate in their use of memory — the steps necessary to parse an HTTP request often make four or five copies of every incoming byte. But does it matter? This talk explores the `bytearray`, shows how its proper use dramatically reduces copying, but then uses metrics and visualizations to determine whether any increase in performance is worth the added complexity.
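The copying described above can be sketched in miniature. In this hedged example (the request text and variable names are invented for illustration), each plain `bytes` operation allocates a fresh copy of the data, while a `bytearray` wrapped in a `memoryview` exposes sub-ranges of the same buffer without copying:

```python
data = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"

# Plain bytes operations: every step below allocates a new bytes
# object, so the request's characters are copied again and again.
head, _, rest = data.partition(b"\r\n")   # two new bytes objects
method = head.split(b" ")[0]              # still more copies

# With a bytearray and a memoryview, a sub-range is a zero-copy
# view into the same underlying buffer, not a new object.
buf = bytearray(data)
view = memoryview(buf)
line_end = buf.find(b"\r\n")
request_line = view[:line_end]            # a view, not a copy

print(method, bytes(request_line))
```

The `memoryview` only becomes a real copy at the final `bytes()` call, when an immutable object is actually needed.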
Why does no one talk about the `bytearray`? With what great hopes was it added to the Python language? It might nearly be suspected of violating “there should be one obvious way to do it” by supporting a whole parallel ecosystem of shadowy techniques for parsing data and accelerating I/O while dodging normal Python strings.

What do these techniques really accomplish? This talk will make practical and honest comparisons about the damage that `bytearray` techniques can do to your code as you contort it into ever more interesting shapes to try to minimize the number of times you copy data. After all, you reason, modern processors are usually starved for data from RAM, and any technique that reduces the number of times string operations copy data into new regions of memory has got to be a win — right? Not nearly as often as you think.

Through benchmarks and careful observation, we will learn to identify those rare situations where sheer data copying is genuinely a limiting factor in your application’s performance — and how, in those situations, to pull out the `bytearray` and let it roll. Its effect in PyPy and Cython will be compared to our mental model of how it improves performance in vanilla CPython.
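One of those I/O-accelerating techniques can be sketched concretely. In this hedged example (the socket pair and message are invented for illustration), `socket.recv()` would allocate a new `bytes` object on every call, whereas `recv_into()` lets the kernel write directly into one pre-allocated `bytearray` that is reused across reads:

```python
import socket

# A connected pair of local sockets stands in for a real connection.
a, b = socket.socketpair()
a.sendall(b"HTTP/1.1 200 OK\r\n")

buf = bytearray(4096)      # one reusable buffer, allocated once
n = b.recv_into(buf)       # kernel fills buf in place; no new bytes object
first_line = bytes(memoryview(buf)[:n])  # copy only when bytes are needed

a.close()
b.close()
print(n, first_line)
```

The parser then works against views of `buf` rather than slicing fresh strings out of each `recv()` result — exactly the kind of contortion whose cost and benefit the talk weighs.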