Improving debuggability of complex asyncio applications

The keys to debugging are observability and reproducibility. Despite a series of improvements to the asyncio standard library over the last few years, it is still challenging to see what is happening in complex real-world asyncio applications. In particular, when multiple asyncio libraries and your own code are composed together, it is hard to track down silently swallowed cancellations and resource-hogging floods of tasks triggered by the internals of third-party callbacks. Moreover, such misbehavior is often observed only in production environments, where the app faces actual workloads and I/O patterns, making it even harder to reproduce.
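As a small illustration of what a "silently swallowed cancellation" looks like, here is a minimal sketch. The names careless_worker and main are hypothetical and do not come from any library; the point is only that a task catching asyncio.CancelledError without re-raising it finishes as if it had never been cancelled.

```python
import asyncio


async def careless_worker():
    # Hypothetical worker: it catches the cancellation and never re-raises it.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        pass  # the cancellation request is silently swallowed here
    return "finished normally"


async def main():
    task = asyncio.create_task(careless_worker())
    await asyncio.sleep(0)  # yield once so the worker reaches its sleep
    task.cancel()           # request cancellation
    await asyncio.wait([task])
    # The task does not report itself as cancelled, and it has a normal result.
    return task.cancelled(), task.result()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

From the caller's side nothing looks wrong, which is why this kind of bug tends to be found only by inspecting where and when a task was cancelled.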

In this talk, I present an improved version of aiomonitor, called aiomonitor-ng (next generation). The original aiomonitor provides live access to a running asyncio process through a telnet socket and a basic REPL for inspecting the list of tasks and their current stacks. After it helped me several times with production debugging, I added more features to help track down the above issues in asyncio apps running in production: a task creation tracker and a task termination tracker. These trackers keep the stack traces whenever a task is created or terminated, and provide a holistic view of the chained stack traces when tasks are nested to arbitrary depths.
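To make the idea of a task creation tracker concrete, here is a minimal sketch built only on the standard library. It is not aiomonitor-ng's actual implementation: the names tracking_task_factory, creation_stacks, parent_tasks, and chained_creation_stack are hypothetical, and the approach (a custom task factory that records the call stack and the currently running task) is just one plausible way to do it. The termination side is not covered here.

```python
import asyncio
import traceback
import weakref

# Hypothetical registries, keyed weakly so finished tasks can be collected.
creation_stacks = weakref.WeakKeyDictionary()  # task -> stack at creation
parent_tasks = weakref.WeakKeyDictionary()     # task -> task that created it


def tracking_task_factory(loop, coro, **kwargs):
    """Create the task as usual, then remember where and by whom it was created."""
    task = asyncio.Task(coro, loop=loop, **kwargs)
    # Drop the last frame, which is this factory function itself.
    creation_stacks[task] = traceback.extract_stack()[:-1]
    parent = asyncio.current_task(loop=loop)
    if parent is not None:
        parent_tasks[task] = parent
    return task


def chained_creation_stack(task):
    """Return (task name, creation stack) pairs from the outermost ancestor down."""
    chain = []
    while task is not None:
        chain.append((task.get_name(), creation_stacks.get(task)))
        task = parent_tasks.get(task)
    return chain[::-1]


async def leaf():
    await asyncio.sleep(0)


async def middle():
    task = asyncio.create_task(leaf(), name="leaf")
    await task
    return task


async def main():
    asyncio.get_running_loop().set_task_factory(tracking_task_factory)
    leaf_task = await asyncio.create_task(middle(), name="middle")
    return chained_creation_stack(leaf_task)


if __name__ == "__main__":
    for name, stack in asyncio.run(main()):
        print(name, "created at:")
        print("".join(traceback.format_list(stack or [])))
```

Because each entry also records the parent task, walking the chain gives the nested view described above: the stack where "leaf" was created, preceded by the stack where its creator "middle" was created, and so on up to the main task (which has no recorded stack here because it was created before the factory was installed).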

aiomonitor-ng also demonstrates a rich async TUI (terminal UI) based on prompt_toolkit and Click, with auto-completion of commands and arguments, a major enhancement over the original version's simple REPL.

With the improved aiomonitor-ng, I have successfully debugged several production bugs. I hope this talk will help fellow asyncio developers build more complex yet stable applications at scale.