Profiling is hard. Working out what is making your system slow can be very frustrating, especially when the slowdown shows up only while your clients are watching and never while you are.
Python comes with elaborate profiling tools, but the output of profile/cProfile can be daunting to interpret, especially in complex frameworks. These modules can also be impractical in a production environment, where their performance toll can make the system unusable.
But Python is a very reflective language, allowing extensive inspection of its own state at runtime, and one of its lesser-known tools is the sys._current_frames() function. It returns the current stack frame of every thread, so it can be used to take an X-ray of what all the threads in your running Python program are doing. It can also be used for a kind of “statistical” profiling, with little impact on your running system.
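As a rough illustration of the X-ray idea, here is a minimal sketch that formats the current stack of every thread. The helper name dump_all_threads is hypothetical; it relies only on sys._current_frames(), threading.enumerate() and traceback.format_stack() from the standard library.

```python
import sys
import threading
import traceback


def dump_all_threads():
    """Return a formatted stack trace for every thread in this process."""
    # Map thread ids to names so the report is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        header = "Thread %s (%s)" % (names.get(thread_id, "unknown"), thread_id)
        stack = "".join(traceback.format_stack(frame))
        chunks.append(header + "\n" + stack)
    return "\n".join(chunks)


if __name__ == "__main__":
    print(dump_all_threads())
```

In a live server, a helper like this would typically be triggered from a signal handler or a debug endpoint, so the dump can be taken at the moment the system misbehaves.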
In this talk we’ll investigate how this function can be used to tell what your program is doing at the exact moment it is misbehaving. We’ll learn how looking at a series of tracebacks, instead of a bunch of call statistics, can help you zoom in quickly on code hot spots.
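One way to picture the “series of tracebacks” approach is a small sampler that polls sys._current_frames() at a fixed interval and counts how often each stack is seen; the stacks seen most often point at the hot spots. This is only a sketch under assumptions: sample_stacks, busy_worker and the duration and interval values are hypothetical names and numbers chosen for illustration.

```python
import collections
import sys
import threading
import time
import traceback


def sample_stacks(duration=1.0, interval=0.01):
    """Poll the stacks of all other threads and count identical tracebacks."""
    counts = collections.Counter()
    me = threading.get_ident()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for thread_id, frame in sys._current_frames().items():
            if thread_id == me:
                continue  # do not profile the sampler itself
            stack = tuple(
                (fs.filename, fs.lineno, fs.name)
                for fs in traceback.extract_stack(frame)
            )
            counts[stack] += 1
        time.sleep(interval)
    return counts


def busy_worker(stop):
    # Hypothetical CPU-bound workload standing in for real server code.
    while not stop.is_set():
        sum(i * i for i in range(1000))


if __name__ == "__main__":
    stop = threading.Event()
    worker = threading.Thread(target=busy_worker, args=(stop,), daemon=True)
    worker.start()
    counts = sample_stacks(duration=0.5)
    stop.set()
    worker.join()
    for stack, hits in counts.most_common(3):
        print(hits, "samples:", " -> ".join(name for _, _, name in stack))
```

Because the sampler sleeps between polls, its cost is governed mostly by the interval, which is what keeps the impact on the running system small.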
We’ll also show some case studies of real-world server-side slowdowns that were caused by complex interactions between different components and were solved with this technique, including: