PyCon 2016 in Portland, OR

Monday 5:10 p.m.–5:40 p.m.

Visual Diagnostics for More Informed Machine Learning: Within and Beyond Scikit-Learn

Rebecca Bilbro

Audience level:


Visualization has a critical role to play throughout the analytic process. Where static outputs and tabular data may render patterns opaque, human visual analysis can reveal a great deal and lead to more robust programming and better data products. For Python programmers who dabble in machine learning, visual diagnostics are a must-have for effective feature analysis, model selection, and evaluation.
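To illustrate the feature-analysis step, here is a minimal sketch (not one of the talk's own tools) that draws the pairwise Pearson correlations between features as a heatmap and lists the most strongly correlated pairs, which are candidates for redundancy. It assumes NumPy, matplotlib, and scikit-learn are installed and uses scikit-learn's bundled iris dataset; the helper `plot_feature_correlation` and the output filename are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris


def plot_feature_correlation(X, feature_names, path="feature_correlation.png"):
    """Hypothetical helper: save a heatmap of pairwise Pearson correlations."""
    # rowvar=False tells NumPy that columns (not rows) are the variables.
    corr = np.corrcoef(X, rowvar=False)
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)
    ax.set_xticks(range(len(feature_names)))
    ax.set_yticks(range(len(feature_names)))
    ax.set_xticklabels(feature_names, rotation=45, ha="right")
    ax.set_yticklabels(feature_names)
    fig.colorbar(im, ax=ax, label="Pearson r")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return corr


data = load_iris()
corr = plot_feature_correlation(data.data, data.feature_names)

# Rank each unique feature pair by absolute correlation.
n = corr.shape[0]
pairs = [(abs(corr[i, j]), data.feature_names[i], data.feature_names[j])
         for i in range(n) for j in range(i + 1, n)]
for r, a, b in sorted(pairs, reverse=True)[:3]:
    print(f"{a} vs {b}: |r| = {r:.2f}")
```

Strongly correlated pairs in the heatmap are a visual cue that one feature may add little beyond the other; whether to drop or combine them is still a modeling judgment.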


Visual diagnostics are a powerful but frequently underestimated tool in data science. By tapping into one of our most essential resources, the human visual cortex, they enable us to see patterns rendered opaque by numeric outputs and tabular data, and lead us toward more robust programs and better data products. For Python programmers who dabble in machine learning, visual diagnostics can mean the difference between a model that crashes and burns and one that predicts the future.

Python and high-level libraries like Scikit-learn, NLTK, PyBrain, Theano, and MLPY have made machine learning accessible to a broad programming community that might never have found it otherwise. With the democratization of these tools, a great many machine learning practitioners are now primarily self-taught. At the same time, the stakes of machine learning have never been higher; predictive tools are driving decision-making in every sector, from business, art, and engineering to education, law, and defense.

In an age where any Python programmer can harness the power of predictive analytics, how do we ensure our models are valid and robust? How can we identify problems such as local minima and overfitting? How can we build intuition around model selection? How can we isolate and combine the most informative features? Whether or not you have an academic background in predictive methods, visual diagnostics are the key to augmenting the algorithms already implemented in Python.

In this talk, I present a suite of visualization tools within and outside the standard Scikit-learn library that Python programmers can use to evaluate their machine learning models' performance, stability, and predictive value. I then identify some of the key gaps in the current visual diagnostics arsenal and propose some novel possibilities for the future.
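To make the overfitting question concrete, here is a minimal sketch (not the suite presented in the talk) that uses scikit-learn's `validation_curve` to compare training and cross-validated accuracy as a `DecisionTreeClassifier` is allowed to grow deeper. It assumes a recent scikit-learn and matplotlib; the digits dataset, the `max_depth` range of 1 to 20, and the output filename are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
depths = np.arange(1, 21)

# For each max_depth, score the model on the training folds and on the
# held-out folds of 5-fold cross-validation.
train_scores, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5, scoring="accuracy",
)
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)

fig, ax = plt.subplots()
ax.plot(depths, train_mean, marker="o", label="training score")
ax.plot(depths, test_mean, marker="o", label="cross-validation score")
ax.set_xlabel("max_depth")
ax.set_ylabel("accuracy")
ax.set_title("Validation curve: DecisionTreeClassifier on digits")
ax.legend()
fig.savefig("validation_curve.png")
plt.close(fig)

# A widening gap between the two curves is the visual signature of overfitting.
best = depths[test_mean.argmax()]
gap = train_mean - test_mean
print(f"best max_depth by CV accuracy: {best}")
print(f"train/CV gap at max_depth={depths[-1]}: {gap[-1]:.2f}")
```

If the cross-validation curve levels off or dips while the training curve keeps climbing toward 1.0, the deeper trees are likely memorizing the training folds, and the depth where the cross-validation score peaks is a reasonable starting point for tuning.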