Yellowbrick: Steering Scikit-Learn with Visual Transformers

Rebecca Bilbro, Benjamin Bengfort

Description

In machine learning, model selection is a bit more nuanced than simply picking the 'right' or 'wrong' algorithm. In practice, the workflow includes (1) selecting and/or engineering the smallest and most predictive feature set, (2) choosing a set of algorithms from a model family, and (3) tuning the algorithm hyperparameters to optimize performance. Recently, much of this workflow has been automated through grid search methods, standardized APIs, and GUI-based applications. In practice, however, human intuition and guidance can more effectively hone in on quality models than exhaustive search. This poster presents a new Python library, [Yellowbrick](https://pypi.python.org/pypi/yellowbrick/0.3.1), which extends the Scikit-Learn API with a visual transfomer (visualizer) that can incorporate visualizations of the model selection process into pipelines and modeling workflow. Visualizers enable machine learning practitioners to visually interpret the model selection process, steer workflows toward more predictive models, and avoid common pitfalls and traps. Yellowbrick is an open source, pure Python project that extends Scikit-Learn with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to create publication-ready figures and interactive data explorations while still allowing developers fine-grain control of figures. For users, Yellowbrick can help evaluate the performance, stability, and predictive value of machine learning models, and assist in diagnosing problems throughout the machine learning workflow. In this poster, we'll show not only what you can do with Yellowbrick, but how it works under the hood (since we're always looking for new contributors!). We'll illustrate how Yellowbrick extends the Scikit-Learn and Matplotlib APIs with a new core object: the Visualizer. Visualizers allow visual models to be fit and transformed as part of the Scikit-Learn Pipeline process - providing iterative visual diagnostics throughout the transformation of high dimensional data.