Model Management Systems: Scikit-Learn and Django

Benjamin Bengfort, Laura Lorenz, Rebecca Bilbro

Description

Modern web applications incorporate machine learning models to create personalized, interesting, or even safer experiences for their users. From recommendations to troll detection, text summarization, and automatic image captioning, machine learning is becoming a fixture of our experience on the Internet. However, while there are many tools for the administration of content (e.g., Django CMS) or of an API (e.g., Swagger), machine learning model management systems are still custom software that must be created on a per-application basis.

Employing a fitted model that was trained with Scikit-Learn is relatively easy: the model can be pickled and then embedded into a REST API with Flask or Django REST framework. Requests that contain data are transformed, the model makes a prediction, and the prediction is returned to the front end. Over time, however, the model will need to be retrained on new data or incorporate new information in order to become more predictive. Different models may be combined in an ensemble or run side by side to compare their performance, and models may be trained for different parts of the application or on a per-user basis. The end result is that a web application may have many hundreds of models, and if they are simply embedded with the code, they cannot be updated in real time because every change requires a new deployment.

In this poster we introduce a new approach to machine learning in web applications: _model management systems_ (MMS). We present a Django app, similar to the Django admin app, that allows for the storage, curation, and selection of Scikit-Learn models, so that both data scientists and users can interact with the machine learning capabilities of the system, much as editors and authors interact with content in a CMS. Model management systems are the next step toward allowing many types of web and mobile applications to incorporate machine learning in meaningful ways.

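
The core idea, storing pickled models as data rather than shipping them with the code, can be sketched with the standard library alone. This is not the Django app presented in the poster: the `ModelStore` class, its table layout, and the `classify` helper are hypothetical names invented for illustration, `sqlite3` stands in for a Django-managed database, and the stand-in model is a plain dictionary of arbitrary word weights. A fitted Scikit-Learn estimator is picklable and could be saved through the same interface.

```python
import pickle
import sqlite3


class ModelStore:
    """Hypothetical store that keeps pickled models in a database, not in the codebase."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS models ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "name TEXT NOT NULL, "
            "active INTEGER NOT NULL DEFAULT 0, "
            "payload BLOB NOT NULL)"
        )

    def save(self, name, model, activate=True):
        """Store a new version of the named model and optionally make it active."""
        with self.conn:  # commit the deactivation and the insert together
            if activate:
                self.conn.execute(
                    "UPDATE models SET active = 0 WHERE name = ?", (name,)
                )
            cursor = self.conn.execute(
                "INSERT INTO models (name, active, payload) VALUES (?, ?, ?)",
                (name, int(activate), pickle.dumps(model)),
            )
        return cursor.lastrowid

    def load_active(self, name):
        """Return the currently active version of the named model."""
        row = self.conn.execute(
            "SELECT payload FROM models WHERE name = ? AND active = 1",
            (name,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no active model named {name!r}")
        # Unpickling can execute arbitrary code: only load trusted payloads.
        return pickle.loads(row[0])


def classify(weights, text):
    """Label text "red" or "blue" from the summed weights of its words."""
    score = sum(weights.get(token, 0.0) for token in text.lower().split())
    return "red" if score > 0 else "blue"


store = ModelStore()

# Arbitrary illustrative weights, not learned from any data.
store.save("global", {"taxes": 1.0, "healthcare": -0.5})
first = classify(store.load_active("global"), "Taxes and healthcare reform")

# A "retrained" model replaces the active one without touching application code.
store.save("global", {"taxes": 1.0, "healthcare": -2.0})
second = classify(store.load_active("global"), "Taxes and healthcare reform")
```

Because the request handler looks the model up by name at prediction time, saving a new version changes the predictions without a redeployment. Older versions remain in the table for comparison or rollback, and per-user models could be stored under a per-user name.
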
To illustrate the approach, we present a simple application, [Partisan Discourse](http://partisan-discourse.districtdatalabs.com/), that uses a model trained on the 2016 Presidential Campaign Debates to predict the political polarity of text. As users browse the web and read news articles, the application highlights articles and words as "red" or "blue" to indicate partisanship. Users can also submit their own suggestions for an article's political bias, and in so doing generate a personalized text classification model. Expert users (that is, users who might have a professional need for such an application) can also create collective models. The result is a rich web application that supports many models and many predictive interactions between the machine and different users. Without a model management system, such an application would not be possible!