Sunday 10 a.m.–1 p.m. in

Building Reproducible Machine Learning Models with Python and Docker

Jeff Espenschied


A 2016 survey by the journal _Nature_ found that 90% of working scientists surveyed think there is a "slight" or "significant" crisis in experimental reproducibility. For the working data scientist, the processes and models used in an analysis must be reusable, whether to reproduce results or to apply them to new data. We ran into these same problems while building an API that lets developers easily incorporate machine learning algorithms into their software. Using Docker, Python, Flask, and Amazon S3, we were able to reuse the exact model generated in the initial analysis. This poster shows how those pieces fit together and how you could build a similar system for your own analysis.
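
As a rough illustration of what "reuse of the exact model" can mean in practice, the sketch below serializes a fitted model, stores it under a name derived from its SHA-256 digest, and verifies that digest before loading. It is not the system from the poster: `MeanModel`, `save_model`, `load_model`, and `demo` are hypothetical names, `pickle` is an assumed serialization format, and a local temporary directory stands in for an Amazon S3 bucket so the example runs with the standard library alone.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class MeanModel:
    """Toy stand-in for a trained model: predicts the mean of its training targets."""

    def fit(self, targets):
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, n_rows):
        return [self.mean_] * n_rows


def save_model(model, store_dir):
    """Serialize the model and store it under a name derived from its SHA-256 digest.

    store_dir is a local directory standing in for an S3 bucket (assumption).
    """
    payload = pickle.dumps(model)
    digest = hashlib.sha256(payload).hexdigest()
    (Path(store_dir) / f"{digest}.pkl").write_bytes(payload)
    return digest


def load_model(digest, store_dir):
    """Load a stored model, refusing bytes that do not match the recorded digest."""
    payload = (Path(store_dir) / f"{digest}.pkl").read_bytes()
    if hashlib.sha256(payload).hexdigest() != digest:
        raise ValueError("stored model does not match its digest")
    return pickle.loads(payload)


def demo():
    """Train, store, and restore a model; return predictions from both copies."""
    with tempfile.TemporaryDirectory() as store_dir:
        trained = MeanModel().fit([1.0, 2.0, 3.0])
        digest = save_model(trained, store_dir)
        restored = load_model(digest, store_dir)
        return trained.predict(2), restored.predict(2)
```

Recording the digest alongside an analysis identifies exactly which model bytes produced its results. In a setup like the one described above, a Flask endpoint could plausibly call something like `load_model` at startup, with a Docker image pinning the Python and library versions needed to unpickle it; both of those details are assumptions here. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.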