There are so many data science tools today that the landscape can be difficult to navigate. Many of them are AI platforms that “do everything by clicking on a UI” and do not leverage pre-existing tools, e.g., Git for versioning or a good old Python IDE instead of Jupyter Notebooks. On the other hand, ML engineering is not classical software engineering.
In this talk, we will build a fully customizable and complete system in Python to track machine learning experiments. For the purpose of this talk, we will train a neural network (TensorFlow) to classify images as cat or dog, though the main focus is on the tooling and not the ML algorithm. We will use DVC and Streamlit.
Both DVC and Streamlit are open-source libraries with Python APIs.
In the second part of the talk, we will focus on various ways of combining DVC and Streamlit. For instance, we will see how to build a Streamlit app that lets the user select any trained model tracked with DVC (given its Git commit), load it, and test it on given input images.
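As a rough illustration of that idea, here is a minimal sketch of such an app. It is not the code from the talk. It assumes a Keras model file tracked with DVC at the hypothetical path `model.h5`, a hypothetical 128x128 RGB input scaled to [0, 1], and a single sigmoid output where a score above 0.5 means "dog". The helper names (`label_from_score`, `load_model_at`, `predict`) are made up for this example.

```python
"""Sketch: Streamlit app that loads a DVC-tracked model at a given Git revision."""
import importlib.util
import os
import tempfile

MODEL_PATH = "model.h5"  # assumption: path of the DVC-tracked model in the repo
IMAGE_SIZE = (128, 128)  # assumption: input size expected by the model


def label_from_score(score, threshold=0.5):
    """Map a sigmoid score to a label (assumes scores above threshold mean dog)."""
    return "dog" if score > threshold else "cat"


def load_model_at(rev):
    """Fetch the model file as tracked at Git revision `rev` and load it."""
    import dvc.api
    import tensorflow as tf

    # dvc.api.read returns the content of the tracked file at that revision.
    data = dvc.api.read(MODEL_PATH, rev=rev, mode="rb")
    with tempfile.NamedTemporaryFile(suffix=".h5", delete=False) as tmp:
        tmp.write(data)
    try:
        return tf.keras.models.load_model(tmp.name)
    finally:
        os.remove(tmp.name)


def predict(model, image_file):
    """Resize the image, scale it to [0, 1], and return (image, score)."""
    import numpy as np
    from PIL import Image

    image = Image.open(image_file).convert("RGB").resize(IMAGE_SIZE)
    batch = np.asarray(image, dtype="float32")[np.newaxis, ...] / 255.0
    score = float(model.predict(batch)[0][0])
    return image, score


def main():
    import streamlit as st

    st.title("Cat vs. dog model explorer")
    rev = st.text_input("Git commit, branch, or tag", value="HEAD")
    uploaded = st.file_uploader("Input image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return  # nothing to classify yet

    model = load_model_at(rev or None)  # empty input falls back to the workspace
    image, score = predict(model, uploaded)
    st.image(image, caption=uploaded.name)
    st.write(f"Prediction: {label_from_score(score)} (score {score:.2f})")


# Start the UI only when Streamlit is installed, so that running the file
# without it does not fail.
if __name__ == "__main__" and importlib.util.find_spec("streamlit"):
    main()
```

Saved as, say, `app.py` inside the Git/DVC repository, this would be started with `streamlit run app.py`. Reloading the model on every interaction is slow, so a real app would probably cache `load_model_at`, for example with `st.cache_resource`.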
I will provide code samples and live demos throughout the talk.