Machine Learning and Data Science code has its own set of challenges and peculiarities. When we write code to be used by Data Scientists or Machine Learning Developers, we have to keep in mind that every abstraction we use must a) be compatible with a fast and easy exploration playground; b) allow for sensible checkpoints and optimizations; c) express repeated queries and functions declaratively; and d) provide an abstraction layer over all of the production code so that it can be tracked and monitored seamlessly.
In this talk we will provide general guidelines for approaching this problem from a software engineering perspective: what our entities should be, how deep our abstractions should go, and how to avoid some common design pitfalls. We will then apply these guidelines to a small, specific end-to-end problem.