PyCon Pittsburgh. April 15-23, 2020.

Talk: Developing Python Libraries for Machine Learning: Best Practices and Lessons Learned

Presented by:

Yue Zhao

Description

The popularity of open-source machine learning (ML) libraries such as scikit-learn and TensorFlow speaks to the value of well-developed software in both academia and industry. However, extending these tools is becoming increasingly important, as general-purpose libraries are often hard to adapt to specific use cases. For example, an accessible Python library for anomaly detection, an important sub-field of machine learning, did not exist until 2017. Possible reasons are that building a machine learning library from the ground up is intimidating and that resources on the subject are scarce. These issues hinder the advancement of the field and prevent practitioners from applying state-of-the-art research to practical problems.

In this talk, we will discuss best practices and lessons learned from designing and releasing three popular Python machine learning libraries, which have been used by thousands of people and published in top venues including the Journal of Machine Learning Research and the AAAI Conference on Artificial Intelligence. Specifically, we will focus on three perspectives: extensibility, API design, and scalability. Our goal is to encourage researchers and practitioners to build reliable “wheels” for accelerating machine learning research and development. Attendees can expect to learn: i) the standard procedure for building ML libraries from scratch and ii) best practices for designing their own toolboxes.