PyCon 2019 in Cleveland, Ohio

Sunday 10 a.m.–1 p.m. in Expo Hall

Designing Robust Machine Learning Algorithm to Detect Rare Events Using PyTorch

Haque Ishfaq, Atia Amin, Hassan Saad Ifti, Hassan Sami Adnan, Samara Sharmeen


Imagine you are an oncologist specializing in breast cancer. After hearing how recent advances in artificial neural networks and machine learning are revolutionizing medical diagnostics, you decide to try out a machine learning system that can tell which type of breast cancer a patient has just by looking at her mammogram images. When you encounter a new patient, you excitedly use the new system to see which type of breast cancer she has. The algorithm says it is a benign case. Out of curiosity, you take a second look at the mammogram image yourself, and your years of medical training instantly tell you that the algorithm must be wrong. It cannot be benign, but at the same time you are not sure which type of cancer it is. After some digging, you realize that the machine learning algorithm was trained using only the 10 most common types of breast cancer, and your patient turns out to have a very rare type that you seldom encounter. Will you ever trust a machine learning system to diagnose cancer again?

For high-stakes applications like this, the usual classification-based machine learning algorithms are not enough. Instead, we need a method that can learn a high-quality, low-dimensional representation of the data in which different classes can be clustered accurately, including classes for which we have no training data. This way, the rare type of breast cancer mentioned earlier would form its own cluster in the learned representation space, and we would automatically be able to differentiate it from the other, more common types of cancer. To achieve this, in this project we develop a generative model that learns a latent representation space in which points from the same class are near each other and points from different classes are far apart. We develop a novel loss function for training Variational Autoencoder (VAE) based generative models.
The novel loss function exploits ideas from the metric learning literature, where instead of maximizing classification accuracy, neural networks are trained to map images from the same class to the same region of the learned latent representation space. Using our new VAE model, we can learn a low-dimensional latent representation for complex data that captures intra-class variance and inter-class similarities. The ability to learn such a high-quality, low-dimensional representation for any data would reduce any complex classification problem to a simple clustering problem. All experiments in this project were carried out using Python and its libraries. In particular, we make extensive use of PyTorch, a Python-based deep learning framework. We believe that our approach can benefit the diverse communities attending PyCon who are looking for ways to apply machine learning algorithms to tasks similar to the ones our approach is designed to tackle. In our poster, we will showcase the relevant Python tools one could use to reproduce our experiments and tackle similar tasks in other domains.
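
As a rough illustration of the kind of objective described above, the sketch below combines the standard VAE terms with a triplet-style metric learning term in PyTorch. It is not the loss used in this project, whose exact form is not given in this abstract. The function name `vae_metric_loss`, the `margin`, `beta`, and `gamma` weights, the batch-hard triplet mining, the squared Euclidean distance, and the mean squared error reconstruction term are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def vae_metric_loss(x, x_recon, mu, logvar, labels,
                    margin=1.0, beta=1.0, gamma=1.0):
    """Hypothetical objective: VAE terms plus a triplet term on latent means.

    x, x_recon : (B, D) inputs and their reconstructions
    mu, logvar : (B, Z) parameters of the approximate posterior
    labels     : (B,)   integer class labels
    Returns (total, recon, kl, triplet) as scalar tensors.
    """
    batch = x.size(0)

    # Reconstruction error, summed over features and averaged over the batch.
    recon = F.mse_loss(x_recon, x, reduction="sum") / batch

    # KL divergence between N(mu, exp(logvar)) and the standard normal prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch

    # Pairwise squared Euclidean distances between latent means, shape (B, B).
    diff = mu.unsqueeze(1) - mu.unsqueeze(0)
    dist = diff.pow(2).sum(dim=-1)

    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(batch, dtype=torch.bool, device=mu.device)
    pos_mask = same & ~eye   # same class, excluding the anchor itself
    neg_mask = ~same         # different class

    # Batch-hard mining: farthest positive and nearest negative per anchor.
    hardest_pos = (dist * pos_mask).max(dim=1).values
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values

    # Only anchors with at least one positive and one negative contribute.
    valid = pos_mask.any(dim=1) & neg_mask.any(dim=1)
    if valid.any():
        triplet = F.relu(hardest_pos - hardest_neg + margin)[valid].mean()
    else:
        triplet = mu.new_zeros(())

    total = recon + beta * kl + gamma * triplet
    return total, recon, kl, triplet
```

In a training loop, `mu` and `logvar` would come from the encoder and `x_recon` from the decoder, and `total.backward()` would be called on the first returned value. The triplet term is zero once every class in the batch is separated from the others by at least `margin`, which is the clustering behavior the abstract describes.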