Nowadays we are surrounded by devices that can listen to us, such as Alexa, Siri, and Cortana, and interacting with them has become steadily easier and more intuitive. The first challenge in communicating colloquially with these devices is converting the voice signal to text. To do this, several approaches based on search methods, algorithmic techniques, and machine learning are combined in smart and interesting ways.
In this talk, I will introduce the speech recognition systems that underlie these devices. I will illustrate them with a guided example in which we develop a system in Python to recognize isolated words.
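To give a flavor of what an isolated-word recognizer can look like, here is a minimal sketch based on template matching with dynamic time warping (DTW), a classic technique for this task. This is an assumption for illustration only: the abstract does not say which method the talk uses, and the names `frame_distance`, `dtw_distance`, and `recognize`, as well as the toy one-dimensional "features", are hypothetical stand-ins for real acoustic features.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Cumulative DTW cost between two sequences of feature frames."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template closest to the utterance under DTW."""
    return min(
        templates,
        key=lambda label: dtw_distance(utterance, templates[label]),
    )


# Toy one-dimensional "features" (assumed data, not real audio)
templates = {
    "yes": [[0.0], [1.0], [2.0], [1.0]],
    "no": [[2.0], [2.0], [0.0], [0.0]],
}
# A slower rendition of the "yes" pattern: same shape, stretched in time
utterance = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0]]
print(recognize(utterance, templates))  # prints "yes"
```

DTW tolerates differences in speaking rate because it aligns the two sequences non-linearly in time before comparing them, which is why the stretched utterance still matches its template. A real system would replace the toy frames with features extracted from audio.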
Finally, I will show how we are implementing these and more advanced techniques in our production systems, which provide transcriptions for different companies and institutions and use Python in different parts of the process.