PyCon Pittsburgh. April 15-23, 2020.

Sponsor Workshop: DocuSign: Python + Machine Learning: Implementing a deep-learning based document tagging solution

Presented by:

Raphael Alabi, PhD, Matthew Roknich

Description

We report that, in collaboration with Google last year, we created the fastest document tagging solution and successfully deployed it to our production environment. The solution combines computer vision (CV) with state-of-the-art natural language processing (NLP) algorithms. The CV algorithm identifies potential tag locations within a document, as well as the words the document contains, while an LSTM/BERT-based NLP algorithm classifies those tags as signature, date, or text tags. The architecture is flexible: it can accommodate additional tag types by tuning the LSTM/BERT models. It is also scalable, since inference can run on multiple GPUs, enabling faster, millisecond-scale responses from the NLP/CV models.
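
The two-stage flow described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the deployed system: the `Candidate` class, the `tag_document` function, and the `keyword_classifier` stand-in are hypothetical names, the CV stage is replaced by hard-coded candidates, and simple keyword rules take the place of the LSTM/BERT model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# The three tag types named in the description.
TAG_TYPES = ("signature", "date", "text")


@dataclass
class Candidate:
    """A potential tag location, as a CV stage might report it (hypothetical)."""
    box: Tuple[int, int, int, int]  # x, y, width, height in pixels
    context: str                    # words detected near the location


def keyword_classifier(context: str) -> str:
    """Stand-in for the LSTM/BERT model: plain keyword rules, for illustration only."""
    lowered = context.lower()
    if "sign" in lowered:
        return "signature"
    if "date" in lowered:
        return "date"
    return "text"


def tag_document(
    candidates: List[Candidate],
    classify: Callable[[str], str] = keyword_classifier,
) -> List[Tuple[Tuple[int, int, int, int], str]]:
    """Assign a tag type to each candidate location using the given classifier."""
    tags = []
    for candidate in candidates:
        label = classify(candidate.context)
        if label not in TAG_TYPES:
            raise ValueError(f"unexpected tag type: {label!r}")
        tags.append((candidate.box, label))
    return tags


# Hard-coded candidates standing in for the output of the CV stage.
candidates = [
    Candidate((72, 640, 200, 24), "Signature of applicant"),
    Candidate((320, 640, 120, 24), "Date"),
    Candidate((72, 120, 300, 24), "Full name"),
]
result = tag_document(candidates)
for box, label in result:
    print(box, label)
```

Because the classifier is passed in as a function, a trained model could in principle be swapped in for `keyword_classifier` without changing `tag_document`, which mirrors the flexibility claimed for the architecture.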