Improving readability of online content by removing abusive speech using Python

Adyasha Maharana, Abhinav Gupta


This poster presents our work in Python on building a deep learning model to detect abusive textual content and deploying that model as a web filter for online browsing.

### Abstract

Imagine you are scrolling through your Twitter feed, your favorite subreddit, or the comments on one of your Facebook posts, and you come across a rather distasteful comment. It sparks an online war on the post or prevents a constructive discussion from taking place. Either way, you wish you hadn't seen the comment in the first place. Web content is plagued with cheap, misinformed, and hateful speech, which is a source of mental harassment for many people. Is it possible to ban it? Doing so arguably violates the notion of freedom of speech. What can we do instead? We can filter it, just as we filter ads we don't like. Taking a stab at the problem, we use Python to build a deep learning framework that detects abusive textual content, assesses the readability of web content, and extends its utility through browser extensions. This poster presents our work in Python on building Recurrent Neural Networks (RNNs) with annotated datasets from Kaggle and Twitter. The task of detecting hate speech has its own eccentricities, such as:

+ identifying abusive speech based on certain traits ('fatso')
+ learning to recognize masked insults ('d!ck')
+ detecting sarcastic insults

We describe how the machine learning model works out solutions to such tasks.

### Data Sources

+ Kaggle
+ Crowdflower (Twitter)

### Python Packages Used

+ TensorFlow
+ NLTK
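One of the eccentricities listed above, masked insults such as 'd!ck', can be partially handled by normalizing common character substitutions before text reaches the classifier. The sketch below is purely illustrative: the substitution table and function names are our own assumptions for this example, not the poster's actual preprocessing pipeline.

```python
# Hypothetical leet-speak normalization table (illustrative, not exhaustive).
SUBSTITUTIONS = {
    "!": "i",
    "1": "i",
    "3": "e",
    "4": "a",
    "@": "a",
    "$": "s",
    "0": "o",
}

def unmask(token: str) -> str:
    """Map common character substitutions back to letters,
    so a masked insult like 'd!ck' resembles its plain form."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in token.lower())

def normalize(text: str) -> str:
    """Apply unmasking to every whitespace-separated token."""
    return " ".join(unmask(tok) for tok in text.split())
```

A lookup table like this only covers deliberate character swaps; sarcastic or trait-based insults still need the learned model to catch them.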
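The RNN classifier itself can be sketched with TensorFlow's Keras API. The vocabulary size, embedding width, and layer sizes below are placeholder assumptions for illustration, not the configuration used in our experiments.

```python
import tensorflow as tf

# Placeholder vocabulary size (assumed, not the poster's setting).
VOCAB_SIZE = 10_000

# Minimal binary abusive-text classifier: token ids -> embedding
# -> recurrent layer -> probability that the text is abusive.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64),
    tf.keras.layers.LSTM(64),                        # reads the token sequence
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(abusive)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Training would then fit this model on the annotated Kaggle and Crowdflower examples, with the sigmoid output thresholded to decide whether the browser extension hides a piece of text.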