Retrieving Meaning from Words

Nathaniel Case, Eitan Romanoff

Audience level: Novice
Category: Big Data

Description

FOSS@RIT is an applied research lab at Rochester Institute of Technology focused on promoting free/open source software and open web technologies. Recently, students have been using the Natural Language Toolkit (NLTK) library to solve interesting problems involving natural language, analyzing data ranging from political tweets to personal emails.

Abstract

FOSS@RIT provides a wide range of opportunities and resources to capable and excited RIT students, connecting them to interesting projects with real-world implications. Recently, students have been using the Natural Language Toolkit (NLTK) library to solve interesting problems involving natural language, analyzing data ranging from political tweets to personal emails.

One dataset being analyzed with natural language processing and statistical data mining techniques is a collection of several million political tweets. The tweets were collected over the course of the presidential debates and on election night by a Twython-powered scraper running on a Raspberry Pi. Students working in FOSS@RIT apply these tools with the aim of building intelligent classifiers that help categorize and understand the interests and trends present in Twitter messages in this domain.
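As a rough illustration of what a first pass over such a corpus might look like, the sketch below counts frequent terms and hashtags in a handful of tweets. It is not the project's actual pipeline: the sample tweets, the stopword list, and the function names (tokenize, top_terms, hashtag_counts) are all invented for this example, and it uses only the Python standard library rather than NLTK or Twython.

```python
import re
from collections import Counter

# Hypothetical sample tweets, made up for illustration; not from the real corpus.
SAMPLE_TWEETS = [
    "Watching the #debate tonight, the economy question was tough",
    "Great answer on jobs and the economy #debate #election2012",
    "Long lines at the polls but worth it #election2012",
]

# A word, optionally preceded by '#', so hashtags survive as single tokens.
TOKEN_RE = re.compile(r"#?\w+")

# A tiny, assumed stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "on", "at", "and", "but", "it", "was", "a", "to"}


def tokenize(tweet):
    """Lowercase a tweet and return its words and hashtags, minus stopwords."""
    return [t for t in TOKEN_RE.findall(tweet.lower()) if t not in STOPWORDS]


def top_terms(tweets, n=3):
    """Return the n most frequent tokens across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts.most_common(n)


def hashtag_counts(tweets):
    """Count only the tokens that are hashtags."""
    counts = Counter()
    for tweet in tweets:
        counts.update(t for t in tokenize(tweet) if t.startswith("#"))
    return counts


if __name__ == "__main__":
    print(top_terms(SAMPLE_TWEETS))
    print(hashtag_counts(SAMPLE_TWEETS))
```

Counts like these are only a starting point; they could serve as input features for a classifier or as a quick check on which topics dominate a given night.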

Another interesting application involves a collection of emails donated by a local news outlet. In June 2012, a video appeared on the Internet showing a bus monitor for a local school district being harassed by children on the bus. A subset of the resulting emails to the school district was made available to FOSS@RIT in the hope that we could extract useful information from their contents. Using a simple classifier, features such as the sentiment of a message could be extracted.
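The abstract does not say which classifier was used, so the following is only a minimal sketch of one common choice: a bag-of-words naive Bayes classifier with add-one smoothing, written in plain Python so that it runs without dependencies. The training messages and the names words, train, and classify are hypothetical and exist only for this example; NLTK's NaiveBayesClassifier would be a natural substitute in practice.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled messages, invented for illustration only.
TRAINING = [
    ("thank you for your kindness and patience", "positive"),
    ("we support you and wish you well", "positive"),
    ("this behavior is a disgrace and unacceptable", "negative"),
    ("i am angry and ashamed of what happened", "negative"),
]


def words(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Collect per-label document counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for w in words(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability under naive Bayes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        # Log prior for the label.
        score = math.log(count / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words(text):
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify(model, "thank you for your support"))
    print(classify(model, "what a disgrace"))
```

With only four training messages this is a toy; a usable sentiment model would need a far larger labeled set and an evaluation on held-out messages.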