Change the future

Mining Twitter Feeds

Vishnu Nath

Audience level:
Intermediate
Category:
Big Data

Description

A Twitter application that performs a wide range of functions, from simple, direct ones such as posting a tweet and reading personal tweets, to data-mining tasks such as counting common friends and followers, estimating a person's Twitter influence, displaying currently trending topics in a tag cloud, finding commonly related topics, and analyzing retweets.

Abstract

The application performs the simple operations of sending and receiving tweets directly, as well as the more complicated operations of mining user behaviors and patterns.

The first major analysis is in the realm of re-tweets. Regular expressions are used to identify whether a tweet is a re-tweet. The application also captures all the re-tweets from a designated geographical area within a particular time frame, and draws directed acyclic graphs to present them visually to the user. It also checks whether a user regularly re-tweets the tweets of a particular other user. If so, the application checks whether the two users are friends or followers of each other on Twitter. If they are not, it suggests that they connect, since they talk about similar topics.

The application can also estimate the market value of an entity on Twitter. A huge following is not enough; what matters is a huge active following. This would help businesses sign endorsement deals with celebrities based on numerical analysis, and not just a few magazine articles. The application computes the number of friends and followers of the entity and the number of mutual friends and followers. It then counts the re-tweets that have emanated from this user and the frequency of tweets in which this user was mentioned. From all of this it outputs a score, the influence index. The higher the score, the greater the influence.

The application also analyzes every tweet that the user has posted and forms a character profile of the user. While this profile may not be accurate, it gives a good indication of the things the user is interested in, at least publicly. Lastly, the application displays all the data in a visually appealing tag cloud.
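
The regular-expression test for re-tweets might look like the sketch below. The abstract does not give the actual pattern, so this assumes the common convention that a re-tweet is marked by "RT" or "via" followed by one or more @mentions. The names RT_PATTERN, retweet_sources, and is_retweet are illustrative, not taken from the application.

```python
import re

# Assumed convention (not stated in the abstract): a re-tweet contains the
# standalone word "RT" or "via", followed by one or more @mentions.
RT_PATTERN = re.compile(r"\b(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)


def retweet_sources(text):
    """Return the lower-cased screen names a tweet credits, or [] if none."""
    match = RT_PATTERN.search(text)
    if match is None:
        return []
    # group(2) holds the run of @mentions that follows the marker word.
    return [name.lower() for name in re.findall(r"@(\w+)", match.group(2))]


def is_retweet(text):
    """True if the tweet text matches the assumed re-tweet convention."""
    return bool(retweet_sources(text))
```

The leading word boundary keeps ordinary words that merely end in "rt", such as "start", from being mistaken for the marker. The credited screen names are the kind of input a re-tweet graph could be built from, with one edge per (re-tweeter, source) pair.
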
The entire application is written in Python (ActivePython 2.7) and uses the Redis service, along with the NumPy, re, and twitter modules.
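
The influence index described in the abstract combines follower counts, mutual connections, re-tweets, and mentions, but the abstract does not state the formula. The sketch below is therefore only one plausible reading: a weighted sum whose weights are arbitrary placeholders. The function names and the default weights are assumptions made for illustration.

```python
def mutual_connections(friends, followers):
    """Accounts that appear in both the friends and the followers lists."""
    return set(friends) & set(followers)


def influence_index(friends, followers, retweet_count, mention_count,
                    weights=(1.0, 2.0, 3.0, 3.0)):
    """Hypothetical influence score: a weighted sum of the four signals.

    The weights are placeholders, not values from the application. They
    simply rank activity (re-tweets, mentions) above raw audience size.
    """
    w_followers, w_mutual, w_retweets, w_mentions = weights
    mutual = mutual_connections(friends, followers)
    return (w_followers * len(set(followers))
            + w_mutual * len(mutual)
            + w_retweets * retweet_count
            + w_mentions * mention_count)
```

For example, an account with friends {"a", "b", "c"} and followers {"b", "c", "d", "e"} has two mutual connections. In a real deployment the id lists would come from the Twitter API, and the set intersection could equally be computed by Redis if the ids are stored there as sets.
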