SNA techniques are derived from sociological and social-psychological theories and take into account the whole network (or, in the case of very large networks such as Twitter, a large segment of it). As a result, we may arrive at findings that seem counter-intuitive: for example, that Justin Bieber (7.5 million followers) and Lady Gaga (7.2 million followers) have relatively little actual influence despite their celebrity status, while a middle-of-the-road blogger with 30K followers can generate tweets that "go viral" and produce millions of impressions.
In this tutorial, we will conduct a social network analysis of a real dataset, from gathering and cleaning the data to analyzing and visualizing the results. We will use Python and a set of open-source libraries, including NetworkX, NumPy and Matplotlib.
Outline:
Introduction. Why should we do this? What is the data like? Why is this different from other techniques? What can we learn?
Centralities: Degree, closeness, betweenness, PageRank, Klout Score
Beyond Klout Score: Finding communities of interest, finding clusters in networks
Information diffusion in networks -- how do things go viral?
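To give a flavour of the centralities item in the outline, here is a minimal sketch using NetworkX. It is not the tutorial's own material: the dataset (Zachary's karate club graph, which ships with NetworkX) and the helper name top_nodes are assumptions chosen for illustration. PageRank is left out only to keep the sketch dependency-free; NetworkX exposes it as nx.pagerank, which may require SciPy depending on the installed version. Klout Score is a proprietary metric and is not computed here.

```python
import networkx as nx

# Stand-in dataset (an assumption for illustration): Zachary's karate club,
# a small social network of 34 people bundled with NetworkX.
G = nx.karate_club_graph()

# Three of the centralities named in the outline. Each call returns a dict
# mapping node -> score.
centralities = {
    "degree": nx.degree_centrality(G),            # share of other nodes a node is tied to
    "closeness": nx.closeness_centrality(G),      # inverse of average distance to others
    "betweenness": nx.betweenness_centrality(G),  # share of shortest paths through a node
}


def top_nodes(scores, k=3):
    """Hypothetical helper: return the k highest-scoring nodes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


for name, scores in centralities.items():
    print(f"{name:12s} top nodes: {top_nodes(scores)}")
```

Comparing the rankings side by side shows that the measures need not agree: a node with many ties is not necessarily the one that sits on the most shortest paths, which is the same kind of gap as the one between follower counts and actual influence.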