Friday 11:30 a.m.–noon
"Words, words, words": Reading Shakespeare with Python
- Audience level:
- Best Practices & Patterns
This talk introduces text analysis with Python by asking some questions about Shakespeare and discussing the quantitative methods that go into answering them. While we’ll use Shakespeare to illustrate our methods, we’ll also discuss how they carry over to 21st-century texts, like tweets or New York Times articles.
Shakespeare is the greatest writer in the English language. But what makes him perfect for illustrating what Python can do with text? Simple: his works have already been extensively marked up in XML, so we can jump right into the good stuff. While we’ll mostly use Shakespeare in this talk, we’ll make sure to show how our techniques apply just as easily to other sorts of texts, like tweets or newspaper articles.

After a brief conversation about the usefulness of metadata, the talk will cover two main topics: classification and informational entropy. First, we’ll explore how to find features that distinguish kinds of texts and how to use those features to classify other texts. Then, for entropy, which is roughly a measure of randomness, we’ll look at the unexpected, and therefore information-rich, parts of Shakespeare’s works.
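To make the entropy idea concrete, here is a minimal sketch, not the talk's actual code, of measuring how "surprising" a word is given word frequencies in a text. The function names (`tokenize`, `word_probabilities`, `entropy`, `surprisal`) and the one-line sample are illustrative assumptions, and the model is deliberately the simplest one available: each word is treated independently of its neighbours.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_probabilities(tokens):
    """Estimate each word's probability as its relative frequency in the tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over all words."""
    return -sum(p * math.log2(p) for p in probs.values())


def surprisal(word, probs):
    """Information content of a single word in bits: -log2(p(word))."""
    return -math.log2(probs[word])


# Hypothetical sample; a real analysis would use a whole play or corpus.
sample = "To be, or not to be, that is the question"
tokens = tokenize(sample)
probs = word_probabilities(tokens)

print(f"entropy of sample: {entropy(probs):.3f} bits per word")
for word in ("to", "question"):
    print(f"surprisal of {word!r}: {surprisal(word, probs):.3f} bits")
```

Rare words get a higher surprisal than common ones, which is the sense in which unexpected passages are "information-rich". Probabilities estimated from such a tiny sample are not meaningful in themselves; the same functions would apply unchanged to tokens drawn from a full play, a set of tweets, or newspaper articles.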