Consuming HTML (#78)

Ian Bicking (The Open Planning Project)
30min Beginner
categories: html, web

HTML or XML? One is for humans, the other for computers. But with newer libraries, HTML can be processed nearly as reliably and quickly as XML, so developers don't always have to choose between machines and humans. Consuming HTML also opens up all the content on the web. If you've been using regular expressions or the HTMLParser module to process HTML, this talk will also show you easier and more robust techniques.
 
This talk will demonstrate ways to manipulate HTML in Python, particularly using lxml.html but also showing examples using BeautifulSoup and html5lib.
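
As a rough illustration of the kind of manipulation meant here, the sketch below uses lxml.html to pull links out of sloppy markup and make them absolute. It assumes lxml is installed; the sample markup, the `extract_links` helper, and the example.com base URL are hypothetical and are not taken from the talk itself.

```python
import lxml.html

# Hypothetical sample: the <p> and <li> tags are never closed, which is
# common in real-world HTML and would be rejected by a strict XML parser.
SAMPLE = """
<html><body>
<p>Talks
<ul>
<li><a href="/talks/78">Consuming HTML</a>
<li><a href="/talks/79">Another talk</a>
</ul>
</body></html>
"""


def extract_links(markup, base_url):
    """Return (link text, absolute URL) pairs for every <a href> in markup."""
    doc = lxml.html.fromstring(markup)
    # Rewrite relative href/src values in place against base_url.
    doc.make_links_absolute(base_url)
    return [
        (a.text_content().strip(), a.get("href"))
        for a in doc.xpath("//a[@href]")
    ]


links = extract_links(SAMPLE, "http://example.com/")
for text, url in links:
    print(text, url)
```

The parser repairs the unclosed tags on its own, so the same XPath query works whether or not the input is well-formed. A regular expression would need a special case for each variation in the markup.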

