Building A Python-Based Search Engine

Audience level:
March 11th 1:30 p.m. – 2:10 p.m.


Search is an increasingly common request in all types of applications as the amount of data all of us deal with continues to grow. The technology and architecture behind search engines are wildly different from what many developers expect. This talk gives a solid grounding in the fundamentals of search, using Python to flesh out these concepts in a simple library.


  • Core concepts
  • Terminology
  • Document-based
      • Show basic starting code for a document
  • Inverted Index
      • Show a simple inverted index class
  • Stemming
  • N-gram
      • Show a tokenizer/n-gram processor
  • Fields
      • Show a document handler which ties it all together
  • Searching
      • Show a simple searcher (& the whole thing working together)
  • Faceting (likely no demo)
  • Boost (likely no demo)
  • More Like This
  • Wrap up
      • Point to the GitHub repo for the sample code
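
To make the outline concrete, here is a minimal sketch of how documents, a tokenizer, edge n-grams, an inverted index, and a searcher might fit together. It is not the talk's sample code: every name in it (`tokenize`, `edge_ngrams`, `InvertedIndex`, `STOP_WORDS`, `MIN_GRAM`, `MAX_GRAM`) is a hypothetical chosen for illustration. It assumes documents are plain dicts of field name to text, skips stemming, faceting, and boost, and ranks results by a simple count of matched query terms.

```python
import re
from collections import defaultdict

# All names below are illustrative assumptions, not the talk's library.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}
MIN_GRAM = 3  # shortest prefix stored in the index
MAX_GRAM = 6  # longest prefix stored in the index


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def edge_ngrams(token):
    """Return the prefixes of a token from MIN_GRAM to MAX_GRAM characters.

    Tokens shorter than MIN_GRAM produce no n-grams and are not indexed.
    """
    upper = min(MAX_GRAM, len(token))
    return [token[:n] for n in range(MIN_GRAM, upper + 1)]


class InvertedIndex:
    """Map each term to the IDs of the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc_id, ...}
        self.documents = {}               # doc_id -> original field dict

    def add(self, doc_id, document):
        """Index every field of a document (a dict of field name -> text)."""
        self.documents[doc_id] = document
        for text in document.values():
            for token in tokenize(text):
                for gram in edge_ngrams(token):
                    self.postings[gram].add(doc_id)

    def search(self, query):
        """Return doc IDs ordered by number of matched query terms."""
        scores = defaultdict(int)
        for token in tokenize(query):
            # Truncate to MAX_GRAM so long query words can still match
            # the longest prefix that was stored at index time.
            term = token[:MAX_GRAM]
            for doc_id in self.postings.get(term, ()):
                scores[doc_id] += 1
        # Highest score first; ties broken by doc ID for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("doc-1", {"title": "Building a search engine",
                        "body": "Inverted indexes map terms to documents."})
    index.add("doc-2", {"title": "Python tips",
                        "body": "Searching lists with bisect."})
    print(index.search("python search"))
```

Because the index stores prefixes rather than whole words, a partial query such as "sea" matches both "search" and "searching". The trade-off is a larger index, and a real engine would typically add stemming and a relevance score that accounts for term frequency rather than a raw match count.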