Project Gado: Building an Open Archival Scanning Robot Using Python and Arduino

Type: Talk
Audience level: Intermediate
Category: Other
March 10th 11:45 a.m. – 12:30 p.m.

Description

Project Gado aims to create an open-source archival scanning robot that small archives can purchase for $500 and use to scan their photographic collections autonomously. This talk presents the Gado 2, a prototype scanning robot built around Python and Arduino, and shares lessons learned from using Python as the primary language in a large-scale archival scanning project.

Abstract

The archives of the Afro American Newspaper in Baltimore, MD contain over 1.5 million historical photos spanning 115 years of the city’s African American history. One of the largest Black history collections in the world, the Afro’s archives include thousands of photos which have never been seen by the public.

Why? Of the paper’s 1.5 million photos, only around 10,000 exist in digital form; the Afro, like many small archives, simply does not have the staff to digitize its collections by hand. As a result, photos with incredible value for scholars, educators and community members alike are available only to the select few with the access, specialized skills, and time to travel to the physical archive and locate them.

Project Gado was founded in 2010 to address these challenges. The project seeks to create an open-source archival scanning robot that small organizations like the Afro can use to autonomously digitize their photographic holdings. The Gado 1, a proof-of-concept machine built using Python and Arduino, has successfully scanned over 1,000 photos to date.

Gado 1

At present, Project Gado is developing the Gado 2 (pictured below as an early prototype), a second-generation machine which will cut scanning time by a factor of four, occupy a footprint half the size of the Gado 1’s, and require no specialized skills to assemble and operate. The project is also developing a photographic licensing site (launching May 2012) which will allow archival partners to generate a lasting revenue stream from their digital collections, creating an incentive for more small archives to adopt the Gado technology.

Early prototype of Gado 2

This talk will provide an overview of Project Gado and the Gado 2, and will address specific challenges faced and lessons learned from using Python as the primary language for an open robotics project and a major archival digitization initiative.

Technical topics covered will include Python and Arduino interfacing for machine control, Python/TWAIN integration, use of PIL and OpenCV for post-processing, and MySQL integration for image management and metadata annotation. These topics will be presented primarily in the context of a case study, rather than a tutorial; the main goal will be to show how Project Gado used these Python technologies to solve problems, and to demonstrate how the technologies could be used to solve similar problems in other cases.
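To make the first of these topics concrete, here is a minimal sketch of how a Python host might drive a microcontroller over a line-based serial protocol. It is not taken from the Gado codebase: the command names (MOVE_ARM, GRIP), the "OK" acknowledgement, and the FakePort class are all hypothetical, invented so the example runs without hardware.

```python
class FakePort:
    """Hypothetical stand-in for a serial port so the sketch runs without hardware."""

    def __init__(self):
        self.sent = []  # every raw command written to the "port"

    def write(self, data):
        self.sent.append(data)
        return len(data)

    def readline(self):
        # Pretend the microcontroller acknowledges every command.
        return b"OK\n"


def send_command(port, name, *args):
    """Send one newline-terminated ASCII command and return the reply text."""
    line = " ".join([name] + [str(a) for a in args]) + "\n"
    port.write(line.encode("ascii"))
    return port.readline().decode("ascii").strip()


def scan_one_photo(port):
    """Run a hypothetical pick-and-place cycle.

    Returns True if every step is acknowledged; stops at the first step
    that is not, so later commands are never sent after a failure.
    """
    steps = [("MOVE_ARM", 90), ("GRIP", 1), ("MOVE_ARM", 0), ("GRIP", 0)]
    return all(send_command(port, name, arg) == "OK" for name, arg in steps)


port = FakePort()
print(scan_one_photo(port))
print(port.sent[0])
```

With real hardware, any object that offers write() and readline() could be passed in place of FakePort; pyserial's serial.Serial is one such object, with the port name and baud rate depending on the setup. Keeping the protocol as plain text lines is one possible design choice that makes the machine easy to test by hand from a serial console.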

The talk will conclude with a discussion of opportunities for interested developers to contribute to the Gado codebase, and for interested institutions to implement the Gado 2 in their own archives.

Outline:

  1. Brief overview of Project Gado
  2. The Hardware
    Overview of the Gado 2
    Machine demo
  3. The Gado Codebase: design strategies, problems faced, modules used
    Interfacing with Arduino
    Scanning with TWAIN
    Capturing metadata and performing OCR with Tesseract
    OpenCV and PIL for automatic post-processing
    Collection management with MySQL
  4. Challenges, Pythonic solutions, next steps
  5. Get involved!
    Opportunities for developers
    Options for partner archives/organizations
    Pieces of the codebase with relevance to other problems/projects and how to steal them