Maintaining over 15 years of publication data for the NASA Astrobiology Institute using Python

Shige Abe

Audience level:
Best Practices & Patterns


The NASA Astrobiology Institute was established in 1998. It collects a list of publications as part of its annual reporting process. To date, there are at least 10,000 publication citations. Each of these needs to be validated, especially when used for bibliometric analysis. This poster will show how we finally solved this problem in 2014 using Python and free web services.


The NASA Astrobiology Institute has been in existence since 1998. It collects a list of publications from its member teams as part of the annual reporting process. Various attempts have been made over the years to maintain and standardize the metadata for the publications in the database. Most attempts required hiring part-time librarians to manually review and clean up records, and in many years the data was not reviewed or cleaned at all. As the database grew to around 10,000 records, we knew that the majority of them had never been examined. We also knew that the database would be of limited use for any bibliometric analysis until the records were cleaned and verified. This poster will describe the challenges, the scope of the problem, and the Python-based workflow developed during the summer of 2014 to continuously update and verify the publication citations in our database.
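The abstract does not say how individual citations are verified. One possible verification step is sketched below. It assumes that each stored record and a candidate record returned by some lookup service (for example, a free bibliographic search API) are plain dictionaries with hypothetical `title`, `year`, and `doi` keys. A fuzzy title comparison using the standard library's `difflib` then decides whether the records describe the same paper. The function names, the status strings, and the 0.9 similarity threshold are illustrative assumptions, not the workflow presented in the poster.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase a title, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def titles_match(stored: str, candidate: str, threshold: float = 0.9) -> bool:
    """Return True if two titles are similar enough to be treated as the same paper."""
    ratio = SequenceMatcher(None, normalize_title(stored), normalize_title(candidate)).ratio()
    return ratio >= threshold


def verify_citation(record: dict, candidate: dict, threshold: float = 0.9):
    """Compare a stored record with a candidate from a (hypothetical) lookup service.

    Returns a (status, record) pair. On a match, empty fields in the stored
    record (e.g. a missing DOI) are filled in from the candidate; otherwise
    the stored record is returned unchanged for manual review.
    """
    if not titles_match(record.get("title", ""), candidate.get("title", ""), threshold):
        return "unverified", dict(record)
    # A matching title with a different publication year still needs a human look.
    if record.get("year") and candidate.get("year") and record["year"] != candidate["year"]:
        return "year_mismatch", dict(record)
    merged = dict(record)
    for key, value in candidate.items():
        if not merged.get(key):
            merged[key] = value
    return "verified", merged


# Usage with made-up data (the title and DOI are placeholders, not real records).
stored = {"title": "An example astrobiology paper", "year": 2005, "doi": ""}
found = {"title": "An Example Astrobiology Paper.", "year": 2005, "doi": "10.0000/example"}
status, cleaned = verify_citation(stored, found)
print(status, cleaned["doi"])
```

Normalizing case and punctuation before comparing lets minor formatting differences pass, while any record that fails the threshold or disagrees on year is routed to a person rather than silently overwritten.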
