PyCon 2019 in Cleveland, Ohio

Sunday 10 a.m.–1 p.m. in Expo Hall

Exploring Scientific Databases with Python

Andrey Smelter


Some existing scientific databases distribute scholarly data in specialized file formats. There are various reasons for this: for example, the database and its file format may have been developed before modern open data serialization formats and languages existed, or they may reflect poor design practices (the "not invented here" principle). As a result, end users cannot easily access the scientific information or make full use of the valuable data, which hinders data reuse for downstream analysis and knowledge integration by the scientific community.

The poster will discuss:

- The issues of scientific data reusability and reproducibility.
- Examples of scientific databases that use specialized file formats for data distribution.
- Examples of open source Python libraries designed to work with databases that use specialized file formats to distribute scientific data.
- Examples of how this data can be converted (serialized) and potentially validated using modern open data serialization formats and Python libraries designed for schema validation.
- Examples of using Jupyter, pandas, and matplotlib for data exploration, data quality assurance, and data visualization.
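
As a rough illustration of the conversion and validation step, the sketch below parses a record from a made-up "TAG: value" text format, checks it against a small hand-written schema, and serializes it to JSON. The record layout, the field names, and the helpers `parse_record` and `validate` are all hypothetical and are not taken from any particular database or from the poster itself. It uses only the standard library; a dedicated schema validation library could take the place of the hand-written check.

```python
import json

# Hypothetical record in a made-up "TAG: value" format (an assumption for
# illustration; real database formats are usually far more involved).
RAW_RECORD = """\
ID: X0001
NAME: glucose
MASS: 180.156
"""

# Hypothetical schema: required field name -> converter to the expected type.
SCHEMA = {"id": str, "name": str, "mass": float}


def parse_record(text):
    """Turn 'TAG: value' lines into a dict with lower-cased keys."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(":")
        record[key.strip().lower()] = value.strip()
    return record


def validate(record, schema):
    """Check required fields and convert values to the expected types."""
    validated = {}
    for field, convert in schema.items():
        if field not in record:
            raise ValueError(f"missing required field: {field}")
        # float() raises ValueError itself if the text is not a number
        validated[field] = convert(record[field])
    return validated


record = validate(parse_record(RAW_RECORD), SCHEMA)
json_text = json.dumps(record, indent=2, sort_keys=True)
print(json_text)
```

Once the data is in an open format such as JSON, general-purpose tools can read it without a format-specific parser, which is the reusability point the abstract makes.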