
Next-Generation Immunobiology Data Integration, Analysis and Visualization

Alan Barber II

Audience level: Intermediate
Category: Industry Uses

Description

Nodality has pioneered a novel flow cytometry-based technology in the areas of oncology and autoimmunity to reveal underlying disease biology. We present a custom Python framework built on Django, Matplotlib, MongoDB and pandas that joins the resulting experimental data with clinical facts, such as individual patient disease outcomes, to produce actionable biological and clinical information.

Abstract

Nodality is a biotechnology startup applying a novel flow cytometry-based technology, Single Cell Network Profiling (SCNP), in the areas of oncology and autoimmunity to reveal underlying disease biology. SCNP is used to characterize individual patients with the aim of selecting optimal, individualized treatment strategies. For every patient sample we record how each cell, across multiple cell types, responds to a variety of molecular stimuli; these responses are captured on multiple readouts representing key biological signaling pathways. We present a custom Python framework built on Django, Matplotlib, MongoDB and pandas that joins our experimental data with clinical facts, such as individual patient disease outcomes, to produce actionable biological and clinical information along with the corresponding visualizations.
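To make the data shape concrete, here is a minimal pandas sketch, assuming each measurement has already been summarized to one value per (sample, cell type, stimulus, readout) combination. The column names, cell types, stimuli, readouts and values are hypothetical placeholders, not Nodality's actual schema or data. The long table is pivoted into one row per patient sample, the layout most modeling tools expect.

```python
import pandas as pd

# Hypothetical long-format ("tidy") measurements: one row per
# (sample, cell type, stimulus, readout) combination. Each value stands in
# for a per-population summary; the numbers are made up for illustration.
records = [
    ("P01", "CD4 T", "IL-2", "p-STAT5", 1.8),
    ("P01", "CD4 T", "unstim", "p-STAT5", 0.2),
    ("P01", "B", "IL-2", "p-STAT5", 0.4),
    ("P02", "CD4 T", "IL-2", "p-STAT5", 0.9),
    ("P02", "CD4 T", "unstim", "p-STAT5", 0.3),
    ("P02", "B", "IL-2", "p-STAT5", 0.5),
]
long_df = pd.DataFrame(
    records, columns=["sample", "cell_type", "stimulus", "readout", "value"]
)

# Pivot to one row per sample and one column per
# (cell_type, stimulus, readout) combination.
wide_df = long_df.pivot_table(
    index="sample",
    columns=["cell_type", "stimulus", "readout"],
    values="value",
)
print(wide_df)
```

`pivot_table` averages duplicate combinations (its default `aggfunc` is the mean) and drops columns whose entries are all missing; a plain `pivot` raises an error on duplicates instead, which may be preferable when duplicates indicate a data problem.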

The data frame concept, popularized by the S-PLUS and R statistical software packages, is used ubiquitously across scientific fields to support visualization, machine learning and statistical analysis, because it is a flexible tool for integrating disparate data. Recently, the Python package pandas has implemented a feature-rich, performant data frame object. Using the pandas DataFrame, we have developed a Python software framework that serves as a foundation for joining, preparing and pivoting experimental and clinical data into a single, focused Python object. Within this framework, pymongo and MongoDB manage data frame persistence. The result is a simple, scalable solution that is easy to expose to web applications and equally usable for offline analysis. Finally, a custom Django web application provides internal access to the data along with static (Matplotlib) and dynamic (d3.js) visualizations. Using this framework, we can rapidly integrate and analyze high-dimensional, complex biological data.
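A minimal sketch of the join-and-persist steps, assuming a per-sample feature matrix and a clinical table share a `sample` key. The table contents, the `save_frame`/`load_frame` helpers and the one-document-per-frame layout are hypothetical illustrations, not the framework described here. `replace_one` and `find_one` are real pymongo `Collection` methods; so the sketch runs without a server, a small in-memory stand-in implements just those two calls.

```python
import pandas as pd

# Hypothetical per-sample feature matrix (e.g. the output of a pivot step).
features = pd.DataFrame(
    {"node_a": [1.8, 0.9, 1.2], "node_b": [0.4, 0.5, 0.7]},
    index=pd.Index(["P01", "P02", "P03"], name="sample"),
)

# Hypothetical clinical facts keyed by the same sample identifier.
clinical = pd.DataFrame(
    {
        "sample": ["P01", "P02", "P04"],
        "outcome": ["responder", "non-responder", "responder"],
    }
)

# Inner join: keep only samples with both assay data and clinical facts
# (P03 and P04 are dropped); the order of the left keys is preserved.
joined = features.reset_index().merge(clinical, on="sample", how="inner")


def save_frame(collection, name, df):
    """Store a DataFrame as a single document (hypothetical layout)."""
    doc = {
        "name": name,
        "columns": list(df.columns),
        # Depending on the pandas version, numpy scalar types may need
        # converting to native Python types before BSON encoding.
        "records": df.to_dict(orient="records"),
    }
    collection.replace_one({"name": name}, doc, upsert=True)


def load_frame(collection, name):
    """Rebuild a DataFrame previously stored by save_frame."""
    doc = collection.find_one({"name": name})
    if doc is None:
        raise KeyError(name)
    return pd.DataFrame(doc["records"], columns=doc["columns"])


class InMemoryCollection:
    """Stand-in for a pymongo Collection; supports only {"name": ...} queries."""

    def __init__(self):
        self._docs = {}

    def replace_one(self, query, replacement, upsert=False):
        key = query["name"]
        if upsert or key in self._docs:
            self._docs[key] = dict(replacement)

    def find_one(self, query):
        return self._docs.get(query["name"])


# With a real server this could be MongoClient()["scnp"]["frames"]
# (the database and collection names are hypothetical).
store = InMemoryCollection()
save_frame(store, "study_x_joined", joined)
restored = load_frame(store, "study_x_joined")
print(restored)
```

Keeping a whole frame in one document makes loading a single query, but MongoDB caps documents at 16 MB, so larger frames would need to be split across documents (for example, one per row) or stored with GridFS. Using `how="left"` with `indicator=True` in the merge would keep unmatched samples and flag them instead of silently dropping them.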

This solution is a marked improvement over previous workflows, which relied on flat files and a variety of software packages and languages to analyze and visualize data. Standardizing on a single language, Python, and a common pipeline improves efficiency and simplifies data curation and persistence. We will show examples of data integration, modeling and visualization workflows.