Talks

From Messy Clinical Data to Interoperable FHIR: A Python-First Mapping and Validation Pipeline

Friday, May 15th, 2026 2:45 p.m.–3:15 p.m. in Room 103ABC

Presented by

Lisa Smith

Description

Healthcare data is rarely clean. In practice, it arrives as spreadsheets with inconsistent column names, free-text clinical descriptions, partial fields, and little agreement on structure. At the same time, modern analytics and data sharing increasingly require FHIR-compliant resources.

This talk presents a Python-first approach to bridging that gap. I’ll describe a production Flask application that allows users to upload CSV or Excel files, map columns to FHIR resources (Patient, Observation, Condition, Procedure, Medication, etc.), resolve terminology using LOINC, SNOMED CT, and RxNorm, and generate valid FHIR bundles without requiring users to understand FHIR itself. The focus is on architecture and design decisions rather than standards theory. We’ll look at how pandas is used for normalization and validation, how fuzzy matching (via RapidFuzz) is applied safely to clinical text, and how FHIR resources are constructed dynamically so that only available data is included. Special attention is paid to handling optional fields, extensions, and partial compliance without corrupting downstream data.
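As a rough illustration of two ideas from the paragraph above (thresholded fuzzy matching and building resources from only the data that is present), here is a minimal sketch. It is not the application's code. The talk uses RapidFuzz; this sketch swaps in the standard library's `difflib.SequenceMatcher` so it runs without dependencies. The column names (`test_name`, `value`, `unit`, `patient_id`), the three-entry `LOINC_TERMS` table, the 0.85 threshold, and the function names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical mini-terminology (display text -> LOINC code).
# A real system would query a full terminology source instead.
LOINC_TERMS = {
    "Heart rate": "8867-4",
    "Body temperature": "8310-5",
    "Body weight": "29463-7",
}


def match_term(text, terms=LOINC_TERMS, threshold=0.85):
    """Return (display, code) for the best match, or None if below threshold."""
    cleaned = " ".join(text.lower().split())  # lowercase, collapse whitespace
    best_display, best_score = None, 0.0
    for display in terms:
        score = SequenceMatcher(None, cleaned, display.lower()).ratio()
        if score > best_score:
            best_display, best_score = display, score
    if best_display is None or best_score < threshold:
        return None  # leave the term for the user to resolve rather than guess
    return best_display, terms[best_display]


def is_present(value):
    """Treat None, NaN, and blank strings as missing."""
    if value is None:
        return False
    if isinstance(value, float) and value != value:  # NaN is not equal to itself
        return False
    return str(value).strip() != ""


def build_observation(row):
    """Build a FHIR-style Observation dict, adding only fields that have data."""
    resource = {"resourceType": "Observation", "status": "final"}
    name = row.get("test_name")
    match = match_term(str(name)) if is_present(name) else None
    if match:
        display, code = match
        resource["code"] = {
            "coding": [{"system": "http://loinc.org", "code": code, "display": display}]
        }
    elif is_present(name):
        # No confident match: keep the original text instead of a guessed code.
        resource["code"] = {"text": str(name).strip()}
    if is_present(row.get("value")):
        quantity = {"value": float(row["value"])}  # assumes a numeric value
        if is_present(row.get("unit")):
            quantity["unit"] = str(row["unit"]).strip()
        resource["valueQuantity"] = quantity
    if is_present(row.get("patient_id")):
        resource["subject"] = {"reference": f"Patient/{row['patient_id']}"}
    return resource
```

Returning `None` below the threshold, rather than the nearest candidate, is one way to keep fuzzy matching from silently assigning a wrong clinical code; the unmatched text is preserved in `code.text` so nothing is lost.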

The system is intentionally database-agnostic and stateless: uploaded files are validated and processed in memory, with no persistent file storage, which provides fast, interactive feedback. This keeps the deployment lightweight while reducing security and compliance risk; both constraints significantly shaped the final design. Although the example domain is healthcare, the techniques discussed (schema mediation, fuzzy matching, user-guided validation, and resilient ingestion pipelines) are broadly applicable to any Python system that needs to turn messy, user-supplied data into structured, standardized output.
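The in-memory approach can be sketched roughly as follows. This is an assumption-laden illustration, not the application's code: it uses the standard library's `csv` module in place of pandas so that it runs without dependencies, and the function names, the normalization rule, and the `patient_id` required column are hypothetical.

```python
import csv
import io
import re


def normalize_column(name):
    """Lowercase, trim, and collapse runs of non-alphanumerics to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def parse_upload(raw_bytes, required=("patient_id",)):
    """Parse uploaded CSV bytes entirely in memory; return (rows, errors)."""
    text = raw_bytes.decode("utf-8-sig")  # tolerate a BOM from spreadsheet exports
    reader = csv.DictReader(io.StringIO(text))
    columns = [normalize_column(c) for c in (reader.fieldnames or [])]
    # Report problems to the user instead of raising, so feedback stays interactive.
    errors = [f"missing required column: {c}" for c in required if c not in columns]
    rows = []
    for raw in reader:
        rows.append(
            {
                normalize_column(key): (value or "").strip()
                for key, value in raw.items()
                if key is not None  # skip overflow cells beyond the header
            }
        )
    return rows, errors
```

Nothing here touches the filesystem: the bytes of the upload are decoded, parsed, and discarded when the request ends, which is the property the paragraph above describes.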

No prior FHIR knowledge is required.
