Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data.

Na Hong, Feichen Shen, Sunghwan Sohn, Guoqian Jiang, Andrew Wen, Chen Wang, Hongfang Liu

doi:10.1093/jamiaopen/ooz056

Open Access

Journal: JAMIA Open	Publication Date: Oct 18, 2019
Citations: 38	License type: CC BY-NC 4.0

Affiliation: Mayo Clinic

Abstract

Objective

To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data using the HL7 Fast Healthcare Interoperability Resources (FHIR) specification.

Methods

We established a FHIR-based clinical data normalization pipeline, NLP2FHIR, that comprises three main modules: (1) a core natural language processing (NLP) engine with a FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated its FHIR modeling capability on core clinical resources, namely Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory, using Mayo Clinic's unstructured EHR data. We constructed a gold standard by reusing annotation corpora from previous NLP projects.

Results

A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. For each clinical resource, the elements that require integration of structured data were identified. Unstructured data modeling achieved F scores ranging from 0.69 to 0.99 across FHIR element representations (0.69–0.99 for Condition; 0.75–0.84 for Procedure; 0.71–0.99 for MedicationStatement; and 0.75–0.95 for FamilyMemberHistory).

Conclusion

We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and for integrating structured elements into the model. This work provides standards-based clinical data normalization tools that are indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future development of the FHIR specification with regard to handling unstructured clinical data.
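To make the three-module design more concrete, the following is a minimal Python sketch of how a single NLP disorder mention might be turned into a FHIR R4 Condition resource. In the sketch, a lookup table stands in for content normalization, the patient reference and date come from structured EHR data, and an extension carries NLP-specific provenance. It also computes an F score against gold-standard element values. Everything named here is hypothetical and is not NLP2FHIR's actual type system, rules, or API: the `Mention` class, the `NORMALIZE` table, the extension URL, the choice to record negation as `verificationStatus` "refuted", and the helper functions. The F score is the standard harmonic mean of precision and recall. This abstract does not say how predictions were matched to the gold standard.

```python
from dataclasses import dataclass

# Hypothetical extension URL: NLP2FHIR defines its own NLP-specific
# extensions, but their real URLs are not given in this abstract.
NLP_SPAN_EXT = "http://example.org/fhir/StructureDefinition/nlp-text-span"

# Hypothetical normalization rules: surface form -> SNOMED CT concept.
NORMALIZE = {
    "htn": ("38341003", "Hypertensive disorder"),
    "hypertension": ("38341003", "Hypertensive disorder"),
}

VER_STATUS = "http://terminology.hl7.org/CodeSystem/condition-ver-status"


@dataclass
class Mention:
    """A disorder mention as an NLP engine might emit it (assumed shape)."""
    text: str
    begin: int
    end: int
    negated: bool


def to_condition(m: Mention, patient_id: str, note_date: str) -> dict:
    """Map one NLP mention to an R4 Condition, merging structured fields."""
    # Content normalization: look up a standard code for the surface form.
    code, display = NORMALIZE[m.text.lower()]
    return {
        "resourceType": "Condition",
        "code": {
            "coding": [{"system": "http://snomed.info/sct",
                        "code": code, "display": display}],
            "text": m.text,
        },
        # Modeling assumption: negated mentions are recorded as refuted.
        "verificationStatus": {"coding": [{
            "system": VER_STATUS,
            "code": "refuted" if m.negated else "confirmed",
        }]},
        # Structured data integration: the patient and date come from the
        # EHR record rather than from the note text.
        "subject": {"reference": f"Patient/{patient_id}"},
        "recordedDate": note_date,
        # NLP-specific provenance (character span) via a hypothetical extension.
        "extension": [{"url": NLP_SPAN_EXT,
                       "valueString": f"{m.begin}-{m.end}"}],
    }


def f_score(predicted: set, gold: set) -> float:
    """F1 over element values: harmonic mean of precision and recall."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


cond = to_condition(Mention("HTN", 120, 123, negated=False), "example", "2019-10-18")
print(cond["code"]["coding"][0]["display"])  # Hypertensive disorder
```

Keeping normalization in a separate table from the mapping function mirrors the abstract's split between mapping rules and normalization rules. Either side could then be extended independently.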