A cloud-based pipeline for analysis of FHIR and long-read data.

Tim Dunn, Erdal Cosgun, Cecilia Arighi

doi:10.1093/bioadv/vbac095

Open Access

Journal: Bioinformatics Advances
Publication Date: Jan 5, 2023
Citations: 2
License type: CC BY 4.0

Affiliation: University of Michigan–Ann Arbor

Abstract

As genome sequencing becomes cheaper and more accurate, it is increasingly viable to merge these data with electronic health information to inform clinical decisions. In this work, we demonstrate a full pipeline for working with both PacBio sequencing data and clinical FHIR® data, from initial data to tertiary analysis. The electronic health records are stored in FHIR® (Fast Healthcare Interoperability Resources) format, the current leading standard for healthcare data exchange. For the genomic data, we perform variant calling on long-read PacBio HiFi data using Cromwell on Azure. Both data formats are parsed, processed and merged in a single scalable pipeline that securely performs tertiary analyses using cloud-based Jupyter notebooks. We include three example applications: exporting patient information to a database, clustering patients and performing a simple pharmacogenomic study. The code is available at https://github.com/microsoft/genomicsnotebook/tree/main/fhirgenomics. Supplementary data are available at Bioinformatics Advances online.
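The abstract describes parsing both data formats, merging them, and exporting patient information to a database. The following is a minimal sketch of that idea under stated assumptions, not the authors' pipeline (their notebooks are in the linked repository). It reads Patient resources from a FHIR Bundle in JSON, reads genotypes from an uncompressed multi-sample VCF whose sample columns are assumed to be named by FHIR Patient id, joins the two, and writes the result to an in-memory SQLite table. The toy bundle, the toy VCF record, the table name `patient_variant` and every helper function are hypothetical, and only the Python standard library is used.

```python
from __future__ import annotations

import json
import sqlite3

# Hypothetical toy FHIR Bundle with two Patient resources (made-up data).
BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "type": "collection",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1", "gender": "female", "birthDate": "1970-01-01"}},
    {"resource": {"resourceType": "Patient", "id": "p2", "gender": "male", "birthDate": "1985-06-15"}}
  ]
}
"""

# Hypothetical toy VCF; ASSUMPTION: sample column names equal FHIR Patient ids.
VCF_TEXT = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tp1\tp2
chr1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1
"""


def parse_patients(bundle: dict) -> dict[str, dict]:
    """Map Patient id -> selected demographics for every Patient in the Bundle."""
    patients = {}
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") == "Patient":
            patients[res["id"]] = {"gender": res.get("gender"),
                                   "birthDate": res.get("birthDate")}
    return patients


def parse_vcf_genotypes(vcf_text: str) -> dict[str, dict[str, str]]:
    """Map sample -> {variant key: GT string} from an uncompressed VCF string."""
    samples: list[str] = []
    calls: dict[str, dict[str, str]] = {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information lines and blanks
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            calls = {s: {} for s in samples}
            continue
        chrom, pos, _vid, ref, alt = fields[:5]
        gt_idx = fields[8].split(":").index("GT")  # locate GT in FORMAT
        key = f"{chrom}:{pos}:{ref}>{alt}"
        for sample, value in zip(samples, fields[9:]):
            calls[sample][key] = value.split(":")[gt_idx]
    return calls


def merge(patients: dict[str, dict], calls: dict[str, dict[str, str]]) -> list[tuple]:
    """Join demographics and genotypes on the shared patient id."""
    rows = []
    for pid, info in patients.items():
        for variant, gt in calls.get(pid, {}).items():
            rows.append((pid, info["gender"], info["birthDate"], variant, gt))
    return rows


def export_sqlite(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Write merged rows to a hypothetical patient_variant table."""
    conn.execute("CREATE TABLE IF NOT EXISTS patient_variant ("
                 "patient_id TEXT, gender TEXT, birth_date TEXT, "
                 "variant TEXT, genotype TEXT)")
    conn.executemany("INSERT INTO patient_variant VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    rows = merge(parse_patients(json.loads(BUNDLE_JSON)),
                 parse_vcf_genotypes(VCF_TEXT))
    conn = sqlite3.connect(":memory:")
    export_sqlite(rows, conn)
    for row in conn.execute("SELECT * FROM patient_variant ORDER BY patient_id"):
        print(row)
```

Keying the join on the patient id keeps the clinical and genomic sources independent until the merge step, so either side can be swapped out. A production pipeline would also need to handle compressed VCFs, multi-allelic sites and paged FHIR server responses, none of which this sketch attempts.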