Abstract

As genome sequencing becomes cheaper and more accurate, it is becoming increasingly viable to merge this data with electronic health information to inform clinical decisions. In this work, we demonstrate a full pipeline for working with both PacBio sequencing data and clinical FHIR® data, from initial data to tertiary analysis. The electronic health records are stored in FHIR® (Fast Healthcare Interoperability Resource) format, the current leading standard for healthcare data exchange. For the genomic data, we perform variant calling on long-read PacBio HiFi data using Cromwell on Azure. Both data formats are parsed, processed and merged in a single scalable pipeline which securely performs tertiary analyses using cloud-based Jupyter notebooks. We include three example applications: exporting patient information to a database, clustering patients and performing a simple pharmacogenomic study. https://github.com/microsoft/genomicsnotebook/tree/main/fhirgenomics. Supplementary data are available at Bioinformatics Advances online.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call