Abstract

Background
Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results.

Results
To address these challenges, we have developed the Mercury analysis pipeline and deployed it on local hardware and in the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts.

Conclusions
By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.

Highlights

  • Parallel DNA sequencing generates staggering amounts of data

  • While data-management and information technologies have adapted to the processing and storage requirements of emerging sequencing technologies, it is less certain that appropriate informative software interfaces will be made available to the genomics and clinical genetics communities

  • The first personal next generation sequencing (NGS) genome was published in 2007 [4], and today we estimate that the number of available exomes and genomes approaches one hundred thousand


Summary

Introduction

Parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. Large numbers of laboratories and hospitals routinely generate terabytes of NGS data, shifting the bottleneck in clinical genetics from DNA sequence production to DNA sequence analysis. Such analysis is most prevalent in three common settings: first, in a clinical diagnostics laboratory (e.g., Baylor's Whole Genome Laboratory, www.bcm.edu/geneticlabs/) testing single patients or families with presumed heritable disease; second, in a cancer-analysis setting where the unit […]. As these new samples are sequenced, the resulting data are most effectively examined in the context of petabases of existing DNA sequence and the associated metadata. One element bridging the technology gap between the sequencing instrument and the scientist or clinician is a validated data processing pipeline that takes raw sequencing reads and produces an annotated personal genome ready for further analysis and clinical interpretation.
