Abstract

Whole Genome Sequencing (WGS) provides information for each of the roughly 3.2 billion base pairs of the human genome. WGS therefore plays an important role in identifying genetic variation in populations and in understanding disease signatures in cohort studies or in cases of rare genetic disorders. Nonetheless, discoveries from high-throughput WGS depend on efficiently processing, analyzing, and storing an enormous amount of genomic sequencing data, often on the scale of petabytes. Although genome sequencing costs have fallen significantly in recent years, high-performance computation costs have not decreased proportionally. The objective of the present work is to develop a Docker-based container method for processing and analyzing human whole genome sequencing data, detecting genetic variations from paired-end WGS short reads. Our method processes multiple genomes simultaneously within a single compute system while handling the memory requirements of the genomic data processing in a sustained and stable manner, so that parallel jobs already running are not terminated unexpectedly. This method also achieves a 40% reduction in execution time. To encourage widespread adoption and ease of WGS analysis, our containerized pipeline will be made publicly available. We have tested this approach on human genome data from Illumina WGS platforms and report benchmark metrics for two different workstation environments in this communication. Compared to truth sets, our approach calls variants with 99% precision and recall.
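To make the precision and recall figures concrete, the following is a minimal sketch of how the two metrics are defined for a call set compared against a truth set. It is an illustration only, not the benchmarking procedure used in this work: the function name `precision_recall`, the `(chrom, pos, ref, alt)` tuple representation, and the toy variants are all hypothetical, and exact tuple matching is a simplifying assumption (dedicated benchmarking tools also normalize variant representation before comparing).

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    recall: float


def precision_recall(called: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare called variants to a truth set by exact tuple match (an assumption)."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(precision, recall)


# Toy data, invented for illustration only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "A", "T"),
}

metrics = precision_recall(called, truth)
print(f"precision={metrics.precision:.2f} recall={metrics.recall:.2f}")
```

Precision is the fraction of called variants that appear in the truth set, and recall is the fraction of truth-set variants that were called.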
