Abstract

We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep 4 includes multiple new features allowing us to process all of the preparation steps defined by the GATK Best Practice pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep’s parallel execution framework to vastly improve the runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data and up to 7.4x faster on WGS data compared to running the same pipeline with GATK 4, while utilizing fewer compute resources.

Highlights

  • With elPrep 4 it is possible to execute all preparation steps recommended by the GATK Best Practices [3] for variant calling, but it can also be used for implementing other types of pipelines

  • Examples of filters include operations to remove unmapped reads or to remove reads based on genomic regions, but we have shown that more complex operations such as duplicate marking can also be expressed as filters [1]

  • elPrep 4 is a reimplementation of the elPrep framework [1] for processing sequence alignment map files (SAM/BAM) in the Go programming language. It introduces new and improved functionality for sorting, optical duplicate marking, base quality score recalibration, MultiQC-compatible metrics, and various filtering options. This allows elPrep to process most of the preparation pipelines defined by the GATK Best Practices [3], and other types of pipelines [7]
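
The filter idea mentioned in the highlights can be illustrated with a minimal Go sketch. Everything here is an assumption made for illustration: the `Alignment` struct, the `Filter` type, and the `RemoveUnmapped`, `KeepRegion`, and `Apply` functions are hypothetical names and are not elPrep's actual API. The only fact taken from the SAM format is that FLAG bit 0x4 marks an unmapped segment. The region filter is deliberately simplified to test only the read's start position.

```go
package main

// Alignment is a hypothetical, heavily reduced stand-in for a SAM record.
// It is not elPrep's actual data type.
type Alignment struct {
	QName string // read name
	Flag  uint16 // SAM bitwise FLAG field
	RName string // reference sequence name
	Pos   int    // 1-based leftmost mapping position
}

// flagUnmapped is SAM FLAG bit 0x4 ("segment unmapped").
const flagUnmapped = 0x4

// Filter reports whether an alignment should be kept.
type Filter func(*Alignment) bool

// RemoveUnmapped keeps only reads whose unmapped bit is clear.
func RemoveUnmapped(a *Alignment) bool {
	return a.Flag&flagUnmapped == 0
}

// KeepRegion returns a filter that keeps reads starting inside
// [start, end] on chrom. A real region filter would test overlap
// of the whole alignment; this simplification is an assumption.
func KeepRegion(chrom string, start, end int) Filter {
	return func(a *Alignment) bool {
		return a.RName == chrom && a.Pos >= start && a.Pos <= end
	}
}

// Apply returns the reads that pass every filter, in input order.
func Apply(reads []*Alignment, filters ...Filter) []*Alignment {
	var kept []*Alignment
	for _, r := range reads {
		keep := true
		for _, f := range filters {
			if !f(r) {
				keep = false
				break
			}
		}
		if keep {
			kept = append(kept, r)
		}
	}
	return kept
}
```

Under these assumptions, a call such as `Apply(reads, RemoveUnmapped, KeepRegion("chr1", 100, 200))` chains both filters in a single pass over the reads. Because a filter receives a pointer, it could also modify a record (for example, set a flag bit) and return true, which is one way an operation like duplicate marking might be phrased as a filter.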


Summary

Objectives

When we reimplement a tool from GATK 4, Picard, or SAMtools, our goal is to design a new algorithm that takes advantage of elPrep’s parallel architecture without changing the semantics of the original algorithm.
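
One generic way to parallelize a per-record computation without changing its result is to let each worker write into the output slot that matches its input index. The sketch below is not elPrep's algorithm; `ParallelMap` is a hypothetical helper, and plain integers stand in for alignment records. It only illustrates the stated goal: the parallel version returns exactly what a sequential loop would.

```go
package main

import "sync"

// ParallelMap applies f to every element of in using up to `workers`
// goroutines. Each goroutine owns a contiguous index range and writes
// results to the same indices in out, so the output order is identical
// to that of a sequential loop.
func ParallelMap(in []int, workers int, f func(int) int) []int {
	out := make([]int, len(in))
	if workers < 1 {
		workers = 1
	}
	// Ceiling division: size of the index range handled per goroutine.
	chunk := (len(in) + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < len(in); start += chunk {
		end := start + chunk
		if end > len(in) {
			end = len(in)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = f(in[i]) // disjoint indices, so no locking is needed
			}
		}(start, end)
	}
	wg.Wait()
	return out
}
```

This pattern only works directly when each result depends on a single input element. Steps such as sorting or duplicate marking depend on relationships between records and need purpose-built parallel algorithms, which is the redesign effort the objective describes.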
