Abstract

We present elPrep 5, which updates the elPrep framework for processing sequencing alignment/map files with variant calling. elPrep 5 can now execute the full pipeline described by the GATK Best Practices for variant calling, which consists of PCR and optical duplicate marking, sorting by coordinate order, base quality score recalibration, and variant calling using the haplotype caller algorithm. elPrep 5 produces identical BAM and VCF output as GATK4 while significantly reducing the runtime by parallelizing and merging the execution of the pipeline steps. Our benchmarks show that elPrep 5 speeds up the runtime of the variant calling pipeline by a factor 8-16x on both whole-exome and whole-genome data while using the same hardware resources as GATK4. This makes elPrep 5 a suitable drop-in replacement for GATK4 when faster execution times are needed.

Highlights

  • ElPrep [1, 2] is an established software tool for analyzing aligned sequencing data

  • The new elPrep 5 release introduces variant calling based on the haplotype caller algorithm from GATK4 [6]

  • We present a number of additional benchmarks that discuss a variation of the GATK Best Practices pipeline for variant calling without base quality score recalibration

Read more

Summary

Introduction

ElPrep [1, 2] is an established software tool for analyzing aligned sequencing data. It focuses on supporting community-defined standards such as the sequence alignment/map file format (SAM/BAM) [3] and the GATK Best Practice pipelines [4, 5] for storing and analyzing sequencing data respectively. Previous versions of elPrep focus on preparation steps that prepare the data for statistical analysis by variant calling algorithms. This includes, among others, all of the preparation steps from the GATK Best Practice pipelines for variant calling [4], including steps for sorting reads, PCR and optical duplicate marking, and base quality score recalibration, and applying the result of base quality score recalibration. We always guarantee that the output elPrep produces for any step is identical to the output of the reference tool, for example GATK4, generates This creates additional complexity from the implementation side, leading us to develop multiple new algorithms [1, 2, 7].

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call