Vargas: heuristic-free alignment for assessing linear and graph read aligners.

Charlotte A Darby, Ravi Gaddipati, Michael C Schatz, Ben Langmead

doi:10.1093/bioinformatics/btaa265

Abstract

Motivation: Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score.

Results: Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these ‘gold standard’ Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-maximal exact match and vg to align more reads correctly.

Availability and implementation: Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license.

Supplementary information: Supplementary data are available at Bioinformatics online.
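
To make the idea of heuristic-free semiglobal alignment with affine gap penalties concrete, the following is a minimal Python sketch of the underlying dynamic program for a linear reference. It is not Vargas's implementation (which is written in C++, vectorized, and also handles graph genomes); the function name `semiglobal_score` and the penalty values are illustrative assumptions, and quality-scaled mismatch penalties are omitted. Every cell of the read-by-reference matrix is filled, so the returned score is the optimum under the chosen scoring function.

```python
NEG_INF = float("-inf")


def semiglobal_score(read, ref, match=0, mismatch=-6, gap_open=-5, gap_extend=-3):
    """Best score for aligning ALL of `read` to ANY substring of `ref`.

    Hypothetical helper for illustration. A gap of length k costs
    gap_open + k * gap_extend (affine). No cell is skipped, so the
    result is optimal for this scoring function.
    """
    n = len(ref)
    # Row 0: no read characters consumed; starting anywhere in ref is free.
    h_prev = [0] * (n + 1)        # best score ending at each cell
    d_prev = [NEG_INF] * (n + 1)  # best score ending in a reference gap

    for i in range(1, len(read) + 1):
        h = [NEG_INF] * (n + 1)
        d = [NEG_INF] * (n + 1)
        # Column 0: the read prefix is aligned entirely against a gap.
        d[0] = max(h_prev[0] + gap_open + gap_extend, d_prev[0] + gap_extend)
        h[0] = d[0]
        ins = NEG_INF  # best score ending in a read gap, within this row
        for j in range(1, n + 1):
            # Read character against a gap in the reference (vertical move).
            d[j] = max(h_prev[j] + gap_open + gap_extend, d_prev[j] + gap_extend)
            # Reference character against a gap in the read (horizontal move).
            ins = max(h[j - 1] + gap_open + gap_extend, ins + gap_extend)
            # Match or mismatch (diagonal move).
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            h[j] = max(h_prev[j - 1] + sub, d[j], ins)
        h_prev, d_prev = h, d

    # Ending anywhere in ref is free: take the best cell in the last row.
    return max(h_prev)


if __name__ == "__main__":
    print(semiglobal_score("ACGT", "TTACGTTT"))       # exact occurrence
    print(semiglobal_score("AACCGGTT", "AACCTGGTT"))  # one-base gap in the read
```

The work is proportional to the read length times the reference length, which is why cell updates per second is the natural throughput measure for this kind of aligner.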

Highlights

Biological gold standards such as the Platinum Genomes (Eberle et al, 2017), synthetic diploid (Li et al, 2018), and Genome in a Bottle (Zook et al, 2014) catalog the variants present in a genome and are used to benchmark variant calling algorithms on real sequencing data
We presented Vargas, a heuristic-free read alignment tool achieving extremely high multithreaded throughput
Read alignments produced by Vargas can be used as a computational gold standard for evaluating short-read alignment algorithms, including with real sequencing datasets, and in much the same way as biological gold standards are used to assess variant calling algorithms

Summary

Introduction

Biological gold standards such as the Platinum Genomes (Eberle et al, 2017), synthetic diploid (Li et al, 2018), and Genome in a Bottle (Zook et al, 2014) catalog the variants present in a genome and are used to benchmark variant calling algorithms on real sequencing data. For benchmarking and algorithm development, using gold standard call sets is more realistic than simulating sequencing reads from a synthetic genome with known variants. Read alignment algorithms, which determine a sequencing read’s point of origin with respect to a reference genome, are instead often evaluated using simulated sequencing reads due to the lack of a biological gold standard that directly answers questions about where sequencing reads should align.

Journal: Bioinformatics
Publication Date: Apr 22, 2020
Citations: 20
License type: CC BY 4.0
