Lra: A long read aligner for sequences and contigs.

Jingwen Ren, Mark J P Chaisson

doi:10.1371/journal.pcbi.1009078

Open Access

Journal: PLOS Computational Biology
Publication Date: Jun 21, 2021
Citations: 69
License type: CC BY 4.0

Affiliation: University of Southern California

Abstract

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for structural variant (SV) discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, with runtimes that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available on Bioconda (https://anaconda.org/bioconda/lra) and GitHub (https://github.com/ChaissonLab/LRA).
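To illustrate why the shape of the gap penalty matters when chaining exact matches, here is a minimal sketch in Python. It is not lra's implementation: it uses a naive quadratic-time chaining loop rather than an efficient sparse dynamic programming algorithm, and the anchor representation, the function names (`best_chain`, `linear_gap`, `concave_gap`), and the penalty constants are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical anchor representation: an exact match given as
# (read_start, ref_start, length).
Anchor = Tuple[int, int, int]


def linear_gap(d: int) -> float:
    """Illustrative linear penalty: cost grows in proportion to gap length."""
    return 0.5 * d


def concave_gap(d: int) -> float:
    """Illustrative concave penalty: each extra base of gap costs less."""
    return 2.0 * math.log2(d + 1)


def best_chain(
    anchors: List[Anchor], gap_cost: Callable[[int], float]
) -> Tuple[float, List[Anchor]]:
    """Return the highest-scoring chain of non-overlapping, co-linear anchors.

    Naive O(n^2) chaining, used only to show the effect of the gap cost.
    """
    if not anchors:
        return 0.0, []
    anchors = sorted(anchors)
    n = len(anchors)
    score = [float(a[2]) for a in anchors]  # a chain of one anchor scores its length
    prev = [-1] * n
    for j in range(n):
        qj, rj, lj = anchors[j]
        for i in range(j):
            qi, ri, li = anchors[i]
            # Anchor i must end before anchor j starts in both read and reference.
            if qi + li <= qj and ri + li <= rj:
                # Gap length: difference between the reference and read distances.
                gap = abs((rj - (ri + li)) - (qj - (qi + li)))
                cand = score[i] + lj - gap_cost(gap)
                if cand > score[j]:
                    score[j] = cand
                    prev[j] = i
    end = max(range(n), key=score.__getitem__)
    chain = []
    k = end
    while k != -1:
        chain.append(anchors[k])
        k = prev[k]
    chain.reverse()
    return score[end], chain


# Two 100-base matches flanking a 1000-base deletion in the read.
anchors = [(0, 0, 100), (100, 1100, 100)]
linear_score, linear_chain = best_chain(anchors, linear_gap)
concave_score, concave_chain = best_chain(anchors, concave_gap)
```

In this toy case the linear penalty makes the 1000-base gap more expensive than the second match is worth, so the chain contains a single anchor and the deletion is not spanned. Under the concave penalty both anchors are chained, which is the behavior that helps an aligner represent a large SV as one gap.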

Highlights

Studies of genetic variation often begin by aligning sequences from a sample back to a reference genome, and inferring variation as differences in the alignment.
Long-read single-molecule sequencing has been shown to help discover structural variation because the reads span the entire variant.
We demonstrate a method, lra, that uses an efficient implementation of concave-cost alignment for structural variant discovery using long reads.

Introduction

Studies of genetic variation often begin by aligning sequences from a sample back to a reference genome, and inferring variation as differences in the alignment. The two long-read sequencing (LRS) technologies, Pacific Biosciences (PacBio) and Oxford Nanopore (ONT), generate reads over 50 kb at error rates of 15% or less. Aligning these sequences is a computationally challenging task for which several methods are available, including minimap, ngmlr, and BLASR [1,2,3]. These methods have been shown to be fast and accurate, but have limitations when there are large sequence differences between the read and the reference. This problem is amplified in larger SVs and in complex, repetitive regions such as variable-number tandem repeats, which make up only 3% of the human genome but account for nearly 70% of observed structural variation (SV: insertions and deletions of at least 50 bases) [4].
