URMAP, an ultra-fast read mapper.

Robert Edgar

doi:10.7717/peerj.9338

Abstract

Mapping of reads to reference sequences is an essential step in a wide range of biological studies. The large size of datasets generated with next-generation sequencing technologies motivates the development of fast mapping software. Here, I describe URMAP, a new read mapping algorithm. URMAP is an order of magnitude faster than BWA with comparable accuracy on several validation tests. On a Genome in a Bottle (GIAB) variant calling test with 30× coverage 2×150 reads, URMAP achieves high accuracy (precision 0.998, sensitivity 0.982 and F-measure 0.990) with the strelka2 caller. However, GIAB reference variants are shown to be biased against repetitive regions which are difficult to map and may therefore pose an unrealistically easy challenge to read mappers and variant callers.
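As a quick consistency check on the figures above, the reported F-measure can be recomputed from the reported precision and sensitivity. This is a minimal sketch that assumes the F-measure is the usual F1 score (the harmonic mean of the two). The function name `f_measure` is illustrative and is not part of URMAP or strelka2.

```python
def f_measure(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (assumed here to be F1)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Values reported in the abstract for URMAP with the strelka2 caller.
reported = f_measure(0.998, 0.982)
print(f"{reported:.3f}")  # 0.990
```

Under that assumption, the recomputed value rounds to the 0.990 stated in the abstract.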

Highlights

Background: Next-generation sequencing has enabled dramatic advances in fields ranging from human functional genomics (Morozova & Marra, 2008) to microbial metagenomics (Gilbert & Dupont, 2011).
When first utilized in read mapping, the Burrows-Wheeler Transform (BWT) had the important advantage that it creates a compact index with size comparable to the reference database.
BWA, URMAP, SNAP and Bowtie2 stand out as more accurate than the others (Minimap2, Hisat2, URMAPv and FSVA): every method in the first group has at least 6 better metrics with a positive mean improvement compared to every method in the second group, with the exception of SNAP >5(3.4) URMAPv.

Summary

Introduction

Next-generation sequencing has enabled dramatic advances in fields ranging from human functional genomics (Morozova & Marra, 2008) to microbial metagenomics (Gilbert & Dupont, 2011). Data analysis in next-generation studies often requires mapping of reads to a reference database such as a human genome, human exome, or a collection of full-length microbial genomes. For a given query sequence (read), the primary goal of mapping is to report the best match if possible, otherwise to report that the best two or more alignments are sufficiently similar to each other that the best match is ambiguous. When first utilized in read mapping, BWT had the important advantage that it creates a compact index with size comparable to the reference database. This is ∼3 GB, which is small enough to be stored in RAM with

Methods

Results

Conclusion


Journal: PeerJ	Publication Date: Jun 24, 2020
Citations: 13	License type: CC BY 4.0
