Abstract

Mapping of reads to reference sequences is an essential step in a wide range of biological studies. The large size of datasets generated with next-generation sequencing technologies motivates the development of fast mapping software. Here, I describe URMAP, a new read mapping algorithm. URMAP is an order of magnitude faster than BWA with comparable accuracy on several validation tests. On a Genome in a Bottle (GIAB) variant calling test with 30× coverage 2×150 reads, URMAP achieves high accuracy (precision 0.998, sensitivity 0.982 and F-measure 0.990) with the strelka2 caller. However, GIAB reference variants are shown to be biased against repetitive regions, which are difficult to map, and may therefore pose an unrealistically easy challenge to read mappers and variant callers.
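The three accuracy figures quoted above can be checked against each other. The sketch below assumes that the F-measure is the usual harmonic mean of precision and sensitivity (F1); the abstract does not spell out the formula, and the function name `f_measure` is hypothetical, chosen only for this illustration.

```python
def f_measure(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (assumed F1 definition)."""
    total = precision + sensitivity
    if total == 0:
        # Both inputs are zero, so the harmonic mean is taken to be zero.
        return 0.0
    return 2 * precision * sensitivity / total


# Values reported in the abstract for URMAP with the strelka2 caller.
precision = 0.998
sensitivity = 0.982

# Rounds to 0.990, matching the F-measure quoted in the abstract.
print(f"F-measure = {f_measure(precision, sensitivity):.3f}")
```

Under this assumption the reported precision and sensitivity reproduce the stated F-measure of 0.990 to three decimal places.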

Highlights

  • Background: Next-generation sequencing has enabled dramatic advances in fields ranging from human functional genomics (Morozova & Marra, 2008) to microbial metagenomics (Gilbert & Dupont, 2011).

  • When first utilized in read mapping, the Burrows-Wheeler Transform (BWT) had the important advantage that it creates a compact index with size comparable to the reference database.

  • BWA, URMAP, SNAP and Bowtie2 stand out as more accurate than the others (Minimap2, Hisat2, URMAPv and FSVA): every method in the first group has at least 6 better metrics with a positive mean improvement compared to every method in the second group, with the exception of SNAP >5(3.4) URMAPv.

Summary

Introduction

Next-generation sequencing has enabled dramatic advances in fields ranging from human functional genomics (Morozova & Marra, 2008) to microbial metagenomics (Gilbert & Dupont, 2011). Data analysis in next-generation sequencing studies often requires mapping of reads to a reference database such as a human genome, human exome, or a collection of full-length microbial genomes. For a given query sequence (read), the primary goal of mapping is to report the best match if possible, and otherwise to report that the best two or more alignments are so similar to each other that the best match is ambiguous. When first utilized in read mapping, the Burrows-Wheeler Transform (BWT) had the important advantage that it creates a compact index with size comparable to the reference database. This is ∼3 GB, which is small enough to be stored in RAM with
