MONI: A Pangenomic Index for Finding Maximal Exact Matches.

Massimiliano Rossi, Ben Langmead, Christina Boucher, Marco Oliva, Travis Gagie

doi:10.1089/cmb.2021.0290

Abstract

Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store thousands of human genomes on a commodity computer. Kuhnle et al. then showed how to build the r-index efficiently via a technique called prefix-free parsing (PFP) and demonstrated its effectiveness for exact pattern matching. Exact pattern matching can be leveraged to support approximate pattern matching, but the r-index itself cannot efficiently support popular and important queries such as finding maximal exact matches (MEMs). To address this shortcoming, Bannai et al. introduced the concept of thresholds and showed that storing them together with the r-index enables efficient MEM finding, but they did not say how to find those thresholds. We present a novel algorithm that applies PFP to build the r-index and find the thresholds simultaneously, in time and space linear in the size of the prefix-free parse. Our implementation, called MONI, can rapidly find MEMs between reads and large collections of highly repetitive sequences. Compared with other read aligners (PuffAligner, Bowtie2, BWA-MEM, and CHIC), MONI used 2 to 11 times less memory and was 2 to 32 times faster for index construction. Moreover, MONI's index was less than one thousandth the size of competing indexes for large collections of human chromosomes. Thus, MONI represents a major advance in our ability to perform MEM finding against very large collections of related references.
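To make the thresholds idea concrete, here is a toy Python sketch of matching statistics and MEM finding in the style described above. It is an illustration under stated assumptions, not MONI's implementation: the suffix array, BWT and LCP array are built naively in quadratic time instead of via PFP, the BWT is a plain Python string rather than a run-length encoded r-index, and the names `build_index`, `jump`, `matching_statistics` and `find_mems` are hypothetical. We assume `$` does not occur in the reference and sorts before every other character. Following our reading of the thresholds definition, for each pair of consecutive runs of the same character in the BWT the threshold is the row of a minimum LCP value between them; when the right-to-left scan sits on a row whose BWT character differs from the next pattern character, it jumps to the end of the previous run if it lies above the threshold, and to the start of the next run otherwise. The second pass computes match lengths by comparing directly against the reference, which a compressed index would instead have to do through random access to the text.

```python
from bisect import bisect_left


def build_index(text):
    """Toy index: suffix array, BWT, BWT runs, thresholds and C array (naive, quadratic)."""
    t = text + "$"  # assumes '$' is absent from text and smaller than all its characters
    n = len(t)
    sa = sorted(range(n), key=lambda i: t[i:])
    bwt = "".join(t[i - 1] for i in sa)  # t[-1] == '$' for the suffix starting at 0
    lcp = [0] * n  # lcp[k] = longest common prefix of suffixes sa[k-1] and sa[k]
    for k in range(1, n):
        a, b = sa[k - 1], sa[k]
        while t[a] == t[b]:  # always stops at the unique '$'
            a, b = a + 1, b + 1
        lcp[k] = a - sa[k - 1]
    runs = {}  # character -> list of [start, end] rows of its maximal BWT runs
    for q, c in enumerate(bwt):
        if q > 0 and bwt[q - 1] == c:
            runs[c][-1][1] = q
        else:
            runs.setdefault(c, []).append([q, q])
    thr = {}  # character -> thr[c][r] = row of a minimum LCP between runs r-1 and r
    for c, rs in runs.items():
        thr[c] = [None]
        for (_, end1), (start2, _) in zip(rs, rs[1:]):
            thr[c].append(min(range(end1 + 1, start2 + 1), key=lambda k: lcp[k]))
    C, total = {}, 0  # C[c] = number of BWT characters smaller than c
    for c in sorted(runs):
        C[c] = total
        total += bwt.count(c)
    return t, sa, bwt, runs, thr, C


def jump(idx, q, c):
    """From row q (bwt[q] != c), move to the c-row whose suffix best matches row q's."""
    _, _, _, runs, thr, _ = idx
    rs = runs[c]
    r = bisect_left([start for start, _ in rs], q)  # first c-run starting after q
    if r == 0:
        return rs[0][0]
    if r == len(rs):
        return rs[-1][1]
    return rs[r - 1][1] if q < thr[c][r] else rs[r][0]


def matching_statistics(idx, pattern):
    """Return (pos, length): pos[i] is a text position where the longest match of pattern[i:] starts."""
    t, sa, bwt, runs, _, C = idx
    m = len(pattern)
    pos, length = [None] * m, [0] * m
    q = 0
    for i in range(m - 1, -1, -1):  # scan the pattern right to left
        c = pattern[i]
        if c not in runs or c == "$":  # character absent from the reference
            continue
        if bwt[q] != c:
            q = jump(idx, q, c)
        q = C[c] + bwt.count(c, 0, q)  # LF mapping
        pos[i] = sa[q]
    for i in range(m):  # second pass: lengths by direct comparison with the text
        if pos[i] is not None:
            k = 0
            while i + k < m and t[pos[i] + k] == pattern[i + k]:
                k += 1
            length[i] = k
    return pos, length


def find_mems(length, min_len=1):
    """(start, length) of each MEM: a match that cannot be extended to the left."""
    return [(i, l) for i, l in enumerate(length)
            if l >= min_len and (i == 0 or length[i - 1] <= l)]


if __name__ == "__main__":
    _, length = matching_statistics(build_index("GATTACA"), "TTAGAT")
    print(length)             # [3, 2, 1, 3, 2, 1]
    print(find_mems(length))  # [(0, 3), (3, 3)], i.e. "TTA" and "GAT"
```

Matching statistics are always right-maximal, so the only extra check for a MEM is left-maximality: the match starting at `i` extends one character to the left exactly when `length[i - 1] == length[i] + 1`.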


Journal: Journal of computational biology : a journal of computational molecular cell biology
Publication Date: Jan 17, 2022
Citations: 36

Similar Papers

Building scalable indexes that can be efficiently queried
Christina Boucher
01 May 2022

Finding maximal exact matches in graphs
Nicola Rizzo ... Veli Mäkinen
Algorithms for Molecular Biology | VOL. 19
11 Mar 2024

Computing MEMs and Relatives on Repetitive Text Collections
Gonzalo Navarro
ACM Transactions on Algorithms
25 Oct 2024

Practical distributed computation of maximal exact matches in the cloud
Sondos Seif El-Din ... Mohamed Aboelhoda
01 Jun 2014
