Pfp-fm: an accelerated FM-index.

Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, Travis Gagie

doi:10.1186/s13015-024-00260-8

Abstract

FM-indexes are crucial data structures in DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [1] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word boundaries, however, it must be parsed into phrases before word-based FM-indexing can be applied. In 2022, Deng et al. [2] proposed parsing genomic data by induced suffix sorting, and showed that the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing (which takes parameters that let us tune the average length of the phrases) instead of induced suffix sorting gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate that it is between 3 and 18 times faster than competing methods on queries to GRCh38, and is consistently faster on queries made to 25,000, 50,000 and 100,000 SARS-CoV-2 genomes. Hence, our method appears to accelerate count queries relative to all state-of-the-art methods, at the cost of a moderate increase in memory. The source code is available at https://github.com/AaronHong1024/afm .
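
To make the "one random access per character" cost concrete, the following is a minimal sketch of character-based backward search on a toy FM-index. It is not the Pfp-fm implementation: the function names `build_fm_index` and `count` are hypothetical, the suffix array is built by naive sorting, and the rank table is stored uncompressed, which is only feasible for tiny inputs. The point is that each pattern character triggers one step of rank lookups at unpredictable positions of the BWT, so a word-based index with longer phrases would perform fewer such steps.

```python
from collections import Counter


def build_fm_index(text):
    """Build a naive, uncompressed FM-index for text (illustration only)."""
    text += "$"  # sentinel; assumed smaller than every symbol in text
    # Naive suffix array: sort suffix start positions lexicographically.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the symbol preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of symbols in text strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i] (full rank table).
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def count(pattern, C, occ, n):
    """Return (number of occurrences, number of backward-search steps)."""
    lo, hi = 0, n  # half-open interval of suffix-array rows
    steps = 0
    for ch in reversed(pattern):  # pattern is processed right to left
        if ch not in C:
            return 0, steps
        # One step: two rank lookups at positions lo and hi of the BWT.
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        steps += 1
        if lo >= hi:  # empty interval: pattern does not occur
            return 0, steps
    return hi - lo, steps
```

For example, after `C, occ, n = build_fm_index("GATTACAGATTACA")`, the call `count("ATTA", C, occ, n)` returns `(2, 4)`: two occurrences, found with four steps, one per pattern character.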

Journal: Algorithms for Molecular Biology
Publication Date: Apr 10, 2024
Citations: 1
License type: CC BY 4.0

Similar Papers

MEDAL
Wenqin Huangfu ... Peng Gu
12 Oct 2019

General Document Retrieval in Compact Space
Gonzalo Navarro ... Simon J Puglisi
ACM Journal of Experimental Algorithmics | VOL. 19
07 Jan 2015

PyMod: sequence similarity searches, multiple sequence-structure alignments, and homology modeling within PyMOL
Emanuele Bramucci ... Stefano Pascarella
BMC Bioinformatics | VOL. 13
28 Mar 2012

A strategy for predicting gene functions from genome and metagenome sequences on the basis of oligopeptide frequency distance.
Takashi Abe ... Masaya Mizoguchi
Genes & genetic systems | VOL. 95
01 Feb 2020
