Lightweight LCP construction for very large collections of strings

Anthony J Cox,Fabio Garofalo,Giovanna Rosone,Marinella Sciortino

doi:10.1016/j.jda.2016.03.003

Open Access

https://doi.org/10.1016/j.jda.2016.03.003

Abstract

The longest common prefix (LCP) array is a very useful data structure that, combined with the suffix array and the Burrows–Wheeler transform (BWT), makes it possible to efficiently compute combinatorial properties of a string that are useful in several applications, especially in biological contexts. Nowadays, the input data for many problems are large collections of strings, for instance the data coming from “next-generation” DNA sequencing (NGS) technologies. In this paper we present the first lightweight algorithm (called extLCP) for the simultaneous computation of the longest common prefix array and the Burrows–Wheeler transform of a very large collection of strings of any length. The computation is performed by accessing disk data only via sequential scans, and the total disk space usage never exceeds twice the output size, excluding the disk space required for the input. Moreover, extLCP can also compute the suffix array of the strings of the collection without requiring any further data structure. Finally, we test our algorithm on real data and compare our results with another tool capable of working in external memory on large collections of strings.
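
To make the three outputs named in the abstract concrete, the following is a minimal in-memory sketch of what the suffix array, the longest common prefix array and the Burrows–Wheeler transform of a small collection look like. It is not extLCP: it sorts all suffixes naively in RAM, whereas the paper's algorithm works in external memory with sequential scans. The function name `naive_collection_index`, the convention that each string gets its own end marker ordered by string index and smaller than every character, and the choice `lcp[0] = 0` are assumptions made for illustration.

```python
def naive_collection_index(strings):
    """Naively build (suffix array, LCP array, BWT) for a collection.

    Illustrative only: quadratic-time, in-memory, and unrelated to how
    extLCP itself is implemented.
    """
    m = len(strings)
    # Encode string i as integers. Characters are shifted above m, and the
    # end marker of string i is the integer i, so that (by assumption)
    # marker_0 < marker_1 < ... < every ordinary character.
    encoded = [[m + ord(c) for c in s] + [i] for i, s in enumerate(strings)]

    # Collect every suffix of every string, including the marker-only suffix.
    suffixes = [
        (enc[p:], i, p)
        for i, enc in enumerate(encoded)
        for p in range(len(enc))
    ]
    suffixes.sort()  # lexicographic order on the integer encodings

    # Generalized suffix array: (string index, start position) pairs.
    gsa = [(i, p) for _, i, p in suffixes]

    # lcp[k] = length of the common prefix of the k-th and (k-1)-th suffixes.
    lcp = [0]
    for (a, _, _), (b, _, _) in zip(suffixes, suffixes[1:]):
        k = 0
        while k < len(a) and k < len(b) and a[k] == b[k]:
            k += 1
        lcp.append(k)

    # BWT: the symbol preceding each suffix in its own string; a suffix that
    # starts at position 0 is preceded (cyclically) by the end marker "$".
    bwt = "".join(strings[i][p - 1] if p > 0 else "$" for i, p in gsa)
    return gsa, lcp, bwt


if __name__ == "__main__":
    gsa, lcp, bwt = naive_collection_index(["AB", "AB"])
    print(gsa)
    print(lcp)  # [0, 0, 0, 2, 0, 1]
    print(bwt)  # BB$$AA
```

For the two identical strings `AB` and `AB`, the sorted suffixes are the two end markers, then the two copies of `AB`, then the two copies of `B`, which is why the LCP values 2 and 1 appear between the adjacent copies.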

Journal: Journal of Discrete Algorithms
Publication Date: Mar 1, 2016
Citations: 32
License type: elsevier-specific: oa user license

Similar Papers

Noncanonical Gene Fusions Detected at the DNA Level Necessitate Orthogonal Diagnosis Methods Before Targeted Therapy.
Zhengbo Song ... Chun-Wei Xu
Journal of Thoracic Oncology | VOL. 16
01 Mar 2021

Random Access in Persistent Strings and Segment Selection
Philip Bille ... Inge Li Gørtz
Theory of Computing Systems | VOL. 67
17 Dec 2022

External Memory Generalized Suffix and LCP Arrays Construction
Felipe A Louza ... Cristina Dutra De Aguiar Ciferri
01 Jan 2013

Solving All-Pairs Suffix Prefix – Theory and Practice
Maan Haj Rachid ... Qutaibah Malluhi
01 Jan 2015
