Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

German Tischler

doi:10.1007/978-3-319-46049-9_18

Abstract

The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching with the suffix array (SA), and it provides an implicit representation of the topology of the underlying suffix tree. The LCP array of a string of length n can be represented as an array of n words or, in the presence of the SA, as a bit vector of 2n bits plus asymptotically negligible support data structures. External memory construction algorithms for the LCP array have been proposed, but those proposed so far require O(n) words (i.e. \(O(n \log n)\) bits) of space in external memory. In some practical cases this space requirement is prohibitively expensive. We present an external memory algorithm for constructing the 2n-bit version of the LCP array which uses \(O(n \log \sigma)\) bits of additional space in external memory when given a (compressed) BWT with alphabet size \(\sigma\) and a sampled inverse suffix array at sampling rate \(O(\log n)\). This is often a significant space gain in practice, where \(\sigma\) is usually much smaller than n or even constant. The algorithm has average-case running time \(O(n \log n \log \sigma)\) and worst-case running time \(O(n^2 \log \sigma)\). The worst case can be improved to \(O(n \log^2 n \log \sigma)\) time, with the same space bound in external memory, if \(O(n / \log n)\) bits of internal memory are available. We also present experimental data showing that our approach is practical.
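
The 2n-bit representation mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the encoding, so the code below assumes the unary encoding commonly attributed to Sadakane: with 0-based indices, the i-th 1 bit is placed at position PLCP[i] + 2i, which is well defined because PLCP[i+1] >= PLCP[i] - 1. All function names are hypothetical, and the naive in-memory construction is for illustration only; it is not the external memory algorithm of the paper.

```python
def suffix_array(s):
    # Naive suffix sorting; fine for tiny illustrative inputs only.
    return sorted(range(len(s)), key=lambda i: s[i:])


def plcp_naive(s):
    # PLCP[i] = length of the longest common prefix of suffix i and the
    # suffix that precedes it in lexicographic (suffix array) order.
    n = len(s)
    sa = suffix_array(s)
    plcp = [0] * n
    for r in range(1, n):
        a, b = sa[r - 1], sa[r]
        k = 0
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        plcp[b] = k
    return plcp


def encode_plcp(plcp):
    # Assumed encoding: the i-th 1 bit (0-based) sits at position PLCP[i] + 2i.
    # Since PLCP[i+1] >= PLCP[i] - 1, these positions are strictly increasing,
    # and the resulting bit vector has at most 2n bits.
    bits = []
    for i, v in enumerate(plcp):
        pos = v + 2 * i
        bits.extend([0] * (pos - len(bits)))  # pad with zeros up to pos
        bits.append(1)
    return bits


def decode_plcp(bits, i):
    # PLCP[i] = select1(i) - 2i; select is done by a linear scan here,
    # whereas a real implementation would use a select support structure.
    seen = -1
    for pos, b in enumerate(bits):
        if b:
            seen += 1
            if seen == i:
                return pos - 2 * i
    raise IndexError("index out of range")


if __name__ == "__main__":
    text = "banana"
    plcp = plcp_naive(text)
    bits = encode_plcp(plcp)
    print(plcp, "".join(map(str, bits)))
```

The "asymptotically negligible support data structures" from the abstract correspond to the select support that would replace the linear scan in `decode_plcp`.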
