Breaking the $O(n)$-Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees.

Dominik Kempa,Tomasz Kociumaka

doi:10.1137/1.9781611977554.ch187

Abstract

The suffix array, describing the lexicographical order of suffixes of a given text, and the suffix tree, a path-compressed trie of all suffixes, are the two most fundamental data structures for string processing, with a plethora of applications in data compression, bioinformatics, and information retrieval. For a length-$n$ text, however, they use $\Theta(n \log n)$ bits of space, which is often too costly. To address this, Grossi and Vitter [STOC 2000] and, independently, Ferragina and Manzini [FOCS 2000] introduced space-efficient versions of the suffix array, known as the compressed suffix array (CSA) and the FM-index. Sadakane [SODA 2002] then showed how to augment them to obtain the compressed suffix tree (CST). For a length-$n$ text over an alphabet of size $\sigma$, these structures use only $O(n \log \sigma)$ bits. Nowadays, these structures are part of the standard toolbox: modern textbooks spend dozens of pages describing their applications, and they have almost completely replaced suffix arrays and suffix trees in space-critical applications. The biggest remaining open question is how efficiently they can be constructed. After two decades, the fastest algorithms still run in $O(n)$ time [Hon et al., FOCS 2003], which is a $\Theta(\log_\sigma n)$ factor away from the lower bound of $\Omega(n / \log_\sigma n)$ (following from the necessity to read the input). In this paper, we make the first improvement in $n$ for this problem in 20 years by proposing a new compressed suffix array and a new compressed suffix tree which admit $o(n)$-time construction algorithms while matching the space bounds and the query times of the original CSA/CST and the FM-index. More precisely, our structures take $O(n \log \sigma)$ bits, support SA queries and full suffix tree functionality in $O(\log^{\epsilon} n)$ time per operation, and can be constructed in $O(n \min(1, \log \sigma / \sqrt{\log n}))$ time using $O(n \log \sigma)$ bits of working space. (For example, if $\sigma = 2$, the construction time is $O(n / \sqrt{\log n})$.)
We derive this result as a corollary of a much more general reduction: we prove that all parameters of a compressed suffix array/tree (query time, space, construction time, and construction working space) can essentially be reduced to those of a data structure answering new query types that we call prefix rank and prefix selection. Using these novel techniques, we also develop a new index for pattern matching.
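To make the objects in the abstract concrete, the following is a minimal sketch of a suffix array built by naive sorting, together with a rough bit count. It is not the construction algorithm of the paper: the naive sort is far slower than the bounds discussed above. The function names `suffix_array`, `plain_sa_bits`, and `packed_text_bits` are hypothetical, and `packed_text_bits` only computes the size of the text packed at $\lceil \log_2 \sigma \rceil$ bits per symbol, as an indication of the order of magnitude that a compressed index aims for.

```python
import math


def suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions by the suffix text.

    Illustration only; this is not a linear-time or sublinear-time algorithm.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def plain_sa_bits(n: int) -> int:
    """Bits used by a plain suffix array: n entries of ceil(log2 n) bits."""
    return n * max(1, math.ceil(math.log2(n)))


def packed_text_bits(n: int, sigma: int) -> int:
    """Bits used by the text itself, packed at ceil(log2 sigma) bits per symbol."""
    return n * max(1, math.ceil(math.log2(sigma)))


text = "banana"
sa = suffix_array(text)
print(sa)  # [5, 3, 1, 0, 4, 2]

# Assumed example sizes: a text of 2**20 symbols over a 4-letter alphabet.
n, sigma = 2**20, 4
print(plain_sa_bits(n) // packed_text_bits(n, sigma))  # 10
```

For the assumed sizes, the plain suffix array needs 20 bits per position while the packed text needs 2, which is the gap between $\Theta(n \log n)$ and $O(n \log \sigma)$ bits that motivates compressed indexes.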


Journal: Proceedings of the ... Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM Symposium on Discrete Algorithms
Publication Date: Jan 1, 2023
Citations: 3

Similar Papers

Solving All-Pairs Suffix Prefix – Theory and Practice
Maan Haj Rachid ... Qutaibah Malluhi
01 Jan 2015

A New Compressed Suffix Tree Supporting Fast Search and Its Construction Algorithm Using Optimal Working Space
Dong Kyue Kim ... Heejin Park
01 Jan 2004

Space-efficient indexes for forbidden extension queries
Sudip Biswas ... Sharma V Thankachan
Journal of Discrete Algorithms | VOL. 50
01 May 2018

An Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays for Alphabets of Non-negligible Size
Dong Kyue Kim ... Jeong Eun Jeon
01 Jan 2004
