Breaking a Time-and-Space Barrier in Constructing Full-Text Indices

Wing-Kai Hon,Wing-Kin Sung,Kunihiko Sadakane

doi:10.1137/070685373

Abstract

Suffix trees and suffix arrays are the most prominent full-text indices, and their construction algorithms are well studied. In the literature, the fastest algorithm runs in $O(n)$ time, while it requires $O(n\log n)$-bit working space, where n denotes the length of the text. On the other hand, the most space-efficient algorithm requires $O(n)$-bit working space while it runs in $O(n\log n)$ time. It was open whether these indices can be constructed in both $o(n\log n)$ time and $o(n\log n)$-bit working space. This paper breaks the above time-and-space barrier under the unit-cost word RAM. We give an algorithm for constructing the suffix array, which takes $O(n)$ time and $O(n)$-bit working space, for texts with constant-size alphabets. Note that both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires $O(n\log^{\epsilon}n)$ time and $O(n)$-bit working space for any $0<\epsilon<1$. Apart from that, our algorithm can also be adopted to build other existing full-text indices, such as compressed suffix tree, compressed suffix arrays, and FM-index. We also study the general case where the size of the alphabet $\Sigma$ is not constant. Our algorithm can construct a suffix array and a suffix tree using optimal $O(n\log|\Sigma|)$-bit working space while running in $O(n\log\log|\Sigma|)$ time and $O(n(\log^{\epsilon}n+\log|\Sigma|))$ time, respectively. These are the first algorithms that achieve $o(n\log n)$ time with optimal working space. Moreover, for the special case where $\log|\Sigma|=O((\log\log n)^{1-\epsilon})$, we can speed up our suffix array construction algorithm to the optimal $O(n)$.

Full Text