Linear work suffix array construction

Juha Kärkkäinen, Peter Sanders, Stefan Burkhardt

doi:10.1145/1217856.1217858

Abstract

Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to linear-time construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple linear-time construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover. This view leads to a generalized algorithm, DC, that allows a space-efficient implementation and, moreover, supports the choice of a space-time tradeoff. For any v ∈ [1, √n], it runs in O(vn) time using O(n/√v) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREW-PRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
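To make the two central terms of the abstract concrete, here is a minimal Python sketch. It is not the paper's 50-line C++ implementation and it does not run in linear time. The function names (`naive_suffix_array`, `is_difference_cover`, `common_shift`) are hypothetical and chosen only for illustration. The first function builds a suffix array by plain comparison sorting, which serves as a reference definition. The other two check the standard difference cover property: a set D is a difference cover modulo v if every residue modulo v can be written as a difference of two elements of D. For v = 3 the set {1, 2} qualifies, which is presumably where the "3" in DC3 comes from.

```python
from itertools import product


def naive_suffix_array(text):
    """Return suffix start positions in lexicographic order of the suffixes.

    Reference definition only: this sorts whole suffixes and is NOT the
    linear-time construction described in the abstract.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def is_difference_cover(cover, v):
    """True if every residue d in [0, v) equals (a - b) mod v for some a, b in cover."""
    differences = {(a - b) % v for a, b in product(cover, repeat=2)}
    return differences == set(range(v))


def common_shift(i, j, cover, v):
    """Smallest shift l in [0, v) such that (i + l) mod v and (j + l) mod v
    both lie in cover, or None if no such shift exists."""
    for l in range(v):
        if (i + l) % v in cover and (j + l) % v in cover:
            return l
    return None


if __name__ == "__main__":
    print(naive_suffix_array("banana"))
    print(is_difference_cover({1, 2}, 3))
    print(common_shift(0, 2, {1, 2}, 3))
```

The property checked by `common_shift` is what makes a difference cover useful for suffix sorting: any two suffixes starting at positions i and j can be shifted by a small amount l < v so that both shifted positions fall in the sampled set. Once the sampled suffixes are ranked, comparing any two suffixes then needs at most l character comparisons plus one rank lookup.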
