Abstract

This article presents an O(n)-time algorithm called SACA-K for sorting the suffixes of an input string T[0, n-1] over an alphabet A[0, K-1]. Sorting the suffixes of T is equivalent to constructing the suffix array (SA) of T. The theoretical memory usage of SACA-K is n log K + n log n + K log n bits. We also provide a practical implementation of SACA-K that uses n bytes plus (n + 256) words, where a word is log n bits, and that is suitable for strings over any alphabet up to full ASCII. In our experiments, SACA-K outperforms SA-IS, which was previously the most time- and space-efficient linear-time SA construction algorithm (SACA). SACA-K is around 33% faster and uses a smaller, deterministic workspace of K words, where the workspace is the space needed beyond the input string and the output SA. For K = O(1), SACA-K runs in linear time and O(1) workspace. To the best of our knowledge, this is the first such result reported in the literature with practical source code publicly available.
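
To make the problem statement concrete, the following minimal sketch builds a suffix array by directly sorting suffixes. It is not SACA-K and not the authors' code: it is a naive baseline whose running time is far from linear and whose memory use is far above the bounds quoted above. The function name `naive_suffix_array` is hypothetical and chosen only for illustration.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive baseline for illustration only: each comparison may scan O(n)
    characters, so sorting costs O(n^2 log n) time in the worst case, and
    the slices used as sort keys take extra memory. SACA-K, by contrast,
    is described as running in O(n) time with a workspace of K words.
    """
    # SA[i] is the start index of the i-th smallest suffix of text.
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order:
    # a (5), ana (3), anana (1), banana (0), na (4), nana (2)
    print(naive_suffix_array("banana"))  # prints [5, 3, 1, 0, 4, 2]
```

The output array is what the abstract calls the SA of T: entry i holds the starting position of the i-th smallest suffix.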

