An improved algorithm for the all-pairs suffix–prefix problem

William H.A. Tustumi, Simon Gog, Guilherme P. Telles, Felipe A. Louza

doi:10.1016/j.jda.2016.04.002

Open Access

Abstract

Finding all longest suffix–prefix matches for a collection of strings is known as the all-pairs suffix–prefix match problem, and its main application is de novo genome assembly. This problem is well studied in stringology and was solved optimally in 1992 by Gusfield et al. [8] using suffix trees. In 2010, Ohlebusch and Gog [13] proposed an alternative solution based on enhanced suffix arrays, which also has optimal time complexity but is faster in practice. In this article, we present another optimal algorithm based on enhanced suffix arrays that further improves the practical running time. Our new solution solves the problem locally for each string, scanning the enhanced suffix array backwards to avoid processing suffixes that are not suffix–prefix match candidates. In an empirical evaluation, we observed that the new algorithm is more than twice as fast as, and more space-efficient than, the method proposed by Ohlebusch and Gog. When compared to Readjoiner [5], a good practical solution, our algorithm is faster for small overlap lengths, in keeping with its theoretical optimality.
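
To make the problem statement concrete, the following is a minimal brute-force sketch of what is being computed: for every ordered pair of distinct strings, the length of the longest suffix of the first that equals a prefix of the second. It is not the enhanced-suffix-array algorithm described in this article, and it runs in far worse than optimal time. The function names `longest_overlap` and `all_pairs_suffix_prefix`, and the `min_len` threshold parameter, are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple


def longest_overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate first so the first hit is the answer.
    for length in range(max_len, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def all_pairs_suffix_prefix(
    strings: List[str], min_len: int = 1
) -> Dict[Tuple[int, int], int]:
    """Naive all-pairs suffix-prefix matching (illustrative only).

    Maps (i, j) with i != j to the longest overlap between a suffix of
    strings[i] and a prefix of strings[j]; pairs without a qualifying
    overlap are omitted.
    """
    result: Dict[Tuple[int, int], int] = {}
    for i, a in enumerate(strings):
        for j, b in enumerate(strings):
            if i == j:
                continue
            overlap = longest_overlap(a, b, min_len)
            if overlap > 0:
                result[(i, j)] = overlap
    return result


if __name__ == "__main__":
    reads = ["ACGT", "GTAC", "TACG"]
    for (i, j), length in sorted(all_pairs_suffix_prefix(reads).items()):
        print(f"{reads[i]} -> {reads[j]}: overlap {length}")
```

The sketch compares every pair of strings directly, so its cost grows with the square of the number of strings times their length. The suffix-tree and enhanced-suffix-array solutions cited above avoid this by indexing all strings once and reporting the matches in optimal time.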

Full Text