A fast algorithm for the all-pairs suffix–prefix problem

Jihyuk Lim,Kunsoo Park

doi:10.1016/j.tcs.2017.07.013

Jihyuk Lim, Kunsoo Park

Open Access

https://doi.org/10.1016/j.tcs.2017.07.013

Copy DOI

Journal: Theoretical Computer Science	Publication Date: Jul 25, 2017
Citations: 4	License type: elsevier-specific: oa user license

Affiliation: Seoul National University

Abstract

The all-pairs suffix–prefix problem occurs as a subproblem of DNA sequence assembly where it is the most time-consuming part of the whole assembly. Although there are algorithms for the all-pairs suffix–prefix problem which are optimal in the asymptotic time complexity, they are slower than SOF and Readjoiner which are state-of-the-art algorithms used in practice. In this paper we present an algorithm for the all-pairs suffix–prefix problem that uses a simple data structure for storing input strings and advanced algorithmic techniques for matching, which together lead to fast running time in practice. Our algorithm is 14 times faster than SOF and 18 times faster than Readjoiner on average in real datasets and random datasets.

Full Text