Alphabet-independent algorithms for finding context-sensitive repeats in linear time

Enno Ohlebusch, Timo Beller

doi:10.1016/j.jda.2015.05.005

Open Access

Journal: Journal of Discrete Algorithms
Publication Date: May 26, 2015
Citations: 6
License type: publisher-specific-oa

Affiliation: University of Ulm

Abstract

The identification of repetitive sequences (repeats) is an essential component of genome sequence analysis, and there are dozens of algorithms that search for exact or approximate repeats. The notions of maximal and supermaximal (exact) repeats have received special attention, and it is possible to simultaneously compute them on index data structures like the suffix tree or the enhanced suffix array. Very recently, this research has been extended in two directions. Gallé and Tealdi [10] devised an alphabet-independent linear-time algorithm that finds all context-diverse repeats (which subsume maximal and supermaximal repeats as special cases), while Taillefer and Miller [31] gave a quadratic-time algorithm that simultaneously computes and classifies maximal, near-supermaximal, and supermaximal repeats. In this paper, we provide new alphabet-independent linear-time algorithms for both tasks.
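
To make the terms in the abstract concrete, the following is a minimal brute-force sketch of maximal and supermaximal repeats. It is not the linear-time algorithm of the paper. It assumes the common textbook definitions: a repeat is a substring that occurs at least twice, a maximal repeat is a repeat whose occurrences differ in both their left and their right neighbouring characters, and a supermaximal repeat is a repeat that is not a proper substring of any other repeat. The function names `repeated_substrings` and `classify_repeats` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def repeated_substrings(text):
    """Map every substring occurring at least twice to its start positions.

    Brute force: enumerates all substrings, so it is only meant for tiny inputs.
    """
    occ = defaultdict(list)
    n = len(text)
    for i in range(n):
        for j in range(i + 1, n + 1):
            occ[text[i:j]].append(i)
    return {w: pos for w, pos in occ.items() if len(pos) >= 2}


def classify_repeats(text):
    """Return (maximal, supermaximal) repeats under the assumed definitions."""
    repeats = repeated_substrings(text)
    n = len(text)

    maximal = set()
    for w, positions in repeats.items():
        # Neighbouring characters of each occurrence; None marks a text boundary.
        # At most one occurrence can touch each boundary, so None acts as a
        # context that differs from every real character.
        left = {text[p - 1] if p > 0 else None for p in positions}
        right = {text[p + len(w)] if p + len(w) < n else None for p in positions}
        # Extending w to the left (or right) keeps all occurrences only if
        # every occurrence has the same left (or right) neighbour.
        if len(left) > 1 and len(right) > 1:
            maximal.add(w)

    # A repeat is supermaximal if no other repeat contains it.
    supermaximal = {
        w for w in repeats
        if not any(w != v and w in v for v in repeats)
    }
    return maximal, supermaximal


if __name__ == "__main__":
    maximal, supermaximal = classify_repeats("banana")
    print(sorted(maximal))        # ['a', 'ana']
    print(sorted(supermaximal))   # ['ana']
```

In "banana", the repeat "an" is not maximal because both of its occurrences are followed by "a", so it can be extended to "ana" without losing an occurrence. The repeat "a" is maximal but not supermaximal, since it is contained in the repeat "ana". The sketch takes at least quadratic time and space in the text length. The algorithms discussed in the abstract reach linear time by working on index structures such as the suffix tree or the enhanced suffix array.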

Full Text