Abstract

Background

Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment.

Results

Crochemore’s repetitions algorithm, also referred to as Crochemore’s partitioning algorithm, was introduced in 1981, and was the first optimal O(n log n)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore’s partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, thus improving on the best known O(n^2)-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.
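A minimal brute-force sketch may help fix the weighted-sequence notions from the Background. It rests on assumptions that the abstract does not spell out: a weighted sequence is modelled as a list of dicts mapping each letter to its probability at that position, a solid string u occurs at position i when the product of the probabilities of its letters at positions i to i + |u| - 1 is at least a threshold 1/k, and a weighted square is a solid u that occurs at both i and i + |u|. The function names are hypothetical, and the enumeration is a naive illustration, not the O(n log n) algorithm presented in the article.

```python
# Assumed convention (not taken from the article): a weighted sequence is a
# list of dicts mapping each letter to its probability at that position.


def occurrence_probability(X, u, i):
    """Probability that solid string u occurs at position i of X."""
    prob = 1.0
    for j, c in enumerate(u):
        prob *= X[i + j].get(c, 0.0)
    return prob


def valid_factors(X, i, length, threshold):
    """Solid strings of the given length occurring at i with probability >= threshold."""
    found = []

    def extend(prefix, prob):
        # Probabilities never grow as the prefix is extended, so prune early.
        if prob < threshold:
            return
        if len(prefix) == length:
            found.append(prefix)
            return
        for c, q in X[i + len(prefix)].items():
            extend(prefix + c, prob * q)

    extend("", 1.0)
    return found


def weighted_squares(X, k):
    """Brute-force list of (i, p, u): solid u of length p valid at both i and i + p.

    Assumption: "valid" means occurrence probability >= 1/k.
    """
    n = len(X)
    threshold = 1.0 / k
    squares = []
    for p in range(1, n // 2 + 1):
        for i in range(n - 2 * p + 1):
            for u in valid_factors(X, i, p, threshold):
                if occurrence_probability(X, u, i + p) >= threshold:
                    squares.append((i, p, u))
    return squares


# Position 1 is 'a' or 'b' with probability 1/2 each.
X = [{"a": 1.0}, {"a": 0.5, "b": 0.5}, {"a": 1.0}, {"b": 1.0}]
print(weighted_squares(X, 2))  # [(0, 1, 'a'), (1, 1, 'a'), (0, 2, 'ab')]
```

The pruning in `valid_factors` stays small: if the probabilities at each position sum to 1, the occurrence probabilities of all solid strings of a fixed length at a fixed position also sum to 1, so at most k of them can reach 1/k.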

Highlights

  • Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies

  • In 1999, Kolpakov and Kucherov presented an O(n)-time algorithm to compute the most compact representation of all repetitions, known as runs [2]

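The runs mentioned in the highlights can be illustrated with a deliberately naive Python sketch. It assumes the usual definition of a run as a maximal factor x[s..e] whose smallest period p satisfies e - s + 1 ≥ 2p, reports runs as 0-based (start, end, period) triples, and takes O(n^2) time; it is not the O(n)-time algorithm of Kolpakov and Kucherov [2], and the function names are hypothetical.

```python
def smallest_period(s: str) -> int:
    """Smallest period of a non-empty string, via the KMP prefix (failure) function."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return len(s) - pi[-1]


def naive_runs(x: str) -> list[tuple[int, int, int]]:
    """All runs of x as (start, end, period), 0-based and inclusive. O(n^2) brute force."""
    n = len(x)
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if x[k] != x[k + p]:
                k += 1
                continue
            # Maximal stretch of positions k with x[k] == x[k + p].
            start = k
            while k + p < n and x[k] == x[k + p]:
                k += 1
            # x[start : k + p] has period p and length (k - start) + p.
            if k - start >= p:
                factor = x[start : k + p]
                # Keep it only if p is its smallest period, so each run appears once.
                if smallest_period(factor) == p:
                    runs.append((start, k + p - 1, p))
    return sorted(runs)


print(naive_runs("mississippi"))  # [(1, 7, 3), (2, 3, 1), (5, 6, 1), (8, 9, 1)]
```

Filtering on the smallest period keeps each run once: a stretch found for period p whose factor has a smaller period p' is already reported when the loop reaches p'.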

Summary

Results

Crochemore’s repetitions algorithm, also referred to as Crochemore’s partitioning algorithm, was introduced in 1981, and was the first optimal O(n log n)-time algorithm to compute all repetitions in a string of length n. We present a novel variant of Crochemore’s partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, improving on the best known O(n^2)-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.
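The partitioning idea behind Crochemore’s algorithm can be sketched for ordinary, non-weighted strings: at level p, positions are grouped into classes by the factor of length p starting there, and whenever i and i + p fall in the same class, x[i..i+2p-1] is a square. The Python sketch below is a simplified illustration, not the article’s weighted variant and not the optimal version. It refines level p into level p + 1 by hashing the pair of level-p classes of i and i + 1, which costs O(n) per level and O(n^2) overall; the O(n log n) bound relies on a more careful refinement that processes only the smaller subclasses. It also lists every square as a (position, period) pair rather than in compact form. The function name is hypothetical.

```python
def partition_squares(x: str) -> list[tuple[int, int]]:
    """All squares of x as (position, period), found by level-by-level partitioning."""
    n = len(x)
    # Level 1: positions holding the same letter share a class id.
    letter_ids = {c: idx for idx, c in enumerate(sorted(set(x)))}
    cls = [letter_ids[c] for c in x]  # cls[i] = class of x[i : i + p], for i = 0..n-p
    squares = []
    p = 1
    while 2 * p <= n:
        # Same level-p class for i and i + p means x[i : i + 2p] is a square.
        for i in range(n - 2 * p + 1):
            if cls[i] == cls[i + p]:
                squares.append((i, p))
        # Refine: x[i : i+p+1] is determined by the classes of x[i : i+p] and x[i+1 : i+p+1].
        ids = {}
        cls = [ids.setdefault((cls[i], cls[i + 1]), len(ids)) for i in range(n - p)]
        p += 1
    return squares


print(partition_squares("aaaa"))  # [(0, 1), (1, 1), (2, 1), (0, 2)]
```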

