Linear-time computation of minimal absent words using suffix array.

Carl Barton, Laurent Mouchard, Alice Heliou, Solon P Pissis

doi:10.1186/s12859-014-0388-9

Abstract

Background: An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata (Crochemore et al., 1998). No implementation of this algorithm is publicly available. There also exists an O(n^2)-time and O(n)-space algorithm for the same problem based on the construction of suffix arrays (Pinho et al., 2009). An implementation of this algorithm was also provided by the authors and is currently the fastest available.

Results: Our contribution in this article is twofold: first, we bridge this unpleasant gap by presenting an O(n)-time and O(n)-space algorithm for computing all minimal absent words based on the construction of suffix arrays; and second, we provide the respective implementation of this algorithm. Experimental results, using real and synthetic data, show that this implementation outperforms the one by Pinho et al. The open-source code of our implementation is freely available at http://github.com/solonas13/maw.

Conclusions: Classical notions for sequence comparison are increasingly being replaced by other similarity measures that refer to the composition of sequences in terms of their constituent patterns. One such measure is the minimal absent words. In this article, we present a new linear-time and linear-space algorithm for the computation of minimal absent words based on the suffix array.
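For readers unfamiliar with the data structure named in the abstract, the following is a minimal sketch of a suffix array and of how it answers the question "does the word w occur in y?". It is not the authors' linear-time algorithm: the construction below simply sorts the suffixes, which is far slower than linear time, and the function names `suffix_array` and `occurs` are hypothetical names chosen for this illustration.

```python
def suffix_array(y):
    """Naive suffix array: starting positions of the suffixes of y in
    lexicographic order. Sorting the suffixes directly is NOT linear time;
    it is used here only to keep the illustration short."""
    return sorted(range(len(y)), key=lambda i: y[i:])


def occurs(y, sa, w):
    """Return True if w occurs in y, by binary search over the suffix array.

    Truncating every suffix to len(w) letters keeps the sorted order, so we
    can search for the first truncated suffix that is >= w."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if y[sa[mid]:sa[mid] + len(w)] < w:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and y[sa[lo]:sa[lo] + len(w)] == w


y = "abaab"
sa = suffix_array(y)
print(sa)                    # suffix start positions in sorted order
print(occurs(y, sa, "baa"))  # "baa" is a factor of y
print(occurs(y, sa, "bb"))   # "bb" is an absent word of y
```

Each occurrence query costs O(|w| log n) letter comparisons once the array is built, which is why suffix arrays are a natural basis for testing whether candidate words are absent.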

Highlights

An absent word of a word y of length n is a word that does not occur in y
An O(n)-time and O(n)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata was presented in [7]
If we find a pair of letters (a, b) from the alphabet such that aw and wb occur in y, but awb does not occur in y, we can conclude that awb is a minimal absent word of y
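The characterization in the last highlight can be turned directly into a brute-force procedure. The sketch below enumerates every factor w of y and every pair of letters (a, b), and reports awb whenever aw and wb occur but awb does not. It is an illustration of the definition only, assuming a small input: it takes far more than linear time and is not the suffix-array algorithm of the paper. The function name `minimal_absent_words` and its `alphabet` parameter are hypothetical.

```python
def minimal_absent_words(y, alphabet=None):
    """Brute-force minimal absent words of y (illustration, not linear time).

    alphabet defaults to the letters that appear in y (an assumption made
    for this sketch)."""
    if alphabet is None:
        alphabet = sorted(set(y))
    n = len(y)
    # Every factor of y, including the empty word.
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    maws = set()
    # A single letter that never occurs is minimal absent: its only proper
    # factor is the empty word.
    for a in alphabet:
        if a not in factors:
            maws.add(a)
    # Words awb of length >= 2: aw and wb occur in y, awb does not.
    for w in factors:
        for a in alphabet:
            if a + w not in factors:
                continue
            for b in alphabet:
                if w + b in factors and a + w + b not in factors:
                    maws.add(a + w + b)
    return sorted(maws)


print(minimal_absent_words("abaab"))  # ['aaa', 'aaba', 'bab', 'bb']
```

For y = abaab, taking w = ab with a = a and b = a gives aab and aba, which both occur, while aaba does not, so aaba is reported.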

Summary

Introduction

An absent word of a word y of length n is a word that does not occur in y. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata (Crochemore et al., 1998). No implementation of this algorithm is publicly available. There also exists an O(n^2)-time and O(n)-space algorithm for the same problem based on the construction of suffix arrays (Pinho et al., 2009). An implementation of this algorithm was provided by the authors and is currently the fastest available. Standard notions for sequence comparison are gradually being complemented (or even supplanted) by other measures that refer, implicitly or explicitly, to the composition of sequences in terms of their constituent patterns. Noting the words which do occur in one sequence but do not occur in another can be used to detect mutations or other biologically significant events.
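The closing remark about words that occur in one sequence but not in another can be illustrated with a small sketch. It assumes words of one fixed length k and two short example strings invented for this illustration; the function names `factors_of_length` and `words_only_in_first` are hypothetical, and the approach is a plain set difference rather than anything taken from the paper.

```python
def factors_of_length(s, k):
    """All distinct factors (substrings) of s that have length exactly k."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def words_only_in_first(x, y, k):
    """Length-k words that occur in x but are absent from y, sorted."""
    return sorted(factors_of_length(x, k) - factors_of_length(y, k))


# Two invented sequences that differ by a single substitution (T -> A).
x = "ACGTAC"
y = "ACGAAC"
print(words_only_in_first(x, y, 3))  # ['CGT', 'GTA', 'TAC']
```

The words reported are exactly those that overlap the substituted position, which is the sense in which such words can point to a mutation.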

Journal: BMC Bioinformatics	Publication Date: Dec 1, 2014
Citations: 49	License type: CC BY 4.0


Similar Papers

Parallelising the Computation of Minimal Absent Words
Carl Barton ... Laurent Mouchard
01 Jan 2015

EmMAW: computing minimal absent words in external memory.
Alice Héliou ... Inanc Birol
Bioinformatics (Oxford, England) | VOL. 33
12 Apr 2017

Minimal Absent Words in Prokaryotic and Eukaryotic Genomes
Sara P Garcia ... Christian Schönbach
PLoS ONE | VOL. 6
31 Jan 2011

On finding minimal absent words
Armando J Pinho ... Sara P Garcia
BMC Bioinformatics | VOL. 10
08 May 2009
