Abstract

Background: Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relative to a random sequence of the same length, unique subsequences are overrepresented in real genomes. Shortest words absent from a genome have been addressed in two recent studies.

Results: We describe a new algorithm and software for the computation of absent words. It is more efficient than previous algorithms and easier to use. It directly computes unwords without the need to specify a length estimate. Moreover, it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package. We compute unwords of human and mouse as well as some other organisms, covering a genome size range from 10^9 down to 10^5 bp.

Conclusion: The new algorithm computes absent words for the human genome in 10 minutes on standard hardware, using only 2.5 Mb of space. This enables us to perform this type of analysis not only for the largest genomes available so far, but also for the emerging pan- and meta-genome data.
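To make the notion of an absent word concrete, the following is a minimal brute-force sketch. It is not the algorithm described in the article, which is designed to avoid exactly this kind of exhaustive enumeration and its memory cost. The function name `shortest_absent_words`, the default alphabet, and the restriction to the forward strand are assumptions made for illustration only.

```python
from itertools import product


def shortest_absent_words(sequence, alphabet="ACGT"):
    """Return (k, words): the smallest word length k for which at least one
    word over `alphabet` does not occur in `sequence`, together with all
    absent words of that length.

    Illustrative brute force only (hypothetical helper, forward strand only);
    it stores every observed k-mer in a set and is unsuitable for large genomes.
    """
    seq = sequence.upper()
    k = 1
    while True:
        # Collect every substring of length k that occurs in the sequence.
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        # Enumerate all possible words of length k and keep the missing ones.
        absent = [
            "".join(letters)
            for letters in product(alphabet, repeat=k)
            if "".join(letters) not in present
        ]
        if absent:
            return k, absent
        k += 1
```

For example, `shortest_absent_words("ACGT")` returns length 2, because all four single bases occur but only the dinucleotides AC, CG, and GT do, leaving 13 absent words of length 2. The loop always terminates: once the number of possible words of length k exceeds the number of positions in the sequence, at least one word must be absent.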

Highlights

  • Analysis of sequence composition is a routine task in genome research

  • We describe a new algorithm and software for the computation of absent words

  • Sequence statistics and unique substrings: Word statistics is a traditional field of genome research



Methodology article

Address: (1) Center of Biotechnology, Bielefeld University, Postfach 10 01 31, 33501 Bielefeld, Germany and (2) Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany.

Published: 26 March 2008. BMC Bioinformatics 2008, 9:167. doi:10.1186/1471-2105-9-167

