Abstract

Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we explore different sets of minimal absent words in the genomes of 22 organisms (one archaeon, thirteen bacteria and eight eukaryotes). We investigate whether the mutational biases that may explain the deficit of the shortest absent words in vertebrates are also pervasive in other absent words, namely minimal absent words, and in other organisms. We find that the compositional biases observed for the shortest absent words in vertebrates are not uniform across different sets of minimal absent words. We further investigate the hypothesis that minimal absent words are inherited through common ancestry, using the similarity in dinucleotide relative abundances of different sets of minimal absent words, and find that this inheritance may be exclusive to vertebrates.

Highlights

  • The set of absent words of a sequence is the set of all words that cannot be found in the sequence

  • We further investigate the hypothesis of the inheritance of minimal absent words through common ancestry, in addition to lineage-specific inheritance, from the similarity in dinucleotide compositional biases of different sets of minimal absent words

  • We find that this hypothesis needs revision for longer absent words, as neither the base nor dinucleotide compositional biases are uniform throughout sets of minimal absent words of increasing word length

Introduction

The set of absent words of a sequence is the set of all words that cannot be found in the sequence. This set is too large and of limited interest for practical purposes. We have introduced the concept of minimal absent words, which have the following property: removing the left- or rightmost character from a minimal absent word yields a word that is no longer absent [1]. Minimal absent words are defined to have at least 3 characters and have been computed in genomes of organisms from all domains of life. The core of a minimal absent word, i.e. the word that remains if its left- and rightmost characters are removed, is a maximal repeat. A maximal repeat is a perfect repeat (without gaps or misspellings) that occurs at least twice and cannot be extended further to its left or right without loss of similarity.
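The definition above can be illustrated with a minimal brute-force sketch in Python. It is not the authors' method and would not scale to genomes; the function name `minimal_absent_words`, its parameters, and the choice to scan a single strand over the alphabet ACGT are assumptions made for illustration only. A word of length n is reported when it is absent from the sequence while both its prefix and its suffix of length n - 1 are present.

```python
def minimal_absent_words(seq, max_len, alphabet="ACGT", min_len=3):
    """Brute-force sketch: return the minimal absent words of `seq`
    with lengths from `min_len` to `max_len` (hypothetical helper).

    Assumption: only the given strand is scanned; reverse complements
    are not considered.
    """
    maws = set()
    for n in range(min_len, max_len + 1):
        # Words of length n and n - 1 that occur in the sequence.
        present_n = {seq[i:i + n] for i in range(len(seq) - n + 1)}
        present_prev = {seq[i:i + n - 1] for i in range(len(seq) - n + 2)}
        # Extend every present (n - 1)-mer by one character on the right.
        for prefix in present_prev:
            for c in alphabet:
                word = prefix + c
                # Minimal absent: the word itself is absent, but dropping
                # either its leftmost or rightmost character gives a word
                # that occurs in the sequence.
                if word not in present_n and word[1:] in present_prev:
                    maws.add(word)
    return maws


if __name__ == "__main__":
    print(minimal_absent_words("ACACG", max_len=5))
```

For the toy sequence `ACACG`, the only minimal absent word of length 3 to 5 is `CACA`: it does not occur, while `CAC` and `ACA` both do. Its core, `AC`, occurs twice in the sequence, consistent with the maximal-repeat property described above.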
