Abstract

A nullomer is an oligomer that does not occur as a contiguous subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here we investigated the nature of nullomers: whether their absence from genomes has a purely statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied several aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless, opposite results were found when considering a random DNA sequence that preserves dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content, but it also highlighted a strong dependence of CpG frequency on dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences with a biased CpG frequency. Furthermore, phylogenetic trees were built for eleven species based on both the similarity between dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly well conserved among closely related species. The study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have peculiar structural features. These results show that nullomers are a consequence of the peculiar structure of DNA (including biased CpG frequency and CpG islands), so the hypermutability model, even when CpG islands are taken into account, seems insufficient to explain the nullomer phenomenon. Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications.
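
To make the two definitions concrete, the following is a minimal Python sketch, not the procedure used in the paper. It assumes that "mutated sequences" means all single-nucleotide substitutions of a word and that only the given strand is scanned (no reverse complement). The function names are hypothetical, and the exhaustive enumeration is only practical for small k.

```python
from itertools import product

ALPHABET = "ACGT"

def observed_kmers(seq, k):
    """Return the set of all length-k substrings that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def nullomers(seq, k):
    """Return all length-k words over ACGT that never occur in seq."""
    every_word = {"".join(p) for p in product(ALPHABET, repeat=k)}
    return every_word - observed_kmers(seq, k)

def single_substitutions(word):
    """Yield every word that differs from `word` at exactly one position."""
    for i, original in enumerate(word):
        for base in ALPHABET:
            if base != original:
                yield word[:i] + base + word[i + 1:]

def first_order_nullomers(seq, k):
    """Nullomers whose single-substitution variants are all nullomers too.

    Assumption: this is one reading of a 'high order nullomer' of order 1.
    """
    absent = nullomers(seq, k)
    return {w for w in absent
            if all(v in absent for v in single_substitutions(w))}
```

For example, in the toy sequence "AAAA" with k = 2 the only observed word is "AA", so 15 of the 16 dinucleotides are nullomers, but only the 9 words containing no "A" remain nullomers under every single substitution.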

Highlights

  • In the post-genomic era, a growing number of genomes have been completely sequenced and made available

  • We extended the computation of human genome nullomers to high order nullomers, and we computed nullomers and high order nullomers of random sequences preserving nucleotide and dinucleotide frequencies

  • In this paper we investigated the nature of nullomers, trying to address the question: is it just a statistical matter or is it the consequence of the peculiar features of genomic sequences? In this context we proposed an extension of the notion of nullomer introducing high order nullomers, i.e. nullomers such that each of their mutated sequences is still a nullomer

Introduction

In the post-genomic era, a growing number of genomes have been completely sequenced and made available. As more genomes became available, scientists started to study and compare genome features in terms of similarity, complexity, information content and statistical properties. In recent years the term “nullomer” was used for the first time to indicate an absent word of a given genomic sequence or of a collection of sequences [4]; further investigations were conducted in [5, 6]. We compared the sets of simple and high order nullomers of the human genome with those expected in random sequences preserving nucleotide and dinucleotide frequencies. (Simple nullomers are zero order nullomers; for simplicity, and when there is no ambiguity, we refer to them as “nullomers”.) We investigated the nature of simple and high order nullomers, studying their peculiar patterns in terms of both dinucleotide composition and physicochemical properties. We built phylogenetic trees using simple and high order nullomers; the consistency of those trees with classical phylogeny revealed that nullomers are well conserved among closely related species.
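
The text does not say how the random sequences were generated. One common way to approximate a random sequence that preserves dinucleotide frequencies is to sample from a first-order Markov chain whose transition counts are taken from the template sequence. The sketch below illustrates that idea only. It preserves dinucleotide frequencies in expectation rather than exactly, and the function name, the `seed` parameter and the restart rule for dead ends are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def dinucleotide_markov_sample(seq, length, seed=0):
    """Sample a sequence from a first-order Markov chain fitted to seq.

    Assumption: this stands in for 'random sequence preserving
    dinucleotide frequencies'; frequencies match only in expectation.
    """
    if not seq:
        raise ValueError("template sequence must be non-empty")
    rng = random.Random(seed)

    # Count how often each base is followed by each other base.
    transitions = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1

    current = seq[0]
    out = [current]
    while len(out) < length:
        followers = transitions.get(current)
        if not followers:
            # Dead end: this base only occurs at the very end of the
            # template, so restart from a base drawn from the template.
            current = rng.choice(seq)
        else:
            bases, weights = zip(*followers.items())
            current = rng.choices(bases, weights=weights)[0]
        out.append(current)
    return "".join(out[:length])
```

Nullomers of the sampled sequence can then be compared with those of the template. A shuffle that preserves dinucleotide counts exactly would be a stricter null model than this sampler.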
