Abstract

Analysis of DNA composition at several length scales constitutes the bulk of many early studies aimed at unravelling the complexity of the organization and functionality of genomes. Dinucleotide relative abundances are considered an idiosyncratic feature of genomes, regarded as a ‘genomic signature’. Motivated by this finding, we introduce the ‘Generalized Genomic Signatures’ (GGSs), composed of over- and under-abundances of all oligonucleotides of a given length, thus filtering out compositional trends and neighbour preferences at any shorter range. Previous works on alignment-free genomic comparisons mostly rely on k-mer frequencies and not on distance-dependent neighbour preferences. In those works, nucleotide composition and proximity preferences are combined, while in the present work they are strictly separated, focusing uniquely on neighbour relationships. GGSs match or even outperform genomic signatures defined at the dinucleotide level in distinguishing between taxonomic subdivisions of bacteria, and can be more effectively implemented in microbial phylogenetic reconstruction. Moreover, we compare DNA sequences from the human genome corresponding to protein coding segments, conserved non-coding elements and non-functional DNA stretches. These classes of sequences have distinctive GGSs according to their genomic role and degree of conservation. Overall, GGSs constitute a trait characteristic of the evolutionary origin and functionality of different genomic segments.

Highlights

  • In two pioneering works, Samuel Karlin and co-workers[1,2] introduced the notion of the ‘genomic signature’, i.e. a vector composed of the ‘relative abundances’ (odds ratios) of dinucleotides

  • Different chromosomes of the same genome have very similar ‘Generalized Genomic Signatures’ (GGSs) compared to chromosomes belonging to different species. Taking into account this analysis along with the phylogenetic reconstruction we presented in the previous section, it can be argued that GGSs exhibit strong intra-genome stability, while at the same time their inter-genome variability suffices to distinguish between different bacteria


Summary

Introduction

Samuel Karlin and co-workers[1,2] introduced the notion of the ‘genomic signature’, i.e. a vector composed of the ‘relative abundances’ (odds ratios) of dinucleotides. We perform clustering using the GGSs as feature vectors in order to assess their potential in intragenomic comparisons among sequences of different functionality. We consider the elements of each vector as explanatory variables in a series of classification experiments that we perform in order to determine whether GGSs suffice to predict the phylum or class to which the corresponding bacterial species belong.
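
To make the dinucleotide ‘relative abundance’ concrete, the following is a minimal Python sketch of the classical odds ratio, i.e. the observed dinucleotide frequency divided by the product of the frequencies of its two nucleotides. It is an illustration only and rests on several assumptions: the function name `dinucleotide_signature` is hypothetical, counts are taken on a single strand (the signature of Karlin and co-workers is usually computed on the sequence combined with its reverse complement), and it is not the GGS computation of the present work, whose formula is not given in this summary.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_signature(seq):
    """Return {XY: f_XY / (f_X * f_Y)} for all 16 dinucleotides.

    Hypothetical helper: single-strand counts only, an assumption
    made for brevity rather than the authors' exact procedure.
    """
    seq = seq.upper()
    # Count single nucleotides, ignoring ambiguous symbols such as N.
    mono = Counter(b for b in seq if b in BASES)
    # Count overlapping dinucleotides made of two unambiguous symbols.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence contains no usable dinucleotides")

    signature = {}
    for x, y in product(BASES, repeat=2):
        f_x = mono[x] / n_mono
        f_y = mono[y] / n_mono
        f_xy = di[x + y] / n_di
        # The ratio is undefined when one of the nucleotides is absent.
        if f_x > 0 and f_y > 0:
            signature[x + y] = f_xy / (f_x * f_y)
        else:
            signature[x + y] = float("nan")
    return signature


if __name__ == "__main__":
    sig = dinucleotide_signature("ACGTACGTACGT")
    for dinucleotide in sorted(sig):
        print(dinucleotide, round(sig[dinucleotide], 3))
```

Values above 1 indicate an over-represented dinucleotide relative to what the nucleotide composition alone would predict, and values below 1 an under-represented one. The resulting 16-element vector is the kind of feature vector that could be passed to a clustering or classification routine.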

