Abstract

Tetrapods, unlike other organisms, have multimodal spectra of k-mers in their genomes

Highlights

  • The empirical frequencies of DNA k-mers in whole genome sequences provide an interesting perspective on genomic complexity, and the availability of large segments of genomic sequence from many organisms means that analysis of k-mers with non-trivial lengths is possible

  • The distribution of DNA k-mers (DNA 'words' of length k), namely the k-mer spectrum, in whole genome sequences provides an interesting perspective on the complexity of the corresponding species

  • In the multimodal spectra, CpG-containing k-mers wholly occupy the left-most areas but are almost absent from the regions representing more abundant k-mers. No such clear effect is seen for species with unimodal distributions. These findings suggest the hypothesis that CpG suppression is what determines modality


Summary

Introduction

The empirical frequencies of DNA k-mers in whole genome sequences provide an interesting perspective on genomic complexity, and the availability of large segments of genomic sequence from many organisms means that analysis of k-mers with non-trivial lengths is possible. The distribution of DNA k-mers (DNA 'words' of length k), namely the k-mer spectrum, in whole genome sequences provides an interesting perspective on the complexity of the corresponding species. A number of theoretical investigations of genomic k-mer distributions were carried out before large genomes were sequenced, and these works suggested various plausible probabilistic models and parameters for such k-mer distributions. Despite the relative abundance of sequenced genomes to date, few works have investigated empirical k-mer distributions for values of k exceeding 2 or 3. Reinert et al. [2] discussed various plausible k-mer distributions, showing that the distribution of the number of occurrences of a particular k-mer in sequences generated from a hidden Markov model has two distinct large-sample regimes: a normal distribution for abundant k-mers, and a Poisson or compound Poisson distribution for extremely rare k-mers.
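To make the notion of a k-mer spectrum concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes single-strand counting with overlapping windows over the alphabet ACGT, and it skips windows containing ambiguous bases. The function names `count_kmers` and `kmer_spectrum`, the `cpg` parameter, and the toy sequence are illustrative assumptions. The spectrum here maps each occurrence count to the number of distinct k-mers observed that many times, optionally restricted to k-mers that do or do not contain the dinucleotide CG.

```python
from collections import Counter
from itertools import product


def count_kmers(sequence, k):
    """Count overlapping k-mers on one strand of `sequence` (assumed convention)."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Skip windows that contain ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_spectrum(counts, k, cpg=None):
    """Map each occurrence count to the number of distinct k-mers with that count.

    cpg=None considers all 4**k possible k-mers; cpg=True or cpg=False
    restricts the spectrum to k-mers with or without the substring "CG".
    """
    spectrum = Counter()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        if cpg is None or ("CG" in kmer) == cpg:
            # k-mers never seen in the sequence contribute to count 0.
            spectrum[counts.get(kmer, 0)] += 1
    return dict(sorted(spectrum.items()))


# Hypothetical toy sequence, far too short to show real modality.
toy = "ACGTACGTTTGACA"
counts = count_kmers(toy, 2)
print(kmer_spectrum(counts, 2))             # all dimers
print(kmer_spectrum(counts, 2, cpg=True))   # CpG-containing dimers only
print(kmer_spectrum(counts, 2, cpg=False))  # dimers without CpG
```

On a real genome one would use a larger k and a streaming or disk-backed counter, since enumerating all 4**k words in memory stops being practical as k grows. Plotting the full spectrum alongside the two CpG-restricted spectra is one way to check whether CpG-containing k-mers concentrate at the low-abundance end.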

