Abstract

Background: CpGs, the major methylation sites in vertebrate genomes, exhibit a high mutation rate from the methylated form of CpG to TpG/CpA and therefore influence the evolution of genome composition. However, the quantitative effects of CpG to TpG/CpA mutations on the evolution of genome composition, in terms of dinucleotide frequencies/proportions, remain poorly understood.

Results: Based on the neutral theory of molecular evolution, we propose a methylation-driven model (MDM) that predicts the changes in the frequencies/proportions of the 16 dinucleotides and in the GC content of a genome, given a known number of CpG to TpG/CpA mutations. Applying the MDM to 10 published vertebrate genomes shows that, for most of the 16 dinucleotides and for the GC content, the predicted and observed trends of change relative to the assumed initial values are consistent, and that the model performs better on the mammalian genomes than on the lower-vertebrate genomes. The model's performance depends on the genome composition characteristics, the assumed initial state of the genome, and the estimated parameters. One or more of these factors account for the different results on the mammalian and lower-vertebrate genomes and for the large deviations of the predicted frequencies of a few dinucleotides from their observed frequencies.

Conclusions: Despite certain limitations of the current model, its successful application to the higher-vertebrate (mammalian) genomes demonstrates its potential for facilitating studies aimed at understanding the role of methylation in driving the evolution of genome dinucleotide composition.

Highlights

  • CpGs, the major methylation sites in vertebrate genomes, exhibit a high mutation rate from the methylated form of CpG to TpG/CpA and therefore influence the evolution of genome composition

  • For the 10 vertebrate genomes, the observed frequencies/proportions (%) of the 16 dinucleotides and the GC content are listed in Table 1. The corresponding expected values, obtained by applying the methylation-driven model (MDM) to an initial genome state with 50% GC content and a 6.25% proportion for each dinucleotide, are listed in Table 2. The results for the other two initial genome states, with 40% and 60% GC content, are shown in Supplementary Tables 1 and 2 (Additional file 1), respectively

  • Comparing the expected and observed trends of frequency/proportion changes relative to the initial proportions reveals that, when the initial genome has a GC content of 50% and a proportion of 6.25% for each dinucleotide, most of the 16 dinucleotides in most of the studied genomes show good consistency between the expected and observed trends. This indicates, on the one hand, that a 50% GC content could be a reasonable assumption for the initial state of vertebrate genomes and, on the other hand, that the model performs well in predicting the trends in the frequencies of most dinucleotides caused by methylation-induced CpG to TpG/CpA mutations
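
The 6.25% figure in the highlights above is what one obtains when each of the four bases has a frequency of 25% and adjacent bases are independent (0.25 × 0.25 = 0.0625). The following minimal sketch reproduces that arithmetic. The function name `initial_dinucleotide_proportions` is hypothetical. The extension to 40% and 60% GC content assumes G = C, A = T and independent neighbouring bases, which is an illustrative assumption and not necessarily how the authors constructed those initial states.

```python
from itertools import product


def initial_dinucleotide_proportions(gc):
    """Return the 16 dinucleotide proportions for a given GC content.

    Assumptions (for illustration only, not taken from the paper):
    G and C each have frequency gc / 2, A and T each have frequency
    (1 - gc) / 2, and neighbouring bases are independent.
    """
    base_freq = {
        "A": (1 - gc) / 2,
        "C": gc / 2,
        "G": gc / 2,
        "T": (1 - gc) / 2,
    }
    # Under independence, P(XY) = P(X) * P(Y) for every ordered pair.
    return {x + y: base_freq[x] * base_freq[y]
            for x, y in product("ACGT", repeat=2)}


uniform = initial_dinucleotide_proportions(0.5)   # every value is 0.0625
gc40 = initial_dinucleotide_proportions(0.4)      # AT-rich starting point
```

With `gc=0.5` every dinucleotide has a proportion of 0.0625 (6.25%), matching the initial state described in the highlights. Under the stated assumptions, a 40% GC content would give CpG a starting proportion of about 4% and ApT about 9%.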

Summary

Introduction

CpGs, the major methylation sites in vertebrate genomes, exhibit a high mutation rate from the methylated form of CpG to TpG/CpA and therefore influence the evolution of genome composition. In vertebrate genomes, the CpG dinucleotide occurs at a lower frequency than expected [11,12,13,14]. This depletion is attributed to a high C-to-T mutation rate at methylated CpG sites [14,15,16,17,18].
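
To make the quantities in this passage concrete, the sketch below computes the 16 dinucleotide proportions, the GC content, and the commonly used CpG observed/expected ratio (CpG count × sequence length / (C count × G count)) for a DNA string, and then applies a single CpG to TpG substitution to a toy sequence. It is an illustration only, not the authors' MDM. All function names and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_proportions(seq):
    """Proportion of each of the 16 dinucleotides (overlapping windows)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Only windows made of A, C, G, T are counted; others are ignored.
    total = sum(counts[d] for d in DINUCLEOTIDES)
    return {d: counts[d] / total for d in DINUCLEOTIDES}


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    n_cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return n_cpg * len(seq) / (n_c * n_g)


def mutate_first_cpg_to_tpg(seq):
    """Replace the C of the first CpG with T (toy methylation-induced mutation)."""
    i = seq.upper().find("CG")
    return seq if i < 0 else seq[:i] + "T" + seq[i + 1:]


original = "ACGTTGCA"                       # toy sequence with one CpG
mutated = mutate_first_cpg_to_tpg(original)  # the CpG becomes TpG
```

In the toy sequence, the single substitution removes the only CpG, adds a TpG, turns the overlapping ApC into ApT, and lowers the GC content from 0.5 to 0.375. This shows how one mutation type can shift several dinucleotide proportions and the GC content at once.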
