Abstract

Background: Molecular evolutionary studies in mammals often estimate nucleotide substitution rates within and outside CpG dinucleotides separately. Frequently, in alignments of two sequences, the division of sites into CpG and non-CpG classes is based simply on the presence or absence of a CpG dinucleotide in either sequence, a procedure that we refer to as CpG/non-CpG assignment. Although it is likely that this procedure is biased, it is generally assumed that the bias is negligible if species are very closely related.

Results: Using simulations of DNA sequence evolution, we show that assignment of the ancestral CpG state based on the simple presence/absence of the CpG dinucleotide can seriously bias estimates of the substitution rate, because many true non-CpG changes are misassigned as CpG. Paradoxically, this bias is most severe between closely related species, because in a two-branch tree a minimum of two substitutions is required to misassign a true ancestral CpG site as non-CpG, whereas only a single substitution is required to misassign a true ancestral non-CpG site as CpG. We also show that CpG misassignment bias differentially affects fourfold degenerate and noncoding sites due to differences in base composition, such that fourfold degenerate sites can appear to be evolving more slowly than noncoding sites. We demonstrate that the effects predicted by our simulations occur in a real evolutionary setting by comparing substitution rates estimated from human-chimp coding and intronic sequence using CpG/non-CpG assignment with estimates derived from a method that is largely free from bias.

Conclusion: Our study demonstrates that a common method of assigning sites into CpG and non-CpG classes in pairwise alignments is seriously biased and recommends against the adoption of ad hoc methods of ancestral state assignment.
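As an illustration of the CpG/non-CpG assignment procedure described above, the following is a minimal sketch in Python. It assumes an ungapped pairwise alignment of equal-length sequences; the function names `cpg_positions` and `assign_sites` are hypothetical and are not taken from the study.

```python
def cpg_positions(seq):
    """Return the indices of all sites that lie inside a CG dinucleotide."""
    seq = seq.upper()
    hits = set()
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            hits.update((i, i + 1))
    return hits


def assign_sites(seq_a, seq_b):
    """Label each aligned site 'CpG' if it is part of a CG dinucleotide
    in either sequence, and 'non-CpG' otherwise.

    Assumption: the two sequences are aligned without gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    cpg = cpg_positions(seq_a) | cpg_positions(seq_b)
    return ["CpG" if i in cpg else "non-CpG" for i in range(len(seq_a))]


# Sequence A carries a CG at sites 1-2, sequence B does not.
labels = assign_sites("ACGT", "ATGT")
print(labels)
```

If the ancestor in this toy example was in fact ATGT and a single T-to-C change occurred in one lineage, sites 1 and 2 are still labelled CpG, which is the kind of misassignment the abstract describes.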

Highlights

  • Molecular evolutionary studies in mammals often estimate nucleotide substitution rates within and outside CpG dinucleotides separately

  • For an ancestral CpG site to be assigned as non-CpG requires the destruction of the CpG site in both derived lineages, necessitating a minimum of two changes across the tree (Figure 1a)

  • For an ancestral non-CpG site to be assigned as CpG requires only a single change (Figure 2b). The former will occur much less often in closely related species than the latter
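The asymmetry between the two kinds of misassignment can be sketched with a small calculation. This is a deliberately simplified toy model, not the simulation used in the study: it assumes each site changes independently with probability `p` per branch to one of the three other bases with equal probability, ignores CpG hypermutability and multiple changes on a branch, and considers only a non-CpG ancestor (such as TG) that is one change away from CG. The function names are hypothetical.

```python
def p_lose_both(p):
    """Probability that an ancestral CG is destroyed in BOTH lineages,
    so that the site is misassigned as non-CpG."""
    lost_one = 1.0 - (1.0 - p) ** 2   # at least one of the two sites changes
    return lost_one ** 2


def p_gain_either(p):
    """Probability that an ancestral TG becomes CG in AT LEAST ONE lineage,
    so that the site is misassigned as CpG."""
    gain_one = (p / 3.0) * (1.0 - p)  # T changes to C, G stays unchanged
    return 1.0 - (1.0 - gain_one) ** 2


def misassignment_ratio(p):
    """How many times more likely the single-change error is."""
    return p_gain_either(p) / p_lose_both(p)


for p in (0.005, 0.05, 0.2):
    print(f"p={p}: gain/loss ratio = {misassignment_ratio(p):.2f}")
```

Under these assumptions the two-change error scales roughly with the square of `p`, while the single-change error scales roughly linearly, so the ratio grows as divergence shrinks. This is consistent with the statement that the former occurs much less often than the latter in closely related species.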


Summary

Introduction

Molecular evolutionary studies in mammals often estimate nucleotide substitution rates within and outside CpG dinucleotides separately. Some studies that employed the CpG/non-CpG assignment procedure in the analysis of protein-coding sequence have suggested that, while the overall rate of substitution is higher at synonymous sites, both CpG and non-CpG synonymous substitution rates are substantially lower than substitution rates in noncoding DNA [10,14]. This has been interpreted as evidence of purifying selection at synonymous sites.
