Abstract

We examined seventy million well-characterized human mutations and their impact on G+C-compositional dynamics in order to understand the formation and maintenance of major genomic nucleotide sequence patterns. Among novel mutations, those that change a strong (S) base pair G:C/C:G to a weak (W) pair A:T/T:A occur at nearly twice the frequency of the opposite mutations. Such an imbalance puts strong downward pressure on overall GC-content. However, along protracted paths to fixation, S→W mutations are much less likely to propagate than W→S mutations. The magnitude of the relative propagation disadvantage of S→W mutations cannot be explained by any currently accepted model. This fact forced us to re-examine the quantitative features of Biased Gene Conversion (BGC) theory. Revised BGC parameters under which, per average individual, 7–14 W base pairs are converted into S pairs would account for the S-content turnover differences between new and old mutations and would make BGC an instrumental force in nucleotide dynamics and evolution. BGC should thus be considered seriously in both theory and biomedical practice. In particular, BGC should be taken into account during allele imputation, where missing SNP alleles are computationally predicted from information about several neighboring alleles. Finally, we analyzed the effect of neighboring nucleotide context on mutation frequencies, dynamics, and GC-composition turnover. For this purpose, we examined genomic regions with extremely biased nucleotide compositions (enriched for S or W base pairs, or showing purine/pyrimidine or AC/GT strand asymmetry). Point mutations in these regions preferentially degrade the nucleotide inhomogeneities, decreasing the sequence biases. Degradation of sequence bias is highest for novel mutations and considerably lower for older mutations (those widespread across populations). Besides BGC, there may be additional, still uncharacterized molecular mechanisms that either preserve genomic regions with biased nucleotide compositions from mutational degradation or fail to degrade such inhomogeneities in specific chromosomal regions.

Highlights

  • GC-composition along mammalian chromosomes is highly uneven

  • As shown in this paper, novel human mutations that replace G:C base pairs with A:T pairs are a third more frequent than the opposite mutations

  • The normalized mutation ratio N(S→W)/N(W→S) = 1.92 per site for the whole genome allows us to calculate the number of G:C pairs lost to de novo mutations per individual
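
The relation between the raw excess of G:C-to-A:T mutations ("a third more frequent") and the normalized per-site ratio of 1.92 can be illustrated with a short sketch. It assumes that normalization means dividing each mutation count by the number of sites of the originating type; the function name and all input numbers are hypothetical, and the S fraction of 41% is an assumed round figure for the human genome, not a value taken from the paper.

```python
def normalized_sw_ws_ratio(n_sw, n_ws, s_sites, w_sites):
    """Per-site S->W mutation rate divided by per-site W->S mutation rate.

    n_sw, n_ws       : counts of S->W and W->S mutations
    s_sites, w_sites : numbers of S (G:C) and W (A:T) sites available to mutate
    """
    if n_ws <= 0 or s_sites <= 0 or w_sites <= 0:
        raise ValueError("n_ws, s_sites and w_sites must be positive")
    return (n_sw / s_sites) / (n_ws / w_sites)


# Hypothetical illustration only: raw counts in a 4:3 ratio (S->W a third
# more frequent) in a genome assumed to be 41% S and 59% W.
ratio = normalized_sw_ws_ratio(n_sw=4, n_ws=3, s_sites=41, w_sites=59)
print(round(ratio, 2))
```

Under these assumed inputs the raw ratio of 4:3 rises to about 1.92 per site, because S sites are the minority and each one therefore carries a larger share of the S→W mutations.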


Introduction

There are genomic regions highly enriched for G and C nucleotides, and other regions in which these nucleotides are strongly under-represented. This non-randomness has an intricate genomic structure and has been reviewed in many articles [1, 2, 3]. Regions with a particular GC-content may extend over millions of base pairs (known as isochores) or occupy only a few hundred base pairs, forming small islands inside regions of different GC-composition (some CpG-islands exemplify the latter case) [4]. The exon-intron structure and functioning of genes depend strongly on the GC-content of the chromosomal region they occupy [3, 5]. Dramatic variations in GC-content have been described in prokaryotes [8, 9] and various eukaryotic species [10, 11].
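
Regional variation in GC-content of the kind described above is commonly profiled by computing the G+C fraction in windows along a sequence. The following is a minimal sketch of that idea, not the procedure used in this study; the function names, window size, and toy sequence are assumptions chosen for illustration.

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    s = seq.count("G") + seq.count("C")  # strong (S) bases
    w = seq.count("A") + seq.count("T")  # weak (W) bases
    if s + w == 0:
        return float("nan")  # no unambiguous bases, e.g. a run of N
    return s / (s + w)


def windowed_gc(seq, window, step):
    """GC fraction of each full-length window, moving along seq by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a GC-rich block followed by an AT-rich block.
profile = windowed_gc("GGGGCCCCAAAATTTT", window=8, step=8)
print(profile)
```

Window size determines which scale of heterogeneity is visible: windows of a few hundred bases would resolve island-sized features, whereas isochore-scale patterns need much larger windows.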
