Abstract

The frequency of G and C nucleotides in genomes varies from species to species, and sometimes even between different genes in the same genome. The monocot grasses have a bimodal distribution of genic GC content that is absent in dicots. We categorized plant genes from 5 dicots and 4 monocot grasses by synteny to related species and determined that, in all 9 species, syntenic genes have significantly higher GC content than non-syntenic genes at the third codon position near their 5′ end. Lower GC content is correlated with gene duplication, as lack of synteny to distantly related genomes is associated with past interspersed gene duplications. Two mutation types can account for biased GC content: mutation of methylated C to T, and gene conversion from A to G. Gene conversion involves non-reciprocal exchanges between homologous alleles and is not detectable when the alleles are identical or heterozygous for presence-absence variation, both likely situations for genes duplicated to new loci. Gene duplication can cause production of siRNA, which can induce targeted methylation, elevating mC→T mutations. Recently duplicated plant genes are more frequently methylated and less likely to undergo gene conversion, and these two factors act synergistically to create a mutational environment favoring AT nucleotides. The syntenic genes with high GC content in the grasses form a subset that has undergone few duplications, or for which duplicate copies were purged by selection. We propose a “biased gene duplication / biased mutation” (BDBM) model that may explain the origin and trajectory of the observed link between duplication and genic GC bias. The BDBM model is supported by empirical data based on joint analyses of 9 angiosperm species with their genes categorized by duplication status, GC content, methylation levels and functional classes.

Highlights

  • DNA encodes the complexity of life, but one of the simplest statistics of a DNA sequence, the fraction of G+C nucleotides (GC content), is highly variable across the tree of life

  • We propose to examine whether some types of gene duplication are correlated with GC content in the 9 species involved

  • The traditional view that the primary advantage of gene duplication is initially increased protein production, possibly followed by eventual sub- or neo-functionalization leading to functional innovation, could be expanded if some new gene copies have immediately advantageous effects as epigenetic regulators



Introduction

DNA encodes the complexity of life, but one of the simplest statistics of a DNA sequence, the fraction of G+C nucleotides (GC content), is highly variable across the tree of life. The overall GC content of bacteria ranges from 13% to 75% [1] and eukaryotic nuclear genomes range from. GC content shows significant variation even within the same genome. In the human genome, the GC content of 20 kb windows varies from 31% to 65% [3]. Significant local variation in GC content has also been noted in honeybees [4]. Among angiosperm plants, GC content shows an irregular or bimodal distribution in the genes of monocot grasses, while most dicots exhibit a more nearly normal distribution [5, 6]. Many causes have been proposed for localized differences in GC frequency in eukaryotes [7, 8], but a scientific consensus has not been reached.
