Abstract

We analyzed the functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the dataset of 40 million single nucleotide polymorphisms (SNPs) from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest-density SNP collection for any higher plant. We show that DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the pronounced patterns of nucleotide variability that we observed reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We observed a sharp decline in nucleotide diversity that begins about 250 nucleotides upstream of the transcription start site and reaches its minimum exactly at the transcription start site. We found that transcription termination sites have remarkably symmetrical patterns of SNP density, implying the presence of functional sites near transcription termination. In addition, nucleotide diversity was significantly lower near 3′ UTRs, an area rich in regulatory regions.

Highlights

  • single nucleotide polymorphisms (SNPs) occurring in variably methylated DNA regions[11]

  • Analyses of genomic variation in rice were recently presented by McCouch et al.[26] (1,568 diverse inbred rice varieties analyzed at 700,000 SNPs), Huang et al.[27], Xu et al.[28], Duitama et al.[29], Arai-Kichise et al.[30], and Jain et al.[2]

  • We provide new evidence on close association between local genetic variability and functionality, and suggest that the patterns of SNP density can help to detect functionally important genomic regions that are essential for crop improvement

Summary

Introduction

SNPs occurring in variably methylated DNA regions[11]. Nucleotide composition was found to be an important factor in SNP distribution[12,13,14,15], with a quadratic dependence of SNP density on GC content[12,13]. We present a large-scale study of genetic variability in the rice genome, based on high-quality data from the SNP-seek database, which holds over 40 million SNPs from 3,000 rice genomes[25] and is the largest source of plant SNPs to date. Access to this unique database allowed us to describe genome-wide patterns of variability precisely and to detect pronounced patterns not seen before. We provide new evidence of a close association between local genetic variability and functionality, and suggest that patterns of SNP density can help to detect functionally important genomic regions that are essential for crop improvement. We calculated relative SNP density for different groups of genes and demonstrated that transcription factor-encoding genes are highly conserved, whereas kinases and membrane-bound transporters are the most variable gene families.
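
The relative SNP density comparison described above can be illustrated with a minimal sketch. It assumes that "relative SNP density" means a gene group's SNPs per base pair divided by the genome-wide SNPs per base pair; this normalization, the function name relative_snp_density, the input layout, and all numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def relative_snp_density(genes, genome_snps, genome_length):
    """Return SNP density per gene group, relative to the genome-wide density.

    genes: iterable of (group, length_bp, snp_count) tuples (assumed layout).
    A value below 1 means the group carries fewer SNPs per base pair than
    the genome as a whole; a value above 1 means it carries more.
    """
    genome_density = genome_snps / genome_length
    totals = defaultdict(lambda: [0, 0])  # group -> [total bp, total SNPs]
    for group, length_bp, snp_count in genes:
        totals[group][0] += length_bp
        totals[group][1] += snp_count
    return {
        group: (snps / bp) / genome_density
        for group, (bp, snps) in totals.items()
        if bp > 0
    }


# Toy input, invented purely for illustration (not data from the study).
genes = [
    ("transcription_factor", 2000, 40),
    ("transcription_factor", 3000, 60),
    ("kinase", 4000, 240),
    ("transporter", 1000, 50),
]
result = relative_snp_density(genes, genome_snps=4000, genome_length=100000)
for group, value in sorted(result.items(), key=lambda item: item[1]):
    print(f"{group}: {value:.2f}")
```

Pooling base pairs and SNP counts within a group before dividing, rather than averaging per-gene densities, keeps short genes from dominating the estimate; whether the study pooled in this way is not stated here.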
