Abstract

Background

Intraspecies copy number variations (CNVs), defined as unbalanced structural variations of specific genomic loci ≥1 kb in size, are present in the genomes of animals and plants. A growing number of examples indicate that CNVs may have functional significance and contribute to phenotypic diversity. In the model plant Arabidopsis thaliana, at least several hundred protein-coding genes might display CNV; however, locus-specific genotyping studies in this plant have not been conducted.

Results

We analyzed natural CNVs in the region overlapping the MSH2 gene, which encodes a DNA mismatch repair protein, and the AT3G18530 and AT3G18535 genes, which encode poorly characterized proteins. Using multiplex ligation-dependent probe amplification (MLPA) and droplet digital PCR (ddPCR), we genotyped these genes in 189 A. thaliana accessions. AT3G18530 and AT3G18535 were duplicated (2–14 times) in 20 accessions and deleted in 101. MSH2 was duplicated in 12 accessions (up to 12–14 copies) but never deleted. In all but one case, MSH2 duplications were associated with duplications of AT3G18530 and AT3G18535. Based on the structure of the CNVs, we distinguished five genotypes for this region and determined their frequencies and geographical distribution. We defined the CNV breakpoints in 35 accessions carrying AT3G18530-AT3G18535 deletions or tandem duplications and showed that these were reciprocal events resulting from non-allelic homologous recombination between 99%-identical sequences flanking these genes. The widespread geographical distribution of the deletions, together with SNP and linkage disequilibrium analyses of the genomic sequence, confirmed the recurrent nature of this CNV.

Conclusions

We characterized in detail, for the first time, a complex multiallelic CNV in the Arabidopsis genome. The region encoding the MSH2, AT3G18530 and AT3G18535 genes shows enormous copy number variation among natural ecotypes, a remarkable example of the high plasticity of the Arabidopsis genome. We provided molecular insight into the mechanism underlying the recurrent AT3G18530-AT3G18535 duplications/deletions. We also performed the first direct comparison of the two leading experimental methods suitable for assessing DNA copy number status. Our comprehensive case study provides a foundation for further analyses of CNV evolution in Arabidopsis and other plants and of the possible use of CNVs in plant breeding.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-3221-1) contains supplementary material, which is available to authorized users.
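Why non-allelic homologous recombination yields reciprocal deletions and tandem duplications can be illustrated with a toy model. The sketch below assumes a simplified, hypothetical haplotype in which a single gene block G (standing in for AT3G18530-AT3G18535) is flanked by two copies of a near-identical repeat R, with unique sequence A and B outside. The segment labels and the functions `unequal_crossover` and `gene_copies` are illustrative inventions, not the authors' analysis, and the model ignores sequence identity and breakpoint position within the repeats.

```python
from typing import List, Tuple

# Hypothetical, simplified haplotype: each string is one segment.
#   "A", "B" = unique flanking sequence
#   "R"      = a copy of the near-identical repeat flanking the gene block
#   "G"      = the gene block (standing in for AT3G18530-AT3G18535)
Haplotype = List[str]


def unequal_crossover(x: Haplotype, i: int, y: Haplotype, j: int) -> Tuple[Haplotype, Haplotype]:
    """Model one crossover between repeat x[i] on one homolog and repeat y[j] on the other.

    If i == j the repeats are allelic (normal crossover). If i != j the homologs
    are paired out of register, i.e. non-allelic homologous recombination.
    Returns the two reciprocal recombinant haplotypes.
    """
    if x[i] != "R" or y[j] != "R":
        raise ValueError("crossover must occur between copies of the repeat R")
    # Each product joins the left part of one homolog to the right part of the other.
    left_x_right_y = x[:i] + y[j:]
    left_y_right_x = y[:j] + x[i:]
    return left_x_right_y, left_y_right_x


def gene_copies(h: Haplotype, gene: str = "G") -> int:
    """Count copies of the gene block on a haplotype."""
    return h.count(gene)


if __name__ == "__main__":
    ref = ["A", "R", "G", "R", "B"]
    # Misaligned pairing: the downstream repeat (index 3) of one homolog
    # pairs with the upstream repeat (index 1) of the other.
    duplication, deletion = unequal_crossover(ref, 3, ref, 1)
    print(duplication, gene_copies(duplication))  # ['A', 'R', 'G', 'R', 'G', 'R', 'B'] 2
    print(deletion, gene_copies(deletion))        # ['A', 'R', 'B'] 0
```

In this toy, an allelic crossover (i == j) returns the parental haplotypes unchanged, whereas out-of-register pairing produces one tandem duplication and one deletion retaining a single repeat copy, with the total number of gene blocks conserved across the two products. That conservation is what "reciprocal" means here.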

Highlights

  • Intraspecies copy number variations (CNVs), defined as unbalanced structural variations of specific genomic loci, ≥1 kb in size, are present in the genomes of animals and plants

  • We characterized in detail, for the first time, a complex multiallelic CNV in the Arabidopsis genome

  • MSH2, AT3G18530 and AT3G18535 genes undergo CNV in the Arabidopsis population. According to Cao et al. [29], the AT3G18530 and AT3G18535 genes and part of the MSH2 gene are covered by two distinct CNVs (CNV_611 and CNV_610) separated by 1.5 kb (Fig. 1)


Summary

Introduction

Intraspecies copy number variations (CNVs), defined as unbalanced structural variations of specific genomic loci ≥1 kb in size, are present in the genomes of animals and plants. CNV regions often span protein-coding genes [2,3,4,5,6,7]. Changes in the number of functional gene copies (or of their distal regulatory regions) might affect the amount of expressed protein and alter the phenotype. For example, increased resistance to soybean cyst nematode (SCN), reported in some soybean (Glycine max) lines, has been associated with duplication of the genomic region Rhg, which spans three genes likely involved in counteracting pathogen infection [13]. Similarly, the superior grain length and quality of the rice (Oryza sativa) landrace Ping reflect a tandem duplication of the GL7 gene, which encodes a protein homologous to the Arabidopsis (Arabidopsis thaliana) LONGIFOLIA proteins [22]. In the model plant Arabidopsis thaliana, at least several hundred protein-coding genes might display CNV; however, locus-specific genotyping studies in this plant have not been conducted.

