Abstract

Copy number variations (CNVs) greatly contribute to intraspecies genetic polymorphism and phenotypic diversity. Recent analyses of sequencing data for >1000 Arabidopsis (Arabidopsis thaliana) accessions focused on small variations and did not include CNVs. Here, we performed genome-wide analysis and identified large indels (50 to 499 bp) and CNVs (500 bp and larger) in these accessions. The CNVs fully overlap with 18.3% of protein-coding genes, with enrichment for evolutionarily young genes and genes involved in stress and defense. By combining analysis of both genes and transposable elements (TEs) affected by CNVs, we revealed that the variation statuses of genes and TEs are tightly linked and jointly contribute to the unequal distribution of these elements in the genome. We also determined the gene copy numbers in a set of 1060 accessions and experimentally validated the accuracy of our predictions by multiplex ligation-dependent probe amplification assays. We then successfully used the CNVs as markers to analyze population structure and migration patterns. Finally, we examined the impact of gene dosage variation triggered by a CNV spanning the SEC10 gene on SEC10 expression at both the transcript and protein levels. The catalog of CNVs, CNV-overlapping genes, and their genotypes in a top model dicot will stimulate the exploration of the genetic basis of phenotypic variation.

Highlights

  • The number of plant species for which copy number variation (CNV) regions have been identified at the genome-wide scale has grown rapidly within the last decade (Muñoz-Amatriaín et al, 2013; Duitama et al, 2015; Fuentes et al, 2019; Chia et al, 2012; Hardigan et al, 2016; Swanson-Wagner et al, 2010)

  • A larger study focused on comparing the genomes of 17 accessions that were sequenced and assembled de novo from whole-genome sequencing (WGS) data revealed multiple polymorphic regions that could not be mapped to the reference genome (Gan et al, 2011). Based on the same WGS data, Bush et al (2014) identified numerous exon-overlapping regions in the A. thaliana genome that were absent from at least one accession

  • After we identified the genomic regions showing copy number polymorphism in A. thaliana, we used the Genome STRiP SVGenotyper module (Handsaker et al, 2015) to evaluate the copy number statuses of CNV-genes in individual accessions based on read depth estimates. Based on our earlier observations, we decided to directly evaluate the copy numbers of the genes covered by AthCNVs instead of the AthCNVs themselves
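
The read depth idea behind per-gene copy number genotyping can be illustrated with a minimal sketch. This is not the Genome STRiP SVGenotyper model, which is considerably more elaborate; it only assumes that copy number scales linearly with read depth relative to a background of regions presumed to be present at the baseline diploid copy number. The function name, the `ploidy` parameter, and the depth values are hypothetical and chosen for illustration only.

```python
from statistics import median


def estimate_copy_number(gene_depth, background_depths, ploidy=2):
    """Round a gene's depth ratio to an integer copy number.

    Assumption (not from the source): depth is proportional to copy
    number, and the median of `background_depths` represents regions
    present at the baseline copy number `ploidy`.
    """
    baseline = median(background_depths)
    if baseline <= 0:
        raise ValueError("background depth must be positive")
    return round(ploidy * gene_depth / baseline)


# Hypothetical mean depths over one gene in three accessions,
# each paired with depths from presumed single-locus background regions.
samples = {
    "accession_A": (30.0, [28.0, 30.0, 32.0]),  # depth matches background
    "accession_B": (0.0, [28.0, 30.0, 32.0]),   # no reads over the gene
    "accession_C": (61.0, [28.0, 30.0, 32.0]),  # roughly double depth
}

genotypes = {
    name: estimate_copy_number(depth, background)
    for name, (depth, background) in samples.items()
}
print(genotypes)
```

Under these assumptions, a gene with no coverage is called as copy number 0 (a deletion) and a gene with about twice the background depth as copy number 4 (a duplication in a diploid genotype). Real genotypers additionally correct for GC content and mappability and report likelihoods rather than a single rounded value.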


Summary

INTRODUCTION

The frequent occurrence of duplications and deletions in eukaryotic genomes is among the most crucial factors that affect adaptation, evolution, and speciation (Kondrashov, 2012; Panchy et al, 2016).

One of the earliest studies of this type combined the results of array-based hybridization and short read-based whole-genome sequencing (WGS) to identify ≥100 bp deletions in the genomes of four A. thaliana accessions, Eil-0, Lc-0, Sav-0, and Tsu-1 (Santuari et al, 2010). These deletions overlapped with 987 to 1,344 protein-coding genes (for simplicity, we refer to them as genes hereafter), and many of them were shared by at least 2 accessions.

The CNV map and copy number genotyping data generated in this study provide a background for further studies on the genetic bases of phenotypic variation in A. thaliana.

RESULTS
METHODS
Experimental procedures
Supplemental Data
ACKNOWLEDGMENTS
REFERENCES