Comprehensive description of genomewide nucleotide and structural variation in short-season soya bean.

Davoud Torkamaneh, François Belzile, Louise O'Donoughue, Istvan Rajcan, Elroy Cober, Jérôme Laroche, Aurélie Tardivel

doi:10.1111/pbi.12825

Open Access

Abstract

Next-generation sequencing (NGS) and bioinformatics tools have greatly facilitated the characterization of nucleotide variation; nonetheless, an exhaustive description of both SNP haplotype diversity and structural variation remains elusive in most species. In this study, we sequenced a representative set of 102 short-season soya beans and achieved extensive coverage of both nucleotide diversity and structural variation (SV). We called close to 5M sequence variants (SNPs, MNPs and indels) and noticed that the number of unique haplotypes had plateaued within this set of germplasm (1.7M tag SNPs). This data set proved highly accurate (98.6%) based on a comparison of called genotypes at loci shared with a SNP array. We used this catalogue of SNPs as a reference panel to impute missing genotypes at untyped loci in data sets derived from lower-density genotyping tools (150 K GBS-derived SNPs/530 samples). Of the missing genotypes imputed in this fashion, 96.4% proved to be accurate. Using a combination of three bioinformatics pipelines, we uncovered ~92 K SVs (deletions, insertions, inversions, duplications, CNVs and translocations) and estimated that over 90% of these were accurate. Finally, we noticed that the duplication of certain genomic regions explained much of the residual heterozygosity at SNP loci in otherwise highly inbred soya bean accessions. This is the first time that a comprehensive description of both SNP haplotype diversity and SV has been achieved within a regionally relevant subset of a major crop.
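The accuracy figures quoted above (called genotypes versus a SNP array, imputed genotypes versus known genotypes) are concordance rates over shared loci. The following is a minimal sketch of how such a rate could be computed; it is not the authors' pipeline. The function name `genotype_concordance`, the locus keys and the toy genotypes are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple

Locus = Tuple[str, int]  # (chromosome, position); hypothetical key format


def genotype_concordance(
    calls: Dict[Locus, Optional[str]],
    truth: Dict[Locus, Optional[str]],
) -> Optional[float]:
    """Fraction of loci typed in both data sets whose genotypes agree.

    Loci missing from either data set, or with a missing genotype (None),
    are excluded. Returns None when no locus is shared.
    """
    shared = 0
    matching = 0
    for locus, called in calls.items():
        expected = truth.get(locus)
        if called is None or expected is None:
            continue  # not comparable: untyped in at least one data set
        shared += 1
        if called == expected:
            matching += 1
    return matching / shared if shared else None


# Toy data (invented for illustration, not taken from the study).
wgs_calls = {
    ("Gm01", 1200): "A/A",
    ("Gm01", 5400): "G/G",
    ("Gm02", 880): "C/T",
    ("Gm02", 9100): None,   # missing call, skipped
}
array_calls = {
    ("Gm01", 1200): "A/A",
    ("Gm01", 5400): "G/G",
    ("Gm02", 880): "C/C",   # disagrees with the sequencing call
    ("Gm03", 42): "T/T",    # not typed by sequencing, skipped
}

concordance = genotype_concordance(wgs_calls, array_calls)
print(f"concordance over shared loci: {concordance:.3f}")
```

The same function would apply to imputation accuracy if `calls` held imputed genotypes and `truth` held genotypes that had been masked before imputation.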

Highlights

Genetic variation describes the occurrence of DNA sequence differences among individuals of the same species (Hedrick, 2011)
We describe the whole-genome sequencing (WGS) of 102 short-season soya bean accessions [(G. max L.), a palaeopolyploid] to identify both nucleotide and structural variants using a combination of several bioinformatics tools
We selected 102 Canadian short-season elite soya bean accessions for whole-genome sequencing based on a prior genetic analysis of a larger set of accessions (n = 441) that had been genotyped with ~80 K single nucleotide polymorphisms (SNPs) using a genotyping-by-sequencing (GBS) approach (Figure S1)

Introduction

Genetic variation describes the occurrence of DNA sequence differences among individuals of the same species (Hedrick, 2011). Nucleotide variants are usually defined as encompassing single or multiple nucleotide variants (SNPs, MNPs) and small insertions/deletions (indels), whereas structural variants (SVs) represent larger rearrangements of various types [deletions, insertions, inversions, translocations, duplications and copy number variations (CNVs)] (Tuzun et al, 2005). The advent of next-generation sequencing (NGS) technologies has provided an exceptional opportunity to systematically detect both nucleotide and structural variants in plant and animal genomes (Church, 2006; El-Metwally et al, 2014; Hall, 2007). NGS has greatly facilitated the development of methods to genotype very large numbers of nucleotide variants such as single nucleotide polymorphisms (SNPs) (Goodwin et al, 2016). Decreased whole-genome sequencing (WGS) costs have made it possible to sequence the entire genomes of numerous individuals, cultivars or accessions of the same species (Gudbjartsson et al, 2015; Zhang et al, 2001; Zhou et al, 2015).
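The distinction drawn above between nucleotide variants (SNPs, MNPs, indels) and larger structural variants can be illustrated with a small classifier over reference and alternate alleles. This is a sketch under stated assumptions: the function name `classify_variant` is hypothetical, and the 50 bp size cutoff separating indels from SVs is a commonly used convention assumed here, not a value given in the text. It only handles sequence-resolved alleles, so rearrangements such as inversions and translocations are outside its scope.

```python
def classify_variant(ref: str, alt: str, sv_min_length: int = 50) -> str:
    """Classify a sequence-resolved variant from its REF and ALT alleles.

    sv_min_length is an assumed cutoff (in bp) above which a length change
    is treated as structural rather than as a small indel.
    """
    if not ref or not alt:
        raise ValueError("REF and ALT alleles must be non-empty")
    size_change = abs(len(ref) - len(alt))
    if size_change >= sv_min_length:
        return "SV"     # large insertion or deletion
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"    # single nucleotide substitution
    if len(ref) == len(alt):
        return "MNP"    # several adjacent nucleotides substituted
    return "indel"      # small insertion or deletion


# Toy alleles (invented for illustration).
examples = [("A", "G"), ("AT", "GC"), ("A", "ATG"), ("A", "A" + "T" * 60)]
labels = [classify_variant(ref, alt) for ref, alt in examples]
print(labels)
```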
