Abstract

We compared the performance of two commonly used genotyping platforms, genotyping-by-sequencing (GBS) and single nucleotide polymorphism (SNP) arrays, to investigate the extent and pattern of genetic variation within a collection of 1,000 diverse barley genotypes selected from the German Federal ex situ GenBank hosted at IPK Gatersleben. Each platform revealed a similar number of robust bi-allelic SNPs (39,733 for the 50K SNP-array and 37,930 for the GBS dataset). Only 464 SNPs were common to both platforms, indicating that the methodologies we used selectively access informative polymorphism in different portions of the barley genome. Approximately half of the GBS dataset comprised SNPs with minor allele frequencies (MAFs) below 1%, illustrating the power of GBS to detect rare alleles in diverse germplasm collections. While rare-allele detection is desirable for certain applications, the highly robust calling of alleles at the same SNPs across multiple populations is an advantage of the SNP-array, allowing direct comparisons of data from related or unrelated studies. Overall MAFs and diversity statistics (π) were higher for the SNP-array data, potentially reflecting the conscious removal of markers with a low MAF in the ascertainment population. A comparison of similarity matrices revealed a positive correlation between the two approaches, supporting the validity of using either for entire GenBank characterization. To assess the potential of each dataset for focused genetic analyses, we examined the outcomes of their use in genome-wide association scans for row type, growth habit and non-adhering hull, and in discriminant analysis of principal components for the drivers of sub-population differentiation. Both types of analysis yielded broadly similar conclusions, indicating that the choice of platform for such analyses should be determined by the research question being asked, by group preferences, and by the group's capability to extract and interpret the different types of output data easily and quickly. Access to the requisite infrastructure for running, processing, analyzing, querying, storing, and displaying either datatype is an additional consideration. Our investigations reveal that for barley the cost per genotyping assay is lower for SNP-arrays than for GBS, which translates to a significantly lower cost per informative datapoint for the SNP-array.
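The two summary statistics named in the abstract, MAF and π, can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each inbred barley line is treated as a single haploid call per SNP (0 = reference allele, 1 = alternate allele, `None` = missing), and it computes π per site as the unbiased expected pairwise difference for a bi-allelic SNP. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Assumption: one call per inbred line and SNP.
# 0 = reference allele, 1 = alternate allele, None = missing call.
Genotype = Optional[int]


def minor_allele_frequency(calls: Sequence[Genotype]) -> Optional[float]:
    """Return the MAF at one SNP, ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return None
    alt_freq = sum(observed) / len(observed)
    return min(alt_freq, 1.0 - alt_freq)


def site_diversity(calls: Sequence[Genotype]) -> Optional[float]:
    """Per-site diversity for a bi-allelic SNP: n / (n - 1) * 2pq."""
    observed = [c for c in calls if c is not None]
    n = len(observed)
    if n < 2:
        return None
    p = sum(observed) / n
    return (n / (n - 1)) * 2.0 * p * (1.0 - p)


def rare_fraction(matrix: Sequence[Sequence[Genotype]],
                  threshold: float = 0.01) -> Optional[float]:
    """Fraction of SNPs (rows) whose MAF is below the threshold."""
    mafs: List[float] = []
    for row in matrix:
        maf = minor_allele_frequency(row)
        if maf is not None:
            mafs.append(maf)
    if not mafs:
        return None
    return sum(1 for m in mafs if m < threshold) / len(mafs)
```

Applied to a marker-by-line matrix, `rare_fraction(matrix, 0.01)` gives the kind of figure quoted above for the share of SNPs with MAF below 1%. Averaging `site_diversity` across SNPs gives a simple per-marker diversity estimate.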

Highlights

  • The detection of genome-wide sequence-defined single nucleotide polymorphisms (SNPs) is key to addressing a wide range of biological and ecological questions, from describing and partitioning overall levels of biological diversity to cloning genes conferring phenotypic traits, and for practical exploitation in animal and crop breeding

  • When GBS was becoming popular in the plant community, we evaluated the PstI-MseI GBS protocol in a study aimed at mapping the barley breviaristatum-e locus in a biparental recombinant inbred line (RIL) population (Liu et al, 2014)

  • We removed 2,567 SNPs from the 50K SNP-array dataset and 603 SNPs from the GBS dataset before all subsequent comparative analyses because they had not been assigned a physical position on the current barley genome pseudomolecules (Mascher et al, 2017)



Introduction

The detection of genome-wide sequence-defined single nucleotide polymorphisms (SNPs) is key to addressing a wide range of biological and ecological questions, from describing and partitioning overall levels of biological diversity to cloning genes conferring phenotypic traits, and for practical exploitation in animal and crop breeding. Semi-open systems are typified by RAD-seq (Miller et al, 2007), DArTseq (Kilian et al, 2012) or genotyping-by-sequencing (GBS, Elshire et al, 2011). These are similar in nature and assay new variation in each different set of genetic material analyzed. We use the term semi-open because all typically involve a genome complexity reduction step (in GBS, for example, through the use of methylation-sensitive restriction enzymes to selectively avoid highly repetitive genomic regions, which introduces bias), followed by short-read sequencing of the regions adjacent to the cleaved restriction sites. Their increasing popularity stems from their being species- and sequencing-platform-agnostic, as well as being considered fast, cheap and informative (Lu et al, 2015). GBS data include a large fraction of missing calls, which require imputation and sometimes complex computational interpretation prior to subsequent analysis.
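To make the missing-data point concrete, the sketch below computes the missing-call rate at one marker and fills the gaps with the most frequent observed allele. This naive major-allele fill is an assumption chosen for illustration only. It is not the imputation method used in the study, and real GBS pipelines typically rely on haplotype-aware approaches. The function names and the 0/1/`None` coding are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Assumption: 0 = reference allele, 1 = alternate allele, None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of lines with no call at this marker."""
    if not calls:
        raise ValueError("marker has no calls")
    return sum(1 for c in calls if c is None) / len(calls)


def impute_major_allele(calls: Sequence[Genotype]) -> List[Genotype]:
    """Replace missing calls with the most frequent observed allele.

    Illustrative only: a marker with no observed calls is returned unchanged.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return list(calls)
    major = Counter(observed).most_common(1)[0][0]
    return [major if c is None else c for c in calls]
```

A typical use would be to discard markers whose `missing_rate` exceeds a chosen cut-off and impute the rest. The cut-off is a study-specific decision and is not stated in this excerpt.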
