Abstract

Next-generation sequencing (NGS) has revolutionized plant and animal research in many ways, including new methods of high-throughput genotyping. Genotyping-by-sequencing (GBS) has been demonstrated to be a robust and cost-effective genotyping method capable of producing thousands to millions of SNPs across a wide range of species. Undoubtedly, the greatest barrier to its broader use is the challenge of data analysis. Herein we describe a comprehensive comparison of seven GBS bioinformatics pipelines developed to process raw GBS sequence data into SNP genotypes. We compared five pipelines requiring a reference genome (TASSEL-GBS v1 and v2, Stacks, IGST, and Fast-GBS) and two de novo pipelines that do not require a reference genome (UNEAK and Stacks). Using Illumina sequence data from a set of 24 re-sequenced soybean lines, we performed SNP calling with these pipelines and compared the GBS SNP calls with the re-sequencing data to assess their accuracy. The number of SNPs called without a reference genome was lower (13k to 24k) than with a reference genome (25k to 54k), while accuracy was high (92.3 to 98.7%) for all but one pipeline (TASSEL-GBS v1, 76.1%). Among pipelines offering a high accuracy (>95%), Fast-GBS called the greatest number of polymorphisms (close to 35,000 SNPs + indels) and yielded the highest accuracy (98.7%). Using Ion Torrent sequence data for the same 24 lines, we compared the performance of Fast-GBS with that of TASSEL-GBS v2. Fast-GBS again called more polymorphisms (25.8k vs 22.9k), and these proved more accurate (95.2% vs 91.1%). Typically, SNP catalogues called from the same sequencing data using different pipelines overlapped extensively (79–92%). In contrast, overlap between SNP catalogues obtained using the same pipeline but different sequencing technologies was less extensive (~50–70%).
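The two metrics reported above, genotype accuracy against re-sequencing data and overlap between SNP catalogues, can be illustrated with a minimal Python sketch. The data layout, the function names, and in particular the overlap definition (shared sites divided by the size of the smaller catalogue) are assumptions made for illustration; the abstract does not state how the authors computed either figure.

```python
def genotype_accuracy(gbs_calls, truth_calls):
    """Fraction of GBS genotypes that agree with the re-sequencing genotype.

    Both arguments map (chromosome, position, sample) -> genotype string.
    Only keys present in both sets with a non-missing (not None) genotype
    are compared. This layout is a hypothetical one chosen for the sketch.
    """
    shared = [
        key for key in gbs_calls
        if key in truth_calls
        and gbs_calls[key] is not None
        and truth_calls[key] is not None
    ]
    if not shared:
        return float("nan")
    matches = sum(gbs_calls[key] == truth_calls[key] for key in shared)
    return matches / len(shared)


def catalogue_overlap(sites_a, sites_b):
    """Share of the smaller SNP catalogue that is also found in the other.

    Each catalogue is an iterable of (chromosome, position) tuples.
    Dividing by the smaller catalogue is an assumed convention, not
    necessarily the one used in the study.
    """
    a, b = set(sites_a), set(sites_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Toy data: three comparable genotypes, two of which agree.
gbs = {
    ("Gm01", 100, "s1"): "AA",
    ("Gm01", 100, "s2"): "AG",
    ("Gm01", 200, "s1"): "CC",
    ("Gm02", 50, "s1"): None,  # missing call, excluded from the comparison
}
truth = {
    ("Gm01", 100, "s1"): "AA",
    ("Gm01", 100, "s2"): "GG",
    ("Gm01", 200, "s1"): "CC",
    ("Gm02", 50, "s1"): "TT",
}
accuracy = genotype_accuracy(gbs, truth)

pipeline_a = [("Gm01", 100), ("Gm01", 200), ("Gm02", 50)]
pipeline_b = [("Gm01", 100), ("Gm01", 200), ("Gm03", 7), ("Gm03", 9)]
overlap = catalogue_overlap(pipeline_a, pipeline_b)
```

Excluding missing genotypes from the denominator keeps accuracy separate from call rate; a study could reasonably make a different choice.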

Highlights

PLOS ONE | DOI:10.1371/journal.pone.0161333 | August 22, 2016 | Comparison of GBS Analysis Pipelines

  • Next-generation sequencing (NGS) has greatly facilitated the development of methods to genotype very large numbers of molecular markers such as single nucleotide polymorphisms (SNPs)

  • A pipeline for GBS must include steps to filter out poor-quality reads, classify reads by pool or individual based on sequence barcodes, either identify loci and alleles de novo or align reads to an indexed reference genome to discover polymorphisms, and often score genotypes for each individual included in the study

  • The number of SNPs called by UNEAK was not too far below the mean number of SNPs called by reference-based pipelines (32,423)
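The first two pipeline steps named in the highlights, filtering out poor-quality reads and assigning reads to individuals by barcode, can be sketched as follows. This is a minimal illustration, not the method of any pipeline compared in the study: the function names, the mean-quality threshold of 20, the Phred+33 encoding, and the assumption that the barcode sits at the start of each read with no mismatches allowed are all hypothetical choices.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def demultiplex(reads, barcodes, min_mean_quality=20):
    """Filter reads by mean quality, then sort them into samples by barcode.

    reads    : iterable of (sequence, quality_string) tuples
    barcodes : dict mapping barcode sequence -> sample name
    Returns a dict mapping sample name -> list of reads with the barcode
    trimmed off. Reads matching no barcode are discarded.
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    # Try longer barcodes first so that a short barcode which happens to be
    # a prefix of a longer one does not capture the longer one's reads.
    ordered = sorted(barcodes, key=len, reverse=True)
    for sequence, quality in reads:
        if not sequence or len(sequence) != len(quality):
            continue  # malformed record
        if mean_phred(quality) < min_mean_quality:
            continue  # poor-quality read
        for barcode in ordered:
            if sequence.startswith(barcode):
                by_sample[barcodes[barcode]].append(sequence[len(barcode):])
                break
    return by_sample


# Toy example: 'I' encodes Phred 40 and '!' encodes Phred 0.
barcodes = {"ACGT": "line01", "TTGCA": "line02"}
reads = [
    ("ACGTCAGCAAA", "IIIIIIIIIII"),  # line01, good quality
    ("TTGCACAGCTT", "IIIIIIIIIII"),  # line02, good quality
    ("ACGTCAGCGGG", "!!!!!!!!!!!"),  # line01 barcode, fails quality filter
    ("GGGGCAGCAAA", "IIIIIIIIIII"),  # no known barcode
]
demultiplexed = demultiplex(reads, barcodes)
```

Real pipelines typically also tolerate barcode mismatches and check for the restriction-site remnant after the barcode; those steps are omitted here to keep the sketch short.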


Summary

Introduction

Next-generation sequencing (NGS) has greatly facilitated the development of methods to genotype very large numbers of molecular markers such as single nucleotide polymorphisms (SNPs). NGS offers several approaches that are capable of simultaneously performing genome-wide SNP discovery and genotyping in a single step, even in species for which little or no genetic information is available [1]. This revolution in genetic marker discovery enables the study of important questions in molecular breeding, population genetics, ecological genetics and evolution. The most widely used NGS-based genotyping methods use restriction enzymes to capture a reduced representation of a genome [2,3,4,5,6,7,8,9]. Approaches such as restriction site-associated DNA sequencing (RAD-seq) and genotyping-by-sequencing (GBS) have been developed as rapid and robust methods for reduced-representation sequencing of multiplexed samples that combine genome-wide molecular marker discovery and genotyping [1]. The most widely used pipelines for a de novo approach are UNEAK and Stacks [15, 16].
