UGbS-Flex, a novel bioinformatics pipeline for imputation-free SNP discovery in polyploids without a reference genome: finger millet as a case study

Peng Qi,Stephan Schröder,Mathews M Dida,Davis Gimode,Debkanta Chakraborty,Russell L Malmberg,Katrien M Devos,Dipnarayan Saha,Xuewen Wang

doi:10.1186/s12870-018-1316-3

Abstract

Background: Research on orphan crops is often hindered by a lack of genomic resources. With the advent of affordable sequencing technologies, genotyping an entire genome or, for large-genome species, a representative fraction of the genome has become feasible for any crop. Nevertheless, most genotyping-by-sequencing (GBS) methods are geared towards obtaining large numbers of markers at low sequence depth, which excludes their application in heterozygous individuals. Furthermore, bioinformatics pipelines often lack the flexibility to handle paired-end reads or to be applied to polyploid species.

Results: UGbS-Flex combines publicly available software with in-house Python and Perl scripts to efficiently call SNPs from genotyping-by-sequencing reads irrespective of the species’ ploidy level, breeding system and availability of a reference genome. Noteworthy features of the UGbS-Flex pipeline are the ability to use paired-end reads as input, an effective approach to cluster reads across samples with enhanced outputs, and maximization of SNP calling. We demonstrate use of the pipeline for the identification of several thousand high-confidence SNPs with high representation across samples in an F3-derived F2 population in the allotetraploid finger millet. Robust high-density genetic maps were constructed using the time-tested mapping program MAPMAKER, which we upgraded to run efficiently and in a semi-automated manner in a Windows Command Prompt environment. We exploited comparative GBS with one of the diploid ancestors of finger millet to assign linkage groups to subgenomes and to demonstrate the presence of chromosomal rearrangements.

Conclusions: The paper combines GBS protocol modifications, a novel flexible GBS analysis pipeline (UGbS-Flex), recommendations to maximize SNP identification, updated genetic mapping software, and the first high-density maps of finger millet.
The modules used in the UGbS-Flex pipeline and for genetic mapping were applied to finger millet, an allotetraploid selfing species without a reference genome, as a case study. The UGbS-Flex modules, which can be run independently, are easily transferable to species with other breeding systems or ploidy levels.
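The emphasis on high-confidence SNPs "with high representation across samples" amounts to a per-SNP filter on read depth and sample coverage. The sketch below is illustrative only: the default 8× minimum depth mirrors the criterion used in the enzyme comparison described in the Highlights, while the function name `filter_snps`, the `min_fraction` parameter and the nested-dictionary input layout are hypothetical and are not taken from UGbS-Flex.

```python
from typing import Dict, List


def filter_snps(
    depths: Dict[str, Dict[str, int]],
    samples: List[str],
    min_depth: int = 8,
    min_fraction: float = 1.0,
) -> List[str]:
    """Keep SNPs whose read depth is >= min_depth in at least min_fraction of samples.

    Hypothetical helper: depths maps SNP ID -> {sample name -> read depth};
    a sample absent from the inner dict is treated as depth 0.
    """
    kept = []
    for snp, per_sample in depths.items():
        # count samples in which this SNP reaches the minimum depth
        covered = sum(per_sample.get(s, 0) >= min_depth for s in samples)
        if covered / len(samples) >= min_fraction:
            kept.append(snp)
    return sorted(kept)


# Hypothetical depths for three accessions (A, B, C)
depths = {
    "snp1": {"A": 10, "B": 9, "C": 8},
    "snp2": {"A": 20, "B": 3, "C": 15},
    "snp3": {"A": 12, "B": 30},
}
print(filter_snps(depths, ["A", "B", "C"]))                    # ['snp1']
print(filter_snps(depths, ["A", "B", "C"], min_fraction=0.6))  # ['snp1', 'snp2', 'snp3']
```

Requiring every sample (`min_fraction=1.0`) gives the strictest, imputation-free set; relaxing the fraction trades completeness of the marker matrix for more markers.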

Highlights

Research on orphan crops is often hindered by a lack of genomic resources
Efficiency of different enzyme combinations in generating polymorphic markers: we tested two two-enzyme combinations (PstI/MspI and PstI/NdeI) and one three-enzyme combination (PstI/MspI + ApeKI) on three finger millet accessions for their efficiency in generating largely overlapping fragment pools that, when sequenced, yielded single nucleotide polymorphisms (SNPs) present in all three accessions at a depth of at least 8×
High read depth appeared to hamper the performance of ‘cstacks’, possibly because greater depth led to a higher absolute number of SNPs caused by PCR or sequencing errors in allelic reads
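The proposed explanation for the poorer performance of ‘cstacks’ at high depth can be made concrete with a simple binomial model: the expected number of error-bearing reads at a site grows linearly with depth, so the chance that the same erroneous base recurs often enough to resemble a genuine allele also rises. This is a minimal sketch under stated assumptions: a uniform 1% per-base error rate split evenly over the three wrong bases, and an arbitrary threshold of three supporting reads. Both numbers and the helper `prob_at_least` are illustrative assumptions, not values from the study or parameters of Stacks.

```python
from math import comb

# Assumed per-base error rate (PCR + sequencing); illustrative only
ERROR_RATE = 0.01


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); returns 0.0 when k > n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


for depth in (8, 20, 50, 100, 200):
    # chance that one specific wrong base appears in >= 3 reads at a single site
    p_spurious = prob_at_least(3, depth, ERROR_RATE / 3)
    print(
        f"depth {depth:>3}: expected error reads {depth * ERROR_RATE:.2f}, "
        f"P(>=3 reads share one error) {p_spurious:.2e}"
    )
```

Summed over thousands of loci, even small per-site probabilities translate into many spurious "alleles" at high depth, which is consistent with the hedged interpretation above.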

Summary

Introduction

Research on orphan crops is often hindered by a lack of genomic resources. With the advent of affordable sequencing technologies, genotyping an entire genome or, for large-genome species, a representative fraction of the genome has become feasible for any crop. Despite these advances, whole-genome sequencing is still not cost-effective for large-genome species, especially when several hundred individuals must each be sequenced at multiple-fold coverage. Most genotyping-by-sequencing (GBS) methods are instead geared towards obtaining large numbers of markers at low sequence depth, which excludes their application in heterozygous individuals. As a result, GBS data sets often have large amounts of missing data and low sequence coverage at each locus [2, 5, 6]. We implemented several modifications to the experimental GBS protocol developed by Elshire et al. [2] and Poland et al. [8], and tested their effect on reducing the GBS fragment pool and providing more even read coverage across pooled samples for high-confidence, imputation-free single nucleotide polymorphism (SNP) identification.
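Why low sequence depth excludes heterozygous individuals can be shown with a short calculation. Assuming a locus that behaves as a diploid (each read samples either of two alleles with probability 0.5) and error-free reads, a heterozygote is only recognized when both alleles are observed, which happens with probability 1 - 2 × 0.5^d at depth d. The function name `p_het_detected` is hypothetical, and this simplified model is not the genotype caller used in UGbS-Flex.

```python
def p_het_detected(depth: int) -> float:
    """Probability that both alleles of a heterozygous, diploid-like locus appear among `depth` reads.

    Assumes each read samples either allele with probability 0.5 and that reads are error-free.
    """
    if depth < 1:
        return 0.0
    # complement of "all reads carry allele A" or "all reads carry allele B"
    return 1.0 - 2 * 0.5**depth


for depth in (1, 2, 4, 8, 16):
    print(f"{depth:>2}x: P(both alleles seen) = {p_het_detected(depth):.4f}")
```

Under this model, 2× depth reveals a heterozygote only half of the time, whereas 8× misses about 0.8% of heterozygotes, which illustrates why adequate depth matters for imputation-free genotyping.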
