Development of an SNP Identification Pipeline for Highly Heterozygous Crops

T. Ruttink, A. Rohde, I. Roldán-Ruiz, L. Sterck, E. Vermeulen

doi:10.1007/978-94-007-4555-1_16

Abstract

Next-generation sequencing (NGS) technologies significantly advance the development of molecular markers for molecular breeding. For crop species without a reference genome sequence, dedicated NGS data-analysis procedures must be developed for de novo reference assembly and SNP discovery. In outcrossing fodder crops, the high degree of polymorphism hampers de novo assembly, contig clustering, read mapping, and SNP discovery. Using selected candidate genes as case studies, we illustrate the reconstruction of a reference transcript sequence from RNA-seq data from multiple genotypes, validate de novo transcript assembly by Sanger sequencing, and analyse how read mapping and SNP discovery parameters determine sensitivity and specificity during SNP discovery. On this basis, we propose a general strategy to construct a non-redundant reference transcriptome for crops without a sequenced genome, using predicted proteins from a closely related model species as guidance for clustering and annotation. Such a reference transcriptome is required for candidate gene discovery and exome-wide identification of polymorphisms.

Full Text