Abstract

Background: Single nucleotide polymorphisms (SNPs) have been applied as important molecular markers in genetics and breeding studies. The rapid advance of next generation sequencing (NGS) provides a high-throughput means of SNP discovery. However, SNP development is limited by the availability of reliable SNP discovery methods. In particular, the optimal assembler and SNP caller for accurate SNP prediction from next generation sequencing data are not known.

Results: We performed SNP prediction based on RNA-seq data of peach and mandarin peel tissue in a comprehensive comparison of two paired-end read lengths (125 bp and 150 bp), five assemblers (Trinity, IDBA, Oases, SOAPdenovo, Trans-ABySS) and two SNP callers (GATK and GBS). The predicted SNPs were compared with the authentic SNPs identified via PCR amplification followed by gene cloning and sequencing. A total of 40 and 240 authentic SNPs were identified in five anthocyanin biosynthesis-related genes in peach and in nine carotenogenic genes in mandarin, respectively. Putative SNPs predicted from the same RNA-seq data with different strategies led to quite divergent results. The rate of false positive SNPs was significantly lower with a paired-end read length of 150 bp than with 125 bp. Trinity was superior to the other four assemblers, and GATK was substantially superior to GBS, owing to a low rate of missing authentic SNPs. The combination of the assembler Trinity, the SNP caller GATK, and the 150 bp paired-end read length had the best performance in SNP discovery, with 100% accuracy in both the peach and mandarin cases. This strategy was applied to the characterization of SNPs in peach and mandarin transcriptomes.

Conclusions: By comparing authentic SNPs obtained by the PCR cloning strategy with putative SNPs predicted from different combinations of five assemblers, two SNP callers, and two paired-end read lengths, we provide a reliable and efficient strategy, Trinity-GATK with a 150 bp paired-end read length, for SNP discovery from RNA-seq data. This strategy discovered SNPs with 100% accuracy in the peach and mandarin cases and might be applicable to a wide range of plants and other organisms.
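
The comparison of predicted SNPs against PCR-verified authentic SNPs can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `compare_snps`, the `(gene, position, alternate base)` representation of a SNP, and the toy gene names are hypothetical, and the two rates are computed under assumed definitions (false positives as a fraction of predicted SNPs, missed SNPs as a fraction of authentic SNPs), since the abstract does not spell them out.

```python
def compare_snps(predicted, authentic):
    """Compare predicted SNPs with authentic (PCR-verified) SNPs.

    Each SNP is a hashable key such as (gene, position, alt_base);
    this representation is an assumption made for illustration.
    """
    predicted, authentic = set(predicted), set(authentic)
    true_pos = predicted & authentic   # predicted and verified
    false_pos = predicted - authentic  # predicted but not verified
    missed = authentic - predicted     # verified but not predicted
    return {
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "missed": len(missed),
        # Assumed definition: share of predictions that are not authentic.
        "false_positive_rate": len(false_pos) / len(predicted) if predicted else 0.0,
        # Assumed definition: share of authentic SNPs that were not predicted.
        "missing_rate": len(missed) / len(authentic) if authentic else 0.0,
    }


# Toy data (hypothetical, not taken from the study).
authentic = {("geneA", 10, "T"), ("geneA", 50, "G"), ("geneB", 7, "A"), ("geneB", 90, "C")}
predicted = {("geneA", 10, "T"), ("geneA", 50, "G"), ("geneB", 7, "A"), ("geneB", 33, "T")}
report = compare_snps(predicted, authentic)
```

Under these assumed definitions, a strategy with no false positives and no missed authentic SNPs yields both rates equal to zero.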

Highlights

  • Single nucleotide polymorphisms (SNPs) have been applied as important molecular markers in genetics and breeding studies

  • The effects of different paired-end read lengths, assemblers, and SNP callers on the accuracy of SNP results were investigated; SNPs were accurately discovered by performing RNA-seq with a 150 bp read length, assembling with Trinity, and calling SNPs with the Genome Analysis Toolkit (GATK)

  • Transcriptome sequencing of peach cultivars ‘Hujingmilu’ (‘HJ’) and ‘Yulu’ (‘YL’), and mandarin cultivars ‘Ponkan’ (‘PK’) and ‘Yellowish-peeled Ponkan’ (‘YP’) was performed on Illumina HiSeq™ 2500 and 4000 platforms

Introduction

Single nucleotide polymorphisms (SNPs) are single nucleotide base variations, caused by transitions (C/T or G/A) or transversions (C/G, C/A, T/A, or T/G), at the same position between individual genomic DNA sequences [1, 2]. SNPs are the predominant type of DNA polymorphism underlying genetic variation and are located throughout genomes [3, 4]: in intergenic regions (regions between genes), in coding sequences of genes (exons), or in non-coding regions of genes (introns, 5’UTR, 3’UTR, or exon-intron splice sites) [5]. SNPs in coding regions can considerably affect protein function, and SNPs in regulatory sequences can considerably affect gene expression.
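
The transition/transversion distinction can be made concrete with a short Python sketch. It relies only on the standard definition (a transition swaps two purines or two pyrimidines; a transversion swaps a purine for a pyrimidine or vice versa); the function name `classify_snp` is hypothetical and chosen for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # Same chemical class on both sides means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For instance, `classify_snp("C", "T")` falls in the transition class and `classify_snp("C", "G")` in the transversion class, matching the pairs listed above.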
