Abstract

Background: Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, library construction and analysis protocols for the Proton sequencing platform have not been comprehensively assessed. Unlike reads from Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. Combining sequencing data from different platforms can likewise produce reads of varying length. Whether commonly used software handles this kind of data satisfactorily is unknown.

Results: With universal human reference RNA as the starting material, the RNaseIII and chemical fragmentation methods of library construction gave similar results in the numbers of genes and junctions discovered and in the accuracy of expression-level estimates. In contrast, sequencing quality, read length and the choice of software affected the mapping rate to a much larger extent. The unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With a reference annotation as a guide, the mapping rate of TopHat2 increased significantly from 75.79 to 92.09 %, especially for long (>150 bp) reads. Sailfish, a k-mer-based gene expression quantifier, gave results highly consistent with those of the TaqMan array and attained the highest sensitivity.

Conclusion: We provide, for the first time, reference statistics on library preparation methods, gene detection and quantification, and junction discovery for RNA-Seq on the Ion Proton platform. Chemical fragmentation performed as well as enzyme-based fragmentation. The optimal Ion Proton sequencing options and analysis software have been evaluated.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2745-8) contains supplementary material, which is available to authorized users.

Highlights

  • We have reported a comprehensive assessment of the application of the Proton sequencing platform to RNA sequencing (RNA-Seq), including software and analysis strategies for alignment, gene detection, gene expression quantification and junction discovery, as well as the bias introduced by different library construction methods

  • Our study suggests that the decline in mapping rates of long reads against the reference transcriptome by BWA and Bowtie2 was mainly due to the accumulation of sequencing errors (Fig. 2c), whilst both sequencing errors and difficulty in junction alignment accounted for the poor mapping performance against the genome (Fig. 2d and e)

Summary

Introduction

Previous studies compared running cost, time and other performance measures of popular sequencing platforms. Since the rapid development of sequencing technology in the last decade, several sequencing platforms such as Roche 454, Illumina HiSeq, Life Technologies SOLiD, Personal Genome Machine (PGM) and Proton and Pacific. Previous studies conducted by the Association of Biomolecular Resource Facilities (ABRF) and the Sequencing Quality Control Consortium (SEQC) reported high intra- and inter-platform concordance in RNA-Seq among HiSeq, PGM and Proton, SOLiD, 454 and PacBio RS [11, 12]. Many popular RNA-Seq analysis tools were developed based on HiSeq data featured with high accuracy and equal read length, whilst sequencing data generated. The ABRF study [11] showed that, for PacBio sequencing reads, GMAP [18] achieved a mapping rate of about 90 % whereas STAR [19] achieved only 60 %. We reported that care has to be taken to distinguish minor variants from sequencing errors in sequencing data generated by PGM [20].

