Abstract
Background: RNA-Seq provides remarkable power in the area of biomarker discovery and disease characterization. Two crucial steps that affect RNA-Seq experiment results are Library Sample Preparation (LSP) and Bioinformatics Analysis (BA). This work describes an evaluation of the combined effect of LSP methods and BA tools on the detection of splice variants.
Results: Different LSPs (TruSeq unstranded/stranded, ScriptSeq, NuGEN) allowed the detection of a large common set of splice variants. However, each LSP also detected a small set of unique transcripts characterized by low coverage and/or FPKM. This effect was particularly evident with the low-input RNA NuGEN v2 protocol. A benchmark dataset, in which synthetic reads as well as reads generated from standard (Illumina TruSeq 100) and low-input (NuGEN) LSPs were spiked in, was used to evaluate the effect of LSP on the statistical detection of alternative splicing events (AltDE). Statistical detection of AltDE was performed using Cuffdiff2 and RSEM-EBSeq as prototypes for splice variant-quantification, and DEXSeq as the prototype for exon-level analysis. Exon-level analysis performed slightly better than the splice variant-quantification approaches, although at most 50% of the spiked-in transcripts were detected. The performance of both splice variant-quantification and exon-level analysis improved as the number of input reads increased.
Conclusion: Data derived from NuGEN v2 were not ideal input for AltDE, especially when the exon-level approach was used. We observed that the performance of both splice variant-quantification and exon-level analysis was strongly dependent on the number of input reads. Moreover, the ribosomal RNA depletion protocol was less sensitive in detecting splicing variants, possibly because a significant percentage of the reads mapped to non-coding transcripts.
Highlights
RNA-Seq provides remarkable power in the area of biomarker discovery and disease characterization.
Bioinformatics Analysis (BA) pipelines for differential expression can be divided into two categories: i) differential expression based on splice variant quantification, and ii) exon-based differential expression.
All of the above-mentioned Library Sample Preparation (LSP) methods were performed after PolyA+ selection, except for NuGEN v2 and the TruSeq stranded LSP, which was used in association with Ribo-Zero ribosomal RNA depletion.
Summary
RNA-Seq provides remarkable power in the area of biomarker discovery and disease characterization. The application of next-generation sequencing (NGS) to transcriptomics analysis, namely RNA-Seq, has allowed many advances in the characterization and quantification of transcripts. Several developments in RNA-Seq methods have provided an advance in the complete characterization of RNA molecules [1]. These developments included improvements in transcription [...] impact on downstream analysis and interpretation of RNA-Seq results [8], it is evident that robust and unbiased library preparation methods are critical. The choice of LSP is not the only critical step in RNA-Seq. The sequencing data need to be converted into transcript information (transcript structure, transcript quantification, etc.), and this step requires a careful selection of the bioinformatics and statistical analysis techniques to be used. We investigated the effect of different LSPs (NuGEN v2, TruSeq unstranded/stranded, ScriptSeq), as well as the effect of PolyA+ selection versus ribosomal depletion, on splice variant detection. We compared the NuGEN low-input protocol with the standard TruSeq protocol using BA tools for splice variant-quantification (Cuffdiff [11] and RSEM-EBSeq [12,13]) and for exon-level quantification (DEXSeq [14]).
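As an illustration of how a spike-in benchmark of this kind can be scored, the following minimal sketch compares a set of transcripts called as differentially spliced against the known spiked-in set. It is not the authors' pipeline: the function name `detection_summary` and the transcript identifiers are hypothetical, and the calls are assumed to have already been exported from a tool such as Cuffdiff2, RSEM-EBSeq or DEXSeq as a plain list of identifiers.

```python
def detection_summary(spiked_in, detected):
    """Score a call set against the spiked-in truth set.

    spiked_in: identifiers of transcripts known to be spiked in (truth).
    detected:  identifiers called as significant by an analysis tool.
    """
    truth = set(spiked_in)
    calls = set(detected)
    true_pos = truth & calls  # spiked-in transcripts that were recovered

    # Sensitivity: fraction of spiked-in transcripts that were detected.
    sensitivity = len(true_pos) / len(truth) if truth else 0.0
    # Precision: fraction of calls that are genuine spike-ins.
    precision = len(true_pos) / len(calls) if calls else 0.0

    return {
        "true_positives": len(true_pos),
        "sensitivity": sensitivity,
        "precision": precision,
    }


# Hypothetical identifiers, for illustration only.
spiked = ["tx1", "tx2", "tx3", "tx4"]
called = ["tx1", "tx3", "tx9"]
summary = detection_summary(spiked, called)
print(summary)
```

Under this scoring, a statement such as "at most 50% of the spiked-in transcripts were detected" corresponds to a sensitivity of 0.5 or lower for every tool and library preparation compared.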