Abstract

Background

The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing (NGS) platforms. These platforms allow massive and inexpensive sequencing of selected RNA fractions and also provide information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications. It addresses several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges for efficiently analyzing and storing sequence data and results.

Methods

To provide researchers with an effective and user-friendly resource for analyzing RNA-Seq data, we present RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. The pipeline integrates state-of-the-art bioinformatics tools for RNA-Seq analysis with in-house developed scripts, offering users a comprehensive data analysis strategy. RAP performs quality checks (using FastQC and NGS QC Toolkit) and identifies and quantifies expressed genes and transcripts (with TopHat, Cufflinks and HTSeq). It also detects alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). The pipeline identifies splicing junctions and constitutive or alternative polyadenylation sites (through custom analysis modules). It also calls statistically significant differences in gene and transcript expression, splicing patterns and polyadenylation site usage (using Cuffdiff2 and DESeq).

Results

Through a user-friendly web interface, the user can customize the RAP workflow, which is then executed automatically on our cloud computing environment. This strategy gives access to bioinformatics tools and computational resources without requiring specific bioinformatics or IT skills. RAP provides a set of tabular and graphical results that help users browse, filter and export the analyzed data according to their needs.
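The abstract lists DESeq among the tools used to call differential expression. DESeq corrects for differences in sequencing depth with per-sample size factors computed by the median-of-ratios method. For each sample, the size factor is the median, over genes, of the gene's count divided by that gene's geometric mean across all samples. The Python sketch below illustrates that calculation only. It is not DESeq's own code, and the gene names and counts are made-up example data.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: dict mapping gene id -> list of raw read counts, one entry per
    sample, with samples in the same order for every gene.
    """
    n_samples = len(next(iter(counts.values())))
    # Genes with a zero count in any sample have a geometric mean of zero,
    # so they cannot serve as a reference and are skipped.
    usable = [c for c in counts.values() if all(x > 0 for x in c)]
    if not usable:
        raise ValueError("no gene has a positive count in every sample")
    ratios = [[] for _ in range(n_samples)]
    for c in usable:
        # Geometric mean across samples acts as a pseudo-reference sample.
        geo_mean = math.exp(sum(math.log(x) for x in c) / n_samples)
        for j, x in enumerate(c):
            ratios[j].append(x / geo_mean)
    # Size factor of sample j = median over genes of count / geometric mean.
    return [median(r) for r in ratios]


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return {g: [x / s for x, s in zip(c, factors)] for g, c in counts.items()}


counts = {
    "geneA": [10, 20],
    "geneB": [100, 200],
    "geneC": [5, 10],
    "geneD": [0, 3],  # excluded from the size-factor estimate
}
factors = size_factors(counts)
print([round(f, 3) for f in factors])  # [0.707, 1.414]
```

In this example the second sample was sequenced twice as deeply as the first. After calling `normalize(counts, factors)`, geneA, geneB and geneC therefore have equal normalized counts in both samples.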

Highlights

  • The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms, which allow massive and inexpensive sequencing of selected RNA fractions and provide information on strand orientation (RNA-Seq)

  • In this paper we present the implementation of a modular analysis workflow, named RAP (RNA-Seq Analysis Pipeline), designed to analyze sequencing data in multiple steps, each addressing a specific task

  • The web-based graphical user interface (GUI) is written in PHP (PHP: Hypertext Preprocessor) using HyperText Markup Language (HTML) and jQuery, combined with the HTML5 and CSS3 standards, to improve user interaction
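RAP is described as a modular workflow that users can customize, with each step addressing a specific task. The Python sketch below (Python 3.9+ for `graphlib`) shows one way such a workflow could turn a user's module selection into an execution order. It adds every prerequisite module and schedules dependencies first. The module names and the dependency graph are hypothetical illustrations, not RAP's actual configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical module graph (an assumption, not RAP's real design):
# each module maps to the set of modules whose output it needs.
MODULE_DEPS = {
    "quality_check": set(),
    "alignment": {"quality_check"},
    "quantification": {"alignment"},
    "splicing": {"alignment"},
    "polyadenylation": {"alignment"},
    "chimeras": {"quality_check"},
    "differential_expression": {"quantification"},
}


def plan(selected):
    """Return an execution order covering the selected modules and every
    module they depend on, with dependencies always scheduled first."""
    needed = set()
    stack = list(selected)
    # Walk the dependency graph to collect transitive prerequisites.
    while stack:
        module = stack.pop()
        if module not in MODULE_DEPS:
            raise ValueError(f"unknown module: {module}")
        if module not in needed:
            needed.add(module)
            stack.extend(MODULE_DEPS[module])
    # Topologically sort only the modules that are actually required.
    subgraph = {m: MODULE_DEPS[m] for m in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(plan(["differential_expression"]))
# ['quality_check', 'alignment', 'quantification', 'differential_expression']
```

Keeping dependencies in a single table means a new analysis module only needs one entry declaring its inputs. Selecting that module then automatically pulls in whatever upstream steps it requires.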


Summary

Introduction

The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing (NGS) platforms, which allow massive and inexpensive sequencing of selected RNA fractions and provide information on strand orientation (RNA-Seq). RNA-Seq can be profitably used to investigate the gene expression process, estimating both the nature and the quantity of expressed mRNAs [2]. By sequencing a complete transcriptome, RNA-Seq can identify and quantify expressed genes and transcripts, providing valuable biological information on the underlying gene expression mechanisms. Gene expression is a highly regulated process. Complex co-transcriptional and post-transcriptional nuclear processing, including alternative initiation and termination of transcription and alternative splicing, can generate many alternative transcripts of considerable length. In such cases, the final products cannot always be fully characterized from the short reads generated by NGS platforms [3]. RNA-Seq data can be analyzed with several computational strategies, depending on the desired results (e.g. expression at gene and/or transcript level, investigation of alternative splicing events, alternative polyadenylation sites, etc.). Researchers with limited bioinformatics skills need easy and effective access to gene expression studies. There is therefore strong demand for user-friendly automated workflows that provide reliable, easily interpretable results [5] and keep up with the exponential growth of sequencing technologies [6].
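Estimating "the quantity of expressed mRNAs" at the transcript level requires normalizing read counts for both transcript length and sequencing depth. Cufflinks, one of the tools RAP uses, reports abundance as FPKM (Fragments Per Kilobase of transcript per Million mapped fragments). The minimal Python sketch below applies the plain FPKM definition to a single transcript, using its raw length. Cufflinks itself estimates how fragments are assigned across isoforms and applies length corrections, which this sketch deliberately omits. The example numbers are made up.

```python
def fpkm(fragments, length_bp, total_mapped):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    fragments:    fragments assigned to the transcript
    length_bp:    transcript length in base pairs
    total_mapped: total mapped fragments in the library
    """
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by length in kilobases (length_bp / 1e3) and by library size
    # in millions (total_mapped / 1e6) combines into one factor of 1e9.
    return fragments * 1e9 / (length_bp * total_mapped)


# 500 fragments on a 2,000 bp transcript in a 10-million-fragment library.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```

For the same fragment count, doubling either the transcript length or the library size halves the FPKM value. This is what makes the measure comparable across transcripts of different lengths and across libraries of different depths.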
