Abstract

High-throughput RNA sequencing (RNA-seq) has become an instrumental assay for the analysis of multiple aspects of an organism's transcriptome. Further, the analysis of a biological specimen's associated microbiome can also be performed using RNA-seq data, and this application is gaining interest in the scientific community. There are many existing bioinformatics tools designed for analysis and visualization of transcriptome data. Despite the availability of an array of next-generation sequencing (NGS) analysis tools, the analysis of RNA-seq data sets poses a challenge for many biomedical researchers who are not familiar with command-line tools. Here we present RNA CoMPASS, a comprehensive RNA-seq analysis pipeline for the simultaneous analysis of transcriptomes and metatranscriptomes from diverse biological specimens. RNA CoMPASS leverages existing tools and parallel computing technology to facilitate the analysis of even very large datasets. RNA CoMPASS has a web-based graphical user interface with intrinsic queuing to control a distributed computational pipeline. RNA CoMPASS was evaluated by analyzing RNA-seq data sets from 45 B-cell samples. Twenty-two of these samples were derived from lymphoblastoid cell lines (LCLs) generated by the infection of naïve B-cells with the Epstein-Barr virus (EBV), while another 23 samples were derived from Burkitt's lymphomas (BLs), some of which arose in part through infection with EBV. Appropriately, RNA CoMPASS identified EBV in all LCLs and in a fraction of the BLs. Cluster analysis of the human transcriptome component of the RNA CoMPASS output clearly separated the BLs (which have a germinal center-like phenotype) from the LCLs (which have a blast-like phenotype), with evidence of activated MYC signaling and lower interferon and NF-κB signaling in the BLs. Together, this analysis illustrates the utility of RNA CoMPASS in the simultaneous analysis of transcriptome and metatranscriptome data.
RNA CoMPASS is freely available at http://rnacompass.sourceforge.net/.

Highlights

  • Through its capacity to delve deeply into the genetic composition of a biological specimen, next-generation sequencing (NGS) technology presents an unprecedented approach to pathogen discovery in the context of human disease

  • In RNA CoMPASS, we have implemented both the Java Parallel Processing Framework (JPPF) API and the Portable Batch System (PBS) API so that it can be deployed on either a small local cluster or a grid system managed through PBS job submission

  • Our results demonstrate the utility of RNA CoMPASS in analyzing large sequence datasets for the discovery of pathogens and host transcriptome analysis

Introduction

Through its capacity to delve deeply into the genetic composition of a biological specimen, next-generation sequencing (NGS) technology presents an unprecedented approach to pathogen discovery in the context of human disease. The discovery of an association between Fusobacterium and colorectal carcinoma was made using two different NGS approaches [2,3]. Such discoveries were facilitated by the use of computational subtraction approaches, in which reads aligning to reference genomes are subtracted from the sequence file, leaving behind sequences from undiscovered organisms. While current sequence-based computational subtraction pipelines are used solely for pathogen discovery, RNA CoMPASS takes advantage of the richness of RNA-seq data to provide host transcript expression data in addition to pathogen analysis. This concept, recently coined “dual RNA-seq” by Westermann and colleagues [9], allows the user to simultaneously investigate cellular signaling pathways. We present RNA CoMPASS and demonstrate its utility in dual analysis of RNA-seq data sets from different B-cell types with different EBV infection status.
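
The idea behind computational subtraction in a dual RNA-seq setting can be illustrated with a minimal sketch. This is not the RNA CoMPASS implementation: the function name `partition_reads`, the read identifiers, and the toy sequences are all hypothetical, and the set of host-aligned read IDs is assumed to come from an upstream aligner that is not shown. The sketch only shows the bookkeeping: host-aligned reads are kept for transcript expression analysis, while the remaining reads are passed on as candidates for pathogen identification.

```python
from typing import Dict, Iterable, Tuple


def partition_reads(
    reads: Dict[str, str], host_aligned_ids: Iterable[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split reads into host-aligned reads and the residual after subtraction.

    `reads` maps a read ID to its sequence. `host_aligned_ids` lists the IDs
    that an aligner (assumed, not shown here) mapped to the host reference.
    """
    aligned = set(host_aligned_ids)
    # Reads that mapped to the host: input for host expression analysis.
    host = {rid: seq for rid, seq in reads.items() if rid in aligned}
    # Reads left after subtraction: candidates for pathogen discovery.
    residual = {rid: seq for rid, seq in reads.items() if rid not in aligned}
    return host, residual


# Toy data (hypothetical read IDs and sequences, for illustration only).
reads = {
    "read1": "ACGTACGT",
    "read2": "TTGACCAA",
    "read3": "GGCATCGA",
}
host_aligned_ids = ["read1", "read3"]

host_reads, residual_reads = partition_reads(reads, host_aligned_ids)
print("host:", sorted(host_reads))
print("residual:", sorted(residual_reads))
```

In a real analysis the reads would be streamed from FASTQ or alignment files rather than held in a dictionary, and the residual reads would then be searched against sequence databases to identify non-host organisms.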
