Abstract

Background
With the cost of DNA sequencing decreasing, increasing amounts of RNA-Seq data are being generated, giving novel insight into gene expression and regulation. Prior to analysis of gene expression, the RNA-Seq data have to be processed through a number of steps, resulting in a quantification of the expression of each gene/transcript in each of the analyzed samples. A number of workflows are available to help researchers perform these steps on their own data, or on public data to take advantage of novel software or reference data in re-analysis. However, many of the existing workflows are limited to specific types of studies. We therefore aimed to develop a maximally general workflow, applicable to a wide range of data and analysis approaches, that supports research on both model and non-model organisms. Furthermore, we aimed to make the workflow usable by researchers with limited programming skills.

Results
Utilizing the workflow management system Snakemake and the package management system Conda, we have developed a modular, flexible and user-friendly RNA-Seq analysis workflow: RNA-Seq Analysis Snakemake Workflow (RASflow). Using Snakemake and Conda alleviates challenges with library dependencies and version conflicts, and also supports reproducibility. To be applicable to a wide variety of applications, RASflow supports mapping reads to both genomic and transcriptomic assemblies. RASflow has a broad range of potential users: it can be applied by researchers interested in any organism, and since it requires no programming skills, it can be used by researchers with different backgrounds. The source code of RASflow is available on GitHub: https://github.com/zhxiaokang/RASflow.

Conclusions
RASflow is a simple and reliable RNA-Seq analysis workflow covering many use cases.
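The abstract states that RASflow supports mapping reads to both genomic and transcriptomic assemblies. The minimal sketch below illustrates how one configuration setting can switch a workflow between those two routes. The configuration field `mapping_target` and all step names are hypothetical and chosen for illustration. They are not taken from RASflow's actual configuration file or rules.

```python
from dataclasses import dataclass


# Hypothetical configuration for illustration only; the field names are
# assumptions, not RASflow's real configuration keys.
@dataclass
class WorkflowConfig:
    mapping_target: str  # expected to be "transcriptome" or "genome"


def planned_steps(config: WorkflowConfig) -> list[str]:
    """Return the ordered (hypothetical) step names for the chosen mapping target."""
    steps = ["read_quality_control"]
    if config.mapping_target == "transcriptome":
        # Reads are mapped against transcript sequences, then summarized per gene.
        steps += ["transcriptome_mapping", "gene_level_summary"]
    elif config.mapping_target == "genome":
        # Reads are aligned to the genome, then counted per gene.
        steps += ["genome_alignment", "gene_counting"]
    else:
        raise ValueError(f"unknown mapping_target: {config.mapping_target!r}")
    # Both routes end in a per-gene expression table for each sample.
    steps.append("expression_table")
    return steps


if __name__ == "__main__":
    print(planned_steps(WorkflowConfig(mapping_target="genome")))
```

Keeping the choice in configuration rather than in code is one way a workflow can stay usable without programming. In that design the user edits a single value, and only the selected branch of steps is run.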

Highlights

  • With the cost of DNA sequencing decreasing, increasing amounts of RNA sequencing (RNA-Seq) data are being generated, giving novel insight into gene expression and regulation

  • We present RNA-Seq Analysis Snakemake Workflow (RASflow) that is usable for a wide range of applications

  • RASflow was tested on four real datasets: paired-end RNA-Seq of prostate cancer and adjacent normal tissues from 14 patients (ArrayExpress accession: E-MTAB-567) [38]; single-end RNA-Seq of mesenchymal stem cells (MSCs) and cancer-associated fibroblasts (CAFs) from EG7 tumor-bearing mice (GEO accession: GSE141199); paired-end RNA-Seq of Atlantic cod liver slices exposed to benzo[a]pyrene (BaP) and 17α-ethynylestradiol (EE2) (GEO accession: GSE106968) [39]; and a benchmarking dataset, single-end RNA-Seq of highly purified human classical and nonclassical monocyte subsets from a clinical cohort (SRA accession: SRP082682) [40]

Summary

Introduction

With the cost of DNA sequencing decreasing, increasing amounts of RNA-Seq data are being generated, giving novel insight into gene expression and regulation. Prior to analysis of gene expression, the RNA-Seq data have to be processed through a number of steps, resulting in a quantification of the expression of each gene/transcript in each of the analyzed samples. A number of workflows are available to help researchers perform these steps on their own data, or on public data to take advantage of novel software or reference data in re-analysis. We aimed to develop a maximally general workflow, applicable to a wide range of data and analysis approaches, that supports research on both model and non-model organisms. We also aimed to make the workflow usable by researchers with limited programming skills.