Abstract

Background: RNA sequencing has become an increasingly affordable way to profile gene expression patterns. Here we introduce a workflow that combines several open-source software tools and can be run in a high-performance computing environment.

Results: The workflow was developed by the Bioinformatics Shared Resource Group (BISR) at the Ohio State University. To demonstrate its feasibility, we applied the pipeline to a few publicly available RNAseq datasets downloaded from GEO. Source code is available for the workflow at https://code.bmi.osumc.edu/gadepalli.3/BISR-RNAseq-ICIBM2019 and for the shiny application at https://code.bmi.osumc.edu/gadepalli.3/BISR_RNASeq_ICIBM19. An example dataset is demonstrated at https://dataportal.bmi.osumc.edu/RNA_Seq/.

Conclusion: The workflow allows for the analysis (alignment, QC, gene-wise counts generation) of raw RNAseq data and seamless integration of quality analysis and differential expression results into a configurable R shiny web application.

Highlights

  • Ribonucleic acid (RNA) sequencing has become an increasingly affordable way to profile gene expression patterns

  • As the Bioinformatics Shared Resource (BISR) group at Ohio State University (OSU), we developed this workflow to provide consistent analysis and reports to our collaborators

  • If a sample is run on multiple lanes, we recommend keeping the lanes separate so that quality control (QC) can be assessed on each lane individually
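
The per-lane recommendation in the last highlight can be illustrated with a minimal Python sketch. It is not taken from the BISR workflow. It assumes FASTQ files follow the common Illumina naming pattern `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`, and the function name `group_by_sample_and_lane` and the file names are hypothetical. The sketch only groups files so that each lane can be passed to a QC tool on its own instead of being concatenated first.

```python
import re
from collections import defaultdict

# Assumption: Illumina-style names, e.g. tumorA_S1_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_by_sample_and_lane(filenames):
    """Return {sample: {lane: {read: filename}}} without merging lanes."""
    groups = defaultdict(lambda: defaultdict(dict))
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected FASTQ name: {name}")
        groups[match["sample"]][match["lane"]][match["read"]] = name
    # Convert nested defaultdicts to plain dicts for predictable behaviour.
    return {
        sample: {lane: dict(reads) for lane, reads in lanes.items()}
        for sample, lanes in groups.items()
    }


# Hypothetical file names for one paired-end sample sequenced on two lanes.
files = [
    "tumorA_S1_L001_R1_001.fastq.gz",
    "tumorA_S1_L001_R2_001.fastq.gz",
    "tumorA_S1_L002_R1_001.fastq.gz",
    "tumorA_S1_L002_R2_001.fastq.gz",
]

for sample, lanes in group_by_sample_and_lane(files).items():
    for lane, reads in sorted(lanes.items()):
        # Each (sample, lane) pair is one unit for QC.
        print(sample, lane, reads["1"], reads["2"])
```

Keeping the lane as an explicit key means a lane-specific problem, such as a drop in base quality on one lane, stays visible in the QC report for that lane rather than being averaged away across the sample.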


Summary

Introduction

RNA sequencing has become an increasingly affordable way to profile gene expression patterns. We introduce a workflow that combines several open-source software tools and can be run in a high-performance computing environment. Whole-transcriptome sequencing provides an estimate of the quantity of all transcripts present in a group of cells. High-throughput sequencing technologies have been developed to sequence the transcriptome deeply. Sequencing generates several million short reads that are typically 50–400 bases in length. These reads can be mapped to a known reference genome or assembled de novo. Either method provides a snapshot of the transcripts present in the sample and an estimate of their abundance. Statistical methods have been developed to normalize and compare transcript estimates in order to identify differentially expressed transcripts. At each step of the bioinformatics analysis pipeline, there are many options for specific programs to use, reference
