Abstract

The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis, with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating the use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package, which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.
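The "empirical Bayes moderation" step in limma shrinks each gene's residual variance towards a common prior variance before computing t-statistics. The Python sketch below shows only that shrinkage arithmetic. It assumes the prior variance `s0_2` and prior degrees of freedom `d0` are already known; limma estimates them from all genes at once, whereas here they are supplied by hand. The function names are hypothetical and are not limma's API.

```python
import math


def moderated_variance(s2: float, d: float, s0_2: float, d0: float) -> float:
    """Shrink a gene's residual variance s2 (on d residual degrees of freedom)
    towards a prior variance s0_2 carrying d0 prior degrees of freedom."""
    return (d0 * s0_2 + d * s2) / (d0 + d)


def moderated_t(coef: float, stdev_unscaled: float, s2_post: float) -> float:
    """Moderated t-statistic: the estimated log-fold-change divided by its
    standard error, with the moderated variance used in place of s2.
    Under limma's model it is compared to a t-distribution on d0 + d df."""
    return coef / (stdev_unscaled * math.sqrt(s2_post))


# Hypothetical gene: sample variance 0.6 on 4 df, prior variance 0.12 on 4 prior df.
s2_post = moderated_variance(0.6, 4, 0.12, 4)
print(round(s2_post, 6))                             # 0.36
print(round(moderated_t(1.2, 0.5, s2_post), 6))      # 4.0
```

The moderated variance always lies between the gene's own variance and the prior value, weighted by their degrees of freedom. This stabilises inference when only a few samples per group are available.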

Highlights

  • RNA-sequencing (RNA-seq) has become the primary technology used for gene expression profiling, and the genome-wide detection of differentially expressed genes between two or more conditions of interest is one of the questions most commonly asked by researchers

  • We describe an edgeR - limma workflow for analysing RNA-seq data that takes gene-level counts as its input, and moves through pre-processing and exploratory data analysis before obtaining lists of differentially expressed (DE) genes and gene signatures

  • Reads were aligned to the mouse reference genome using the R-based pipeline available in the Rsubread package. Count data for these samples can be downloaded from the Gene Expression Omnibus (GEO), http://www.ncbi.nlm.nih.gov/geo/, using GEO Series accession number GSE63310
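Pre-processing of gene-level counts in a workflow like this usually involves moving them onto a log2 counts-per-million (log-CPM) scale. The sketch below is a minimal Python illustration of that transformation. It assumes the offsets used by the voom transformation (0.5 added to each count and 1 to each library size), so that zero counts stay finite. edgeR's `cpm(..., log=TRUE)` uses a different default prior count, so its values will not match these exactly. The function name `log_cpm` is hypothetical.

```python
import math


def log_cpm(counts, lib_sizes):
    """voom-style log2 counts-per-million for a genes x samples count table.

    counts    : list of rows, one per gene, each holding one count per sample
    lib_sizes : library size (total reads) for each sample, in the same order
    Computes log2((y + 0.5) / (N + 1) * 1e6) for every count y and library size N.
    """
    return [
        [math.log2((y + 0.5) / (n + 1) * 1e6) for y, n in zip(row, lib_sizes)]
        for row in counts
    ]


# Two hypothetical genes in two samples with roughly one and two million reads.
table = log_cpm([[0, 0], [7, 15]], [999_999, 1_999_999])
```

A gene with zero counts lands at a small negative log-CPM rather than at minus infinity, and its value depends on library size. This is why lowly expressed genes form a separate peak at the left of a log-CPM density plot.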


Summary

17 Jun 2016

This article is included in the Bioconductor gateway. The updated workflow makes use of current versions of software: R version 3.5.1 and Bioconductor project version 3.8. Output downstream of filtering has been updated, including adjustment of the vertical dotted line in Figure 1 marking the new log-CPM threshold. In “Transformations from the raw-scale” and “Removing genes that are lowly expressed”, text has been added to give more detail on the log-CPM values calculated and on the gene filtering strategy. Xueyi Dong and Luyi Tian are added as authors for translating the article into Chinese; the translation is available in the release version of the RNAseq123 workflow package from Bioconductor, http://bioconductor.org/packages/RNAseq123. Xueyi Dong updated the workflow to Bioconductor 3.8. These changes have been outlined in “Software availability” and “Author contributions”.
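The gene filtering strategy and log-CPM threshold mentioned above can be illustrated with a simple rule. A gene is kept if its CPM reaches a cutoff in at least a minimum number of samples, where the cutoff is the CPM equivalent of a chosen minimum read count at the median library size. This Python sketch is a hypothetical simplification. It is only loosely in the spirit of edgeR's `filterByExpr`, which also takes group sizes and total counts into account. The function names, the 10-read minimum and the sample threshold are assumptions for illustration, not the workflow's actual settings.

```python
import math
from statistics import median


def cpm_cutoff(min_count, lib_sizes):
    """CPM value equivalent to min_count reads in a library of median size."""
    return min_count / median(lib_sizes) * 1e6


def keep_genes(counts, lib_sizes, cutoff, min_samples):
    """Return one boolean per gene: True if the gene's CPM is at or above
    the cutoff in at least min_samples samples."""
    keep = []
    for row in counts:
        n_above = sum(1 for y, n in zip(row, lib_sizes) if y / n * 1e6 >= cutoff)
        keep.append(n_above >= min_samples)
    return keep


# Hypothetical data: three genes, three samples of 20 million reads each.
lib_sizes = [20_000_000, 20_000_000, 20_000_000]
counts = [[0, 0, 0], [20, 15, 5], [100, 100, 100]]
cutoff = cpm_cutoff(10, lib_sizes)   # 10 reads out of 20 million, about 0.5 CPM
log_cutoff = math.log2(cutoff)       # about -1 on the log2 scale
mask = keep_genes(counts, lib_sizes, cutoff, min_samples=3)
```

`log_cutoff` is the kind of value a vertical threshold line on a log-CPM density plot would mark. It ignores any prior count, so on a log-CPM scale that adds an offset the corresponding line would shift slightly.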

