Abstract

BackgroundThe growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions.ResultsWe propose a novel method, multiDE, for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of the degrees of freedom is reduced to one through the first order decomposition of the interaction, leading to a dramatically power improvement in testing DE genes when the number of conditions is greater than two. In our simulation situations, multiDE outperformed the benchmark methods (i.e. edgeR and DESeq2) even if the underlying model was severely misspecified, and the power gain was increasing in the number of conditions. In the application to two real datasets, multiDE identified more biologically meaningful DE genes than the benchmark methods. An R package implementing multiDE is available publicly at http://homepage.fudan.edu.cn/zhangh/softwares/multiDE.ConclusionsWhen the number of conditions is two, multiDE performs comparably with the benchmark methods. When the number of conditions is greater than two, multiDE outperforms the benchmark methods.

Highlights

  • The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools

  • We present a novel framework committing to facilitate the differential expression (DE) analysis of RNA-seq read count data generated by experiments with multiple conditions, which takes the correlation between samples into account

  • Methods we describe a rank-reduced model for the RNA-seq read counts of biological samples drawn from multiple conditions, develop an estimation/test procedure for DE analysis

Read more

Summary

Introduction

The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions. One of the most important biological problems is to identify differentially expressed genes between multiple experimental conditions. The key of analyzing these data lies in establishing an appropriate statistical model for RNA-seq count data and preforming differential expression (DE) analysis. The read counts of various conditions can be correlated with each other if they are generated from the same subject When the number of conditions, D, is greater than two, an analysis of variance (ANOVA) model can be used to detect those genes differentially expressed between the conditions. The chisquared test based the ANOVA model has D − 1 degrees of freedom

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call