Abstract

Background

Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory networks. However, most available methods treat gene expression values at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. As a result, they fail to identify many DE genes whose profiles differ across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to better characterize expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score.

Results

Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our analysis of real fruit fly developmental time course RNA-Seq data demonstrates that the NBMM identifies biologically relevant genes, a finding well supported by gene ontology analysis.

Conclusions

The proposed method is powerful and efficient for detecting biologically relevant DE genes and gene sets in time course RNA-Seq data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1180-9) contains supplementary material, which is available to authorized users.
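To make the model structure concrete, here is a minimal Python sketch of a penalized negative binomial log-likelihood with a fixed effect per condition and Gaussian random effects over time, maximized with a generic optimizer. The parameterization (log link, dispersion `phi` with variance `mu + phi * mu**2`, random-effect variance `sigma2`, a known log normalization `offset`) and all function names are assumptions for illustration, not the paper's exact model or estimator. Here `phi` and `sigma2` are held fixed, whereas a full fit would also estimate them.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def nb_logpmf(y, mu, phi):
    """Negative binomial log-pmf with mean mu and variance mu + phi * mu**2."""
    r = 1.0 / phi  # "size" parameter of the negative binomial
    return (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
            + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu)))


def penalized_loglik(y, beta, b, phi, sigma2, offset=0.0):
    """Hypothetical NBMM-style objective (assumed form, for illustration).

    y      : counts, shape (conditions, time points)
    beta   : fixed effect (mean log-expression) per condition, shape (conditions,)
    b      : random time effects, shape (conditions, time points), b ~ N(0, sigma2)
    offset : log normalization factor (e.g. log library size), assumed known
    """
    mu = np.exp(offset + beta[:, None] + b)      # log link
    loglik = nb_logpmf(y, mu, phi).sum()
    penalty = (b ** 2).sum() / (2.0 * sigma2)    # Gaussian penalty on random effects
    return loglik - penalty


def fit(y, phi=0.1, sigma2=1.0):
    """Maximize the penalized likelihood over beta and b for fixed phi, sigma2."""
    n_cond, n_time = y.shape

    def objective(theta):
        beta = theta[:n_cond]
        b = theta[n_cond:].reshape(n_cond, n_time)
        return -penalized_loglik(y, beta, b, phi, sigma2)

    # Start from the log of each condition's mean count, with no time effects
    start = np.concatenate([np.log(y.mean(axis=1) + 0.5), np.zeros(y.size)])
    res = minimize(objective, start, method="BFGS")
    beta_hat = res.x[:n_cond]
    b_hat = res.x[n_cond:].reshape(n_cond, n_time)
    return beta_hat, b_hat, res


# Toy (made-up) counts: condition 0 changes over time, condition 1 stays flat
y = np.array([[10, 25, 60, 30],
              [12, 14, 11, 13]], dtype=float)
beta_hat, b_hat, res = fit(y)
print(np.round(np.exp(beta_hat[:, None] + b_hat), 1))  # fitted means per condition and time
```

With fixed `phi`, the objective is strictly concave in `(beta, b)`, so a quasi-Newton method is adequate for this toy case; the penalty shrinks each time effect toward its condition's overall mean.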

Highlights

  • Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory networks

  • Significance testing for individual genes: once the model (6) is fitted to the exon-level read count data, we identify nonparallel differentially expressed (NPDE) and parallel differentially expressed (PDE) genes by testing the significance of the interaction and main effects in (7)

  • For genes that are not classified as NPDE by the preceding test, we further investigate whether they are PDE
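The two-stage decision in the highlights can be sketched as follows. The paper derives its gene-level tests from a Kullback-Leibler distance ratio; this sketch does not implement that statistic and instead assumes that per-gene p-values for the interaction (NPDE) test and the main-effect (PDE) test are already available. Controlling the FDR with the Benjamini-Hochberg procedure at each stage, and the names `bh_adjust`, `classify_genes`, and `alpha`, are hypothetical choices for illustration.

```python
import numpy as np


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def classify_genes(p_interaction, p_main, alpha=0.05):
    """Hypothetical two-stage labeling: NPDE first, then PDE among the rest."""
    p_interaction = np.asarray(p_interaction, dtype=float)
    p_main = np.asarray(p_main, dtype=float)
    labels = np.full(p_interaction.size, "nonDE", dtype=object)

    # Stage 1: significant condition-by-time interaction -> NPDE
    npde = bh_adjust(p_interaction) <= alpha
    labels[npde] = "NPDE"

    # Stage 2: among non-NPDE genes, significant main effect -> PDE
    rest = np.flatnonzero(~npde)
    if rest.size:
        pde = bh_adjust(p_main[rest]) <= alpha
        labels[rest[pde]] = "PDE"
    return labels


labels = classify_genes([0.001, 0.2, 0.5, 0.9], [0.3, 0.002, 0.6, 0.01])
print(list(labels))  # ['NPDE', 'PDE', 'nonDE', 'PDE']
```

Gene 0 has a significant interaction and is labeled NPDE regardless of its main-effect p-value; genes 1 and 3 pass only the second-stage test and are labeled PDE.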

Summary

Introduction

Sun et al., BMC Bioinformatics (2016) 17:324

Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory networks. Most available methods, e.g., edgeR [5] and DESeq [6], treat the expression values of a gene at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. As a result, they fail to identify many DE genes whose profiles differ across time. When RNA-Seq experiments have no replicates or only a few, the significance tests in edgeR and DESeq have few degrees of freedom, which may result in a high false discovery rate (FDR).
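A toy example (made-up numbers, Python/NumPy) shows why pooling time points as replicates can miss a gene: two conditions with mirror-image profiles have identical overall means, so a time-ignoring comparison sees no difference, while the condition-by-time interaction is large. This only illustrates the argument; it is not a reimplementation of edgeR, DESeq, or the NBMM.

```python
import numpy as np

# Hypothetical normalized expression of one gene at four time points
cond_a = np.array([10.0, 20.0, 30.0, 40.0])  # rising profile
cond_b = np.array([40.0, 30.0, 20.0, 10.0])  # falling profile

# Treating time points as replicates: only the overall means are compared
mean_diff = cond_a.mean() - cond_b.mean()
print("difference of means ignoring time:", mean_diff)  # 0.0

# Condition-by-time interaction: each condition's deviation from its own mean
interaction = (cond_a - cond_a.mean()) - (cond_b - cond_b.mean())
print("interaction contrasts:", interaction)
```

Because the overall means coincide, the entire difference between the two conditions lives in the interaction contrasts, which is exactly the kind of signal a time-aware model is meant to test.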
