Abstract

Background: In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and the resulting normalized gene counts are then used as input for parametric or non-parametric differential gene expression tests. However, a distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis.

Results: We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. In simulations and in experiments with real unreplicated data, our method performs better than or comparably to NOISeq and GFOLD. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data.

Conclusions: Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with a limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS.

Highlights

  • In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count

  • We have developed CORNAS (COverage-dependent RNA-Seq), a Bayesian method to infer the posterior distribution of a true gene count

  • The true gene count is defined as the total number of messenger RNA (mRNA) copies of a gene in a sample prepared for a sequencing run

Summary

Introduction

In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. However, a distribution of true gene counts, each with a different probability, can result in the same observed gene count. Sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. Even in genetically identical yeast cells, variation of more than 800 copies of an mRNA species per cell has been observed [14].
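The point that many true gene counts can produce the same observed count can be illustrated with a small sketch. This is not the CORNAS model itself. It assumes, purely for illustration, that each mRNA copy is sequenced independently with probability equal to the coverage (binomial sampling) and that the prior over the true count is flat up to a cap. The function names `posterior_true_count` and `posterior_mean`, and the parameter `n_max`, are hypothetical.

```python
import math


def posterior_true_count(observed, coverage, n_max):
    """Posterior over the true count N given an observed count.

    Illustrative assumptions (not taken from the paper):
    observed ~ Binomial(N, coverage), flat prior on N in [observed, n_max].
    Returns a dict mapping each candidate N to its posterior probability.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    if n_max < observed:
        raise ValueError("n_max must be at least the observed count")

    candidates = range(observed, n_max + 1)
    log_weights = []
    for n in candidates:
        # log of the binomial coefficient C(n, observed)
        log_choose = (math.lgamma(n + 1)
                      - math.lgamma(observed + 1)
                      - math.lgamma(n - observed + 1))
        # log-likelihood of seeing `observed` reads out of n true copies
        log_weights.append(log_choose
                           + observed * math.log(coverage)
                           + (n - observed) * math.log1p(-coverage))

    # Normalize in a numerically stable way (subtract the maximum first).
    peak = max(log_weights)
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return {n: w / total for n, w in zip(candidates, weights)}


def posterior_mean(posterior):
    """Expected true count under the posterior."""
    return sum(n * p for n, p in posterior.items())


if __name__ == "__main__":
    # The same observed count of 10 implies different true counts
    # depending on the assumed coverage.
    for cov in (0.5, 0.1):
        post = posterior_true_count(observed=10, coverage=cov, n_max=2000)
        print(cov, round(posterior_mean(post), 2))
```

Under these assumptions, lowering the coverage both shifts the posterior toward larger true counts and widens it, which is why the same observed count carries less certainty in a low-coverage sample.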

