Abstract

The distribution chosen to model noise is a fundamental assumption in the analysis of sequencing data and is therefore critical for the accurate assessment of biological heterogeneity and differential expression. The stochasticity of RNA sequencing has been assumed to follow Poisson distributions. We collected microRNA sequencing data and observed that its stochasticity is better approximated by gamma distributions, likely because of the stochastic nature of exponential PCR amplification. We validated our findings with two independent datasets, one for microRNA sequencing and another for RNA sequencing. Motivated by the gamma-distributed stochasticity, we provided a simple method for the analysis of RNA sequencing data and showed its superiority to three existing methods for differential expression analysis, using three data examples of technical replicate data and biological replicate data.

Highlights

  • Next-generation sequencing is a stochastic, or “noisy”, process[1]

  • The limited empirical evidence supporting the Poisson assumption was derived from technical replicates for the read generation step only, and not for the library preparation step

  • We investigated the intrinsic stochasticity for the sequencing of microRNAs on the basis of data from technical replicates encompassing both the library preparation step and the read generation step


Summary

Introduction

Next-generation sequencing is a stochastic, or “noisy”, process[1]. An intrinsic source of the noise is the inherent randomness of the biochemical processes for library preparation and read generation[2]. A Poisson distribution is assumed for modeling technical variations in popular tools for identifying differentially expressed genes (such as edgeR[4] and DESeq[5]) and in statistical methods for clustering genes[6] or samples[7]. This assumption is primarily based on the argument that sequencing data represent discrete counts, and the supporting empirical evidence is very limited[8]. We investigated the intrinsic stochasticity for the sequencing of microRNAs (miRNAs; a class of small non-coding RNAs) on the basis of data from technical replicates encompassing both the library preparation step and the read generation step. Motivated by the gamma-distributed stochasticity, we provided a simple and powerful method (based on cube-root transformation and normal-distribution-based methods) for analyzing RNA sequencing data and showed its superiority to three existing methods for differential expression analysis, using three data examples of technical replicate data and biological replicate data.
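
To illustrate why a cube-root transformation pairs naturally with gamma-distributed noise, here is a minimal sketch using only the Python standard library. It is not the authors' implementation: the gamma shape and scale values, the sample sizes, and the choice of Welch's t statistic as the "normal-distribution-based method" are all assumptions made for illustration. The sketch simulates gamma noise, checks that the cube root removes most of the skewness (the Wilson-Hilferty approximation), and then compares two simulated groups on the transformed scale.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: the mean of the cubed z-scores."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def cube_root(xs):
    """Cube-root transform for non-negative expression values."""
    return [x ** (1.0 / 3.0) for x in xs]


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(va / len(a) + vb / len(b))


rng = random.Random(0)   # fixed seed so the sketch is reproducible
SHAPE = 4.0              # hypothetical gamma shape, not taken from the paper

# 1. Gamma noise is right-skewed; its cube root is close to symmetric.
raw = [rng.gammavariate(SHAPE, 25.0) for _ in range(20000)]
skew_raw = skewness(raw)
skew_cbrt = skewness(cube_root(raw))

# 2. Compare two hypothetical groups of six replicates on the cube-root scale.
group_a = [rng.gammavariate(SHAPE, 25.0) for _ in range(6)]
group_b = [rng.gammavariate(SHAPE, 50.0) for _ in range(6)]
t_stat = welch_t(cube_root(group_a), cube_root(group_b))

print(f"skewness raw: {skew_raw:.2f}, after cube root: {skew_cbrt:.2f}")
print(f"Welch t on cube-root scale: {t_stat:.2f}")
```

Because the transformed values are approximately normal, standard normal-theory tools can be applied to them directly; which specific test the authors used is not stated in this summary.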
