Abstract
Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods.
Highlights
In research areas such as RNA-sequencing (RNA-Seq) and microbiomics, sequencing technologies are applied to measure the composition of mixtures of nucleic acids [1, 2].
In this paper we propose a new statistical goodness of fit (GoF) test for the negative binomial (NB) distribution in regression models that are commonly used for analysing RNA-Seq and microbiome studies.
Sequencing count data are often assumed to follow the NB or zero-inflated negative binomial (ZINB) distribution; these distributions form the basis of several statistical procedures for testing for differential expression (RNA-Seq) or differential abundance (microbiome).
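To make the NB assumption concrete, the sketch below fits an NB distribution to the counts of a single feature by the method of moments and compares the observed fraction of zeros with the fraction the fitted NB would predict. This is not the GoF test proposed in the paper: it ignores covariates, produces no p-value, and is only a rough illustration. The function names (`nb_pmf`, `moment_fit`, `zero_excess`) and the example counts are invented for this illustration.

```python
import math


def nb_pmf(y, mu, phi):
    """NB probability of count y, with mean mu > 0 and dispersion phi > 0.

    Parameterised so that the variance equals mu + phi * mu**2.
    """
    r = 1.0 / phi  # "size" parameter of the NB distribution
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def moment_fit(counts):
    """Method-of-moments estimates of (mu, phi); assumes a positive sample mean."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    # If the sample is not overdispersed, fall back to a tiny dispersion
    # (close to a Poisson distribution).
    phi = max((var - mu) / mu ** 2, 1e-8)
    return mu, phi


def zero_excess(counts):
    """Return (observed zero fraction, zero probability under the fitted NB)."""
    mu, phi = moment_fit(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    expected = nb_pmf(0, mu, phi)
    return observed, expected


# Hypothetical counts for one feature across 12 samples (made-up data).
counts = [0, 0, 0, 0, 0, 0, 3, 5, 8, 12, 20, 41]
observed, expected = zero_excess(counts)
print(f"observed zero fraction: {observed:.3f}")
print(f"NB-expected zero fraction: {expected:.3f}")
```

A large gap between the two fractions hints that the NB may fit this feature poorly, but a single summary like this is no substitute for a formal test that accounts for the regression structure.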
Summary
Open access
Citation: Hawinkel S, Rayner JCW, Bijnens L, Thas O (2020) Sequence count data are poorly fit by the negative binomial distribution.
Editor: Shailesh Kumar, National Institute of Plant Genome Research (NIPGR), India
Received: October 22, 2019