NEBULA is a fast negative binomial mixed model for differential or co-expression analysis of large-scale multi-subject single-cell data

Liang He, Alexander M Kulminski, Tomokazu S Sumida, David A Hafler, Manolis Kellis, Jose Davila-Velderrain

doi:10.1038/s42003-021-02146-6

Abstract

The increasing availability of single-cell data revolutionizes the understanding of biological mechanisms at cellular resolution. For differential expression analysis in multi-subject single-cell data, negative binomial mixed models account for both subject-level and cell-level overdispersions, but are computationally demanding. Here, we propose an efficient NEgative Binomial mixed model Using a Large-sample Approximation (NEBULA). The speed gain is achieved by analytically solving high-dimensional integrals instead of using the Laplace approximation. We demonstrate that NEBULA is orders of magnitude faster than existing tools and controls false-positive errors in marker gene identification and co-expression analysis. Using NEBULA in Alzheimer’s disease cohort data sets, we found that the cell-level expression of APOE correlated with that of other genetic risk factors (including CLU, CST3, TREM2, C1q, and ITM2B) in a cell-type-specific pattern and an isoform-dependent manner in microglia. NEBULA opens up a new avenue for the broad application of mixed models to large-scale multi-subject single-cell data.

Highlights

The increasing availability of single-cell data revolutionizes the understanding of biological mechanisms at cellular resolution
NEBULA decomposes the total overdispersion into subject-level and cell-level components using a random-effects term parametrized by σ² and the overdispersion parameter φ in the negative binomial distribution (Fig. 1a)
Our results show that, by combining methods based on the approximated marginal likelihood and the h-likelihood, NEBULA achieves a considerable speed gain while practically preserving estimation accuracy when analyzing scRNA-seq data
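
The two-level overdispersion described in the highlights can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not NEBULA's implementation: the function name `simulate_nbmm`, the log-normal subject-level random effect with variance σ², the parametrization in which the cell-level negative binomial variance is μ + φμ², and all parameter values are chosen here for illustration only.

```python
import numpy as np


def simulate_nbmm(n_subjects, cells_per_subject, beta0, sigma2, phi, seed=0):
    """Simulate counts for one gene from a toy negative binomial mixed model.

    Assumptions (illustrative, not taken from the paper):
      - subject-level random effect b_i ~ N(0, sigma2) on the log scale
      - cell-level NB with mean mu and variance mu + phi * mu**2
    """
    rng = np.random.default_rng(seed)
    # Subject-level component: one random intercept per subject.
    b = rng.normal(0.0, np.sqrt(sigma2), size=n_subjects)
    subject = np.repeat(np.arange(n_subjects), cells_per_subject)
    mu = np.exp(beta0 + b[subject])
    # Cell-level component: NB drawn as a gamma-Poisson mixture,
    # where the gamma multiplier has mean 1 and variance phi.
    g = rng.gamma(shape=1.0 / phi, scale=phi, size=mu.shape)
    counts = rng.poisson(mu * g)
    return subject, counts


n_subjects = 50
subject, counts = simulate_nbmm(n_subjects, 400, beta0=0.5, sigma2=0.3, phi=0.8)

# Between-subject variance of the log subject means roughly tracks sigma2.
subject_means = np.array([counts[subject == i].mean() for i in range(n_subjects)])
between = np.var(np.log(subject_means), ddof=1)

# The marginal variance exceeds the mean, i.e. the counts are overdispersed.
print(f"mean={counts.mean():.2f} var={counts.var():.2f} between-subject var={between:.2f}")
```

In this toy setup the between-subject variance of the log subject means reflects σ², while the extra within-subject spread beyond Poisson noise reflects φ, which is the decomposition the highlight refers to.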

Summary

Introduction

The increasing availability of single-cell data revolutionizes the understanding of biological mechanisms at cellular resolution. The drastically increasing sample size poses a serious computational challenge when applying conventional transcriptomics analyses, such as differential expression, expression quantitative trait loci (eQTLs), and co-expression, to large-scale scRNA-seq data. This situation contrasts with that of statistical models in bulk RNA-seq analysis, in which more emphasis is placed on building a robust estimate under a small sample size (e.g., regularization of standard errors and robust estimation of overdispersion parameters[6,7,8]). To address the computational burden, we propose a NEgative Binomial mixed model Using Large-sample Approximation (NEBULA), a novel fast algorithm for association analysis of scRNA-seq data using a negative binomial mixed model (NBMM). We found that testing a subject-level variable was highly sensitive to the estimation of the subject-level variance component and to the assumed distribution of the random effects when the number of subjects was small.

Methods

Results

Conclusion


Journal: Communications Biology	Publication Date: May 26, 2021
Citations: 75	License type: open-access


Similar Papers

Genetic analysis of fertility in dairy cattle using negative binomial mixed models.
Robert J Tempelman ... Daniel Gianola
Journal of dairy science | VOL. 82 | 01 Aug 1999

NBZIMM: negative binomial and zero-inflated mixed models, with application to microbiome/metagenomics data analysis
Xinyan Zhang ... Nengjun Yi
BMC Bioinformatics | VOL. 21 | 30 Oct 2020

Model based heritability scores for high-throughput sequencing data
Pratyaydipta Rudra ... Brian Vestal
BMC Bioinformatics | VOL. 18 | 02 Mar 2017

Machine learning and statistical models for analyzing multilevel patent data
Sunyun Qi ... Yanchao Gao
Scientific Reports | VOL. 13 | 07 Aug 2023
