A sparse negative binomial mixture model for clustering RNA-seq count data.

Yujia Li, Tanbin Rahman, Tianzhou Ma, Lu Tang, George C Tseng

doi:10.1093/biostatistics/kxab025

Abstract

Clustering with variable selection is a challenging yet critical task for modern small-n-large-p data. Existing methods based on sparse Gaussian mixture models or sparse $K$-means provide solutions for continuous data. Given the prevalence of RNA-seq technology and the lack of count data models for clustering, the current practice is to normalize count expression data into continuous measures and apply existing models under a Gaussian assumption. In this article, we develop a negative binomial mixture model with lasso or fused lasso gene regularization to cluster samples (small $n$) with high-dimensional gene features (large $p$). A modified EM algorithm is used for inference, and the Bayesian information criterion is used to determine the tuning parameters. The method is compared with existing methods using extensive simulations and two real transcriptomic applications in rat brain and breast cancer studies. The results show superior performance of the proposed count data model in clustering accuracy, feature selection, and biological interpretation in pathways.
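To make the modeling idea concrete, the following is a minimal sketch of the E-step of an EM algorithm for a negative binomial mixture, written in plain Python. It is not the authors' implementation: the mean/dispersion parameterization, the function names `nb_logpmf` and `e_step`, and the toy counts are all assumptions made for illustration, and the lasso or fused lasso penalty that the abstract describes (which would act in the M-step) is not shown.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion phi (variance = mu + phi * mu**2). Assumed parameterization."""
    r = 1.0 / phi  # "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def e_step(counts, means, phis, weights):
    """Posterior cluster probabilities for each sample.

    counts  : list of samples, each a list of gene counts
    means   : means[k][g] is the mean of gene g in cluster k
    phis    : phis[g] is the dispersion of gene g (shared across clusters here)
    weights : mixing proportions, one per cluster
    """
    resp = []
    for sample in counts:
        log_w = []
        for k, pi_k in enumerate(weights):
            ll = math.log(pi_k)
            for g, y in enumerate(sample):
                ll += nb_logpmf(y, means[k][g], phis[g])
            log_w.append(ll)
        # Normalize in log space (log-sum-exp) to avoid underflow.
        m = max(log_w)
        unnorm = [math.exp(v - m) for v in log_w]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    return resp

# Hypothetical toy data: 2 samples, 3 genes, 2 clusters.
# The third gene has the same mean in both clusters, so it carries no
# clustering information; a sparsity penalty aims to identify such genes.
counts = [[5, 100, 20], [50, 10, 20]]
means = [[5.0, 100.0, 20.0], [50.0, 10.0, 20.0]]
phis = [0.1, 0.1, 0.1]
weights = [0.5, 0.5]
resp = e_step(counts, means, phis, weights)
```

In a full EM loop, these posterior probabilities would weight each sample's contribution when the cluster means are updated; the sketch stops before that step because the penalized update is specific to the paper.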

Journal: Biostatistics (Oxford, England)
Publication Date: Aug 7, 2021
Citations: 7