Parallel clustering of single cell transcriptomic data with split-merge sampling on Dirichlet process mixtures.

Tiehang Duan, Xiaohui Xie, José P Pinto

doi:10.1093/bioinformatics/bty702

Abstract

With the development of droplet-based systems, massive single-cell transcriptome data have become available, enabling analysis of cellular and molecular processes at single-cell resolution and aiding the understanding of many biological processes. While state-of-the-art clustering methods have been applied to these data, they face three challenges: (i) clustering quality still needs to improve; (ii) most models require prior knowledge of the number of clusters, which is not always available; (iii) faster computation is in demand. We propose to tackle these challenges with Parallelized Split Merge Sampling on a Dirichlet Process Mixture Model (the Para-DPMM model). Unlike classic DPMM methods, which sample one data point at a time, the split-merge mechanism samples at the cluster level, which significantly improves convergence and the optimality of the result. The model is highly parallelized and can use the computing power of high performance computing (HPC) clusters, enabling inference on very large datasets. Experimental results show that the model outperforms currently widely used models in both clustering quality and computational speed. Source code is publicly available at https://github.com/tiehangd/Para_DPMM/tree/master/Para_DPMM_package. Supplementary data are available at Bioinformatics online.
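To make the idea of sampling at the cluster level concrete, the sketch below scores a single split proposal with a Metropolis-Hastings ratio. It is an illustration only and is not taken from the Para-DPMM package: it assumes the simple random split proposal of Jain and Neal's split-merge sampler and a Dirichlet-multinomial likelihood over gene counts, and the function names (`log_marginal`, `propose_split`, `log_accept_split`) and the hyperparameters `alpha` and `beta` are hypothetical choices made for this example.

```python
import math
import random


def log_marginal(counts, beta=1.0):
    """Log Dirichlet-multinomial marginal of a cluster's summed gene counts.

    Multinomial coefficients are omitted: every cell appears once on each
    side of a split/merge ratio, so they cancel.
    """
    k = len(counts)
    out = math.lgamma(k * beta) - math.lgamma(k * beta + sum(counts))
    for c in counts:
        out += math.lgamma(beta + c) - math.lgamma(beta)
    return out


def sum_counts(cells, members):
    """Add up the per-gene counts of the cells listed in `members`."""
    n_genes = len(cells[0])
    return [sum(cells[m][g] for m in members) for g in range(n_genes)]


def propose_split(members, i, j, rng):
    """Simple random split: i seeds one side, j the other, rest by fair coin."""
    left, right = [i], [j]
    for m in members:
        if m != i and m != j:
            (left if rng.random() < 0.5 else right).append(m)
    return left, right


def log_accept_split(cells, left, right, alpha=1.0, beta=1.0):
    """Log Metropolis-Hastings ratio for replacing left+right by two clusters.

    The ratio for the reverse (merge) move is the negative of this value.
    """
    n_l, n_r = len(left), len(right)
    n = n_l + n_r
    # Chinese restaurant process prior ratio: alpha * (n_l-1)! (n_r-1)! / (n-1)!
    log_prior = (math.log(alpha) + math.lgamma(n_l) + math.lgamma(n_r)
                 - math.lgamma(n))
    # Proposal ratio: the merge is deterministic, the split has prob (1/2)^(n-2)
    log_proposal = (n - 2) * math.log(2.0)
    log_lik = (log_marginal(sum_counts(cells, left), beta)
               + log_marginal(sum_counts(cells, right), beta)
               - log_marginal(sum_counts(cells, left + right), beta))
    return log_prior + log_proposal + log_lik


# Toy data (made up): four cells, three genes, two visibly different profiles.
cells = [[50, 0, 0], [48, 1, 0], [0, 0, 52], [1, 0, 47]]
rng = random.Random(0)
left, right = propose_split([0, 1, 2, 3], i=0, j=2, rng=rng)
log_ratio = log_accept_split(cells, left, right)
accepted = math.log(rng.random()) < min(0.0, log_ratio)
```

One accepted split reassigns a whole group of cells in a single step, whereas a per-point Gibbs sampler would have to move them one at a time. This sketch does not cover parallel execution across HPC nodes, which the abstract describes but does not detail.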

Journal: Bioinformatics
Publication Date: Aug 28, 2018
Citations: 39

