Abstract

Advances in technology have made it convenient to obtain large amounts of single-cell RNA sequencing (scRNA-seq) data. Because clustering is a key step in identifying or defining cellular phenotypes, many clustering approaches have recently been developed for these applications. These methods can be roughly divided into normal clustering methods and integrated (ensemble) clustering methods, which combine multiple normal clustering methods to obtain more informative results. To distinguish them from integrated clustering algorithms, normal clustering methods are often called individual or base clustering methods. Note that many individual clustering methods are designed to capture only one aspect of the data, and their results depend on initial parameter settings such as the number of clusters and the distance metric. Integrative clustering methods may achieve more accurate results than individual clustering, but those results depend on the base clustering results, and integrated systems often lack self-regulation. Therefore, designing a robust unsupervised clustering method remains a challenge. To tackle these limitations, we propose a novel Ensemble Clustering algorithm based on Probability Graphical Model with Graph Regularization, EC-PGMGR for short. On the one hand, we use parameter control in the Probability Graphical Model (PGM) to determine the number of clusters automatically, without prior knowledge. On the other hand, we add a regularization term to reduce the influence of weak base clustering results. In particular, the results collected from the base clustering methods are combined with self-regulating weights obtained through a pre-learning process, which strengthens the contribution of active clustering methods while weakening that of inactive ones.
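The idea of combining base clusterings with per-method weights can be illustrated with a simple stand-in. The sketch below is not EC-PGMGR, which uses a probability graphical model with graph regularization; it implements weighted evidence accumulation, a common consensus technique. Each base clustering votes on whether two cells belong together, each vote is scaled by a weight for that method, and cells whose weighted co-association exceeds a threshold are linked into connected components. The function names, the fixed weights and the 0.5 threshold are hypothetical choices for illustration (in EC-PGMGR the weights come from a pre-learning step). As a loose analogue of not fixing the cluster number in advance, the number of consensus clusters here falls out of the threshold.

```python
from itertools import combinations


def weighted_coassociation(labelings, weights):
    """Return an n x n matrix whose (i, j) entry is the weighted fraction of
    base clusterings that place cells i and j in the same cluster."""
    n = len(labelings[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def consensus_clusters(coassoc, threshold=0.5):
    """Link cells whose weighted co-association exceeds `threshold` and return
    the connected components as labels 0, 1, 2, ... (union-find)."""
    n = len(coassoc)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components in order of first appearance.
    roots, labels = {}, []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels


base = [
    [0, 0, 0, 1, 1, 1],  # method A
    [0, 0, 1, 1, 2, 2],  # method B (disagrees with A and C)
    [1, 1, 1, 0, 0, 0],  # method C (same partition as A, relabeled)
]
coassoc = weighted_coassociation(base, weights=[1.0, 0.5, 1.0])
print(consensus_clusters(coassoc))  # [0, 0, 0, 1, 1, 1]
```

With method B down-weighted, the two agreeing methods dominate and two groups emerge; giving B a much larger weight than A and C instead makes its three-cluster partition win, which is the kind of sensitivity that learned, self-regulating weights are meant to control.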
Experiments are carried out on 7 data sets generated by different platforms, with the number of single cells ranging from 822 to 5,132. Results show that EC-PGMGR outperforms 4 alternative individual clustering methods and 2 ensemble methods in terms of accuracy (measured by the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI)), robustness, effectiveness and other criteria. EC-PGMGR provides an effective way to integrate different clustering results, yielding more accurate and reliable results for further biological analysis. It may also provide new insights for other applications of clustering.
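Both accuracy measures compare a predicted labeling against reference cell types using their contingency table. The following is a minimal pure-Python sketch of the standard definitions, not the evaluation code used for EC-PGMGR. The NMI variant here normalizes mutual information by the arithmetic mean of the two entropies (scikit-learn's default); other normalizations, such as the geometric mean, are also common, and the abstract does not say which one was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(a, b):
    """ARI: pair-counting agreement between two labelings, corrected for chance."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def normalized_mutual_info(a, b):
    """NMI with arithmetic-mean normalization: I(A;B) / ((H(A) + H(B)) / 2)."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    mi = sum(nxy / n * log(n * nxy / (ca[x] * cb[y]))
             for (x, y), nxy in Counter(zip(a, b)).items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    if ha == 0 and hb == 0:  # both labelings put every cell in one cluster
        return 1.0
    return mi / ((ha + hb) / 2)


truth, pred = [0, 0, 1, 1], [0, 0, 1, 2]
print(round(adjusted_rand_index(truth, pred), 4))     # 0.5714
print(round(normalized_mutual_info(truth, pred), 4))  # 0.8
```

Both scores reach 1 when the prediction matches the reference up to a renaming of cluster labels, which is why they suit unsupervised clustering, where label identities are arbitrary.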

Highlights

  • Cells are considered the most basic functional units of an organism (Rosvall and Bergstrom, 2017)

  • To validate the performance of EC-PGMGR, comparative experiments are conducted on seven benchmark data sets using the SAFE R package (Yang et al, 2018)

  • We propose the EC-PGMGR algorithm, an unsupervised ensemble clustering method using a Probability Graphical Model (PGM) with graph Laplacian regularization
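Graph Laplacian regularization typically penalizes assignments that differ between similar cells. In its standard form, with a symmetric non-negative affinity matrix W, degree matrix D (row sums of W) and Laplacian L = D - W, the penalty on a soft assignment matrix F (one row f_i per cell) is tr(FᵀLF) = ½ Σ_ij W_ij ‖f_i - f_j‖². The sketch below computes both sides of this identity on a made-up three-cell example; the affinity values and assignments are illustrative, and the exact regularizer in EC-PGMGR may differ from this textbook form.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric affinity matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def trace_form(F, L):
    """tr(F^T L F), summed column by column over the k soft-assignment columns."""
    n, k = len(F), len(F[0])
    return sum(F[i][c] * L[i][j] * F[j][c]
               for c in range(k) for i in range(n) for j in range(n))


def pairwise_form(F, W):
    """1/2 * sum_ij W_ij * ||f_i - f_j||^2: large when similar cells get different rows."""
    n, k = len(F), len(F[0])
    return 0.5 * sum(W[i][j] * sum((F[i][c] - F[j][c]) ** 2 for c in range(k))
                     for i in range(n) for j in range(n))


W = [[0.0, 1.0, 0.0],   # cells 0 and 1 are strongly similar
     [1.0, 0.0, 0.5],   # cells 1 and 2 are moderately similar
     [0.0, 0.5, 0.0]]
F = [[1.0, 0.0],        # soft memberships in two clusters
     [0.8, 0.2],
     [0.0, 1.0]]
print(round(trace_form(F, laplacian(W)), 6), round(pairwise_form(F, W), 6))  # 0.72 0.72
```

Adding such a term to an objective discourages assignments that split strongly connected cells, which is the sense in which a graph regularizer can damp the effect of noisy inputs.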


Summary

Introduction

Cells are considered the most basic functional units of an organism (Rosvall and Bergstrom, 2017). The identification of cell types has a great impact on the discovery of novel cells and the study of cell function (Trapnell et al, 2014a; Reid et al, 2018). The expression levels of most genes vary widely across cell types. Bulk RNA sequencing (RNA-seq) measures expression levels averaged across many cells of different types, which may conceal meaningful expression information (Trapnell et al, 2014b; Yang et al, 2018). Single-cell RNA sequencing (scRNA-seq) characterizes the heterogeneity of cells and can identify novel cell types, predict cell fate, and classify tumor subpopulations at a finer resolution (Jia et al, 2017; Treutlein et al, 2014). With the clustering results in hand, it is convenient and meaningful to analyze expression differences in downstream

