GRACE: A Graph-Based Cluster Ensemble Approach for Single-Cell RNA-Seq Data Clustering

Jihong Guan, Jiasheng Wang, Rui-Yi Li

doi:10.1109/access.2020.3022718

Open Access

https://doi.org/10.1109/access.2020.3022718

Abstract

The rapid development of single-cell RNA sequencing (scRNA-seq) technology has accelerated exploration in biomedical research. A central task in scRNA-seq data analysis is to classify cells into different types, which greatly assists the study of intercellular heterogeneity, such as cell types, cell states, and cell lineages, at single-cell resolution. Although a number of tailored approaches have been developed for scRNA-seq data, their performance varies across datasets and their clustering accuracy needs to be improved. In this paper, we propose a novel ensemble clustering framework for scRNA-seq data called GRACE (GRAph-based Cluster Ensemble approach). First, we construct a highly reliable graph network of single cells by combining the clustering outcomes of five leading scRNA-seq clustering methods. Then, we remeasure the relationships between cells by exploring the topological structure of the network using random walk distance. Finally, we build a hierarchical cell tree and obtain the clustering labels by cutting the tree into an appropriate number of subtrees. Experimental results on twelve benchmark datasets show that GRACE achieves higher clustering accuracy and is more robust across a variety of datasets than state-of-the-art individual approaches. In addition, the graph structure of the network built upon the ensemble clusters is more reliable than that of networks constructed from conventional similarity metrics.
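The first step of the abstract, combining several clustering outcomes into one cell-cell graph, can be illustrated with a co-association (consensus) matrix. The following is a minimal sketch, not the authors' implementation: the function name `consensus_matrix`, the toy label vectors, and the choice to weight each edge by the fraction of methods that place two cells in the same cluster are all assumptions made for illustration.

```python
import numpy as np

def consensus_matrix(labelings):
    """Fraction of base clusterings that put each pair of cells together.

    labelings: array-like of shape (n_methods, n_cells); each row holds the
    cluster labels produced by one base method (hypothetical input format).
    """
    labelings = np.asarray(labelings)
    n_methods, n_cells = labelings.shape
    consensus = np.zeros((n_cells, n_cells))
    for labels in labelings:
        # Boolean co-membership matrix for this method, added as 0/1 votes.
        consensus += labels[:, None] == labels[None, :]
    return consensus / n_methods

# Toy example: three hypothetical base clusterings of four cells.
toy_labelings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first row, different label names
    [0, 0, 0, 1],
]
M = consensus_matrix(toy_labelings)
print(M)
```

Because only co-membership is counted, the result does not depend on how each method names its clusters. The matrix can then be read as a weighted adjacency matrix of a graph over cells.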

Highlights

As the basic structural and functional unit of organisms, single cells store important genetic information [1]
We propose a novel cluster ensemble approach called GRACE, which is a graph-theory-based clustering method
The reason lies in the
Algorithm 1 Framework of GRACE
Input: The expression profile of scRNA-seq data, D; the number of samples (cells), N
Output: The clustering labels of samples, L
1: Cluster the scRNA-seq data D using the five clustering methods;
2: Calculate the consensus matrix from the clustering outcomes;
3: Construct the graph G of cells based on the consensus matrix;
4: Calculate the random walk distance of cells with equation (5);
5: Initialize the partition P0 = {v1, v2, ..., vN};
6: while k < N do
7: Calculate dCiCj, ∀ Ci, Cj ∈ Pk with equation (6)
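Steps 3 to 7 of Algorithm 1 can be sketched as follows. Equations (5) and (6) are not reproduced in this text, so the code substitutes stand-ins and should not be read as the paper's formulas: the random walk distance uses the Pons-Latapy (Walktrap) definition, and the cluster-to-cluster distance is replaced by SciPy's average linkage. The function names, the walk length `steps=3`, and the toy graph are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def random_walk_distance(adjacency, steps=3):
    """Pairwise Walktrap-style distance between nodes of a weighted graph.

    Stand-in for equation (5), which is not available in this text.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    P = A / degree[:, None]                   # row-stochastic transition matrix
    Pt = np.linalg.matrix_power(P, steps)     # t-step transition probabilities
    scaled = Pt / np.sqrt(degree)[None, :]    # weight column k by 1/sqrt(d(k))
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def cut_cell_tree(distance, n_clusters):
    """Build an agglomerative tree and cut it into n_clusters subtrees.

    Average linkage is an assumption standing in for equation (6).
    """
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Toy consensus graph: cells 0-2 and cells 3-5 form two tightly linked groups.
toy_graph = np.full((6, 6), 0.1)
toy_graph[:3, :3] = 1.0
toy_graph[3:, 3:] = 1.0

toy_distance = random_walk_distance(toy_graph)
toy_labels = cut_cell_tree(toy_distance, n_clusters=2)
print(toy_labels)
```

The number of subtrees is passed in explicitly here; how GRACE chooses an appropriate number is not described in this text.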

Summary

Introduction

As the basic structural and functional unit of organisms, single cells store important genetic information [1]. Sequencing of large cell populations often analyzes tens of thousands of cells together, so the expression value of each gene is an average over all the cells. This highlights cell types with large populations and obscures rare cell types such as stem cells and cancer cells [5], [6]. Single-cell RNA sequencing (scRNA-seq) technology can overcome this issue and promote the study of cellular heterogeneity [6], [7]. Clustering analysis, which groups cells according to gene expression patterns, is essential for mining

Methods

Results

Conclusion