Abstract

Identifying the phenotypes and interactions of various cells is the primary objective in dissecting cellular heterogeneity. A key step in this process is unsupervised clustering, which, however, often suffers from high levels of noise and redundant information. To overcome these limitations, we propose self-diffusion on local scaling affinity (LSSD) to enhance metric learning of cell similarities for dissecting cellular heterogeneity. Local scaling self-tunes the cell-to-cell distances that are used to construct the cell affinity matrix. Our approach then implements a self-diffusion process that propagates the affinity matrix to further refine cell similarities for downstream clustering analysis. To demonstrate its effectiveness and usefulness, we applied LSSD to two simulated and four real scRNA-seq datasets. Compared with other single-cell clustering methods, our approach achieves much better clustering performance, and the cell types identified in colorectal tumors show strong biological interpretability.
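The abstract names two ingredients: a local scaling affinity and a self-diffusion process that propagates it. A minimal NumPy sketch of both is given below, under stated assumptions; it is not the authors' implementation. The function names `local_scaling_affinity` and `self_diffusion`, the neighbour count `k=7`, and the update rule W_t = W_{t-1} P + I (with P = D^{-1} W) are hypothetical choices for illustration. The affinity follows the common self-tuning form, in which each cell's scale sigma_i is its distance to its k-th nearest neighbour.

```python
import numpy as np


def local_scaling_affinity(X, k=7):
    """Local scaling (self-tuning) affinity between the rows (cells) of X.

    sigma_i is the distance from cell i to its k-th nearest neighbour and
    A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)). The default k=7 is an
    assumption, not a value reported for LSSD. Requires k < number of cells.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances; clip tiny negatives from round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # After sorting each row, column 0 is the (near-)zero self-distance,
    # so column k is the distance to the k-th nearest neighbour.
    sigma = np.sort(np.sqrt(d2), axis=1)[:, k]
    sigma = np.maximum(sigma, 1e-12)  # guard against duplicate cells
    return np.exp(-d2 / np.outer(sigma, sigma))


def self_diffusion(W, steps=5):
    """Hypothetical self-diffusion update: W_t = W_{t-1} P + I, P = D^{-1} W.

    This rule is an assumed, simplified variant of self-diffusion on an
    affinity graph; it is not claimed to be the exact LSSD update.
    """
    W = np.asarray(W, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    Wt = W.copy()
    for _ in range(steps):
        # Propagate affinities one random-walk step, then re-inject self-similarity.
        Wt = Wt @ P + np.eye(W.shape[0])
    # Symmetrize so the result can feed a clustering step that expects
    # a symmetric, precomputed affinity matrix.
    return (Wt + Wt.T) / 2.0


# Toy usage: two well-separated Gaussian "cell populations" in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(10.0, 1.0, (20, 2))])
S = self_diffusion(local_scaling_affinity(X, k=7), steps=5)
# Compare mean within-group similarity with mean between-group similarity.
print(S[:20, :20].mean() > S[:20, 20:].mean())
```

Each step pushes affinity along the row-normalized graph and adds the identity. Similarity between cells that share many neighbours is therefore reinforced, while similarity across weakly connected groups stays small. The symmetrized output could then be handed to any clustering routine that accepts a precomputed affinity.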

Highlights

  • Cells are the fundamental structural units of biological systems

  • Too much diffusion may over-smooth the information carried by a given graph

  • We conducted a simulation experiment to investigate the choice of the number of iteration steps by evaluating the clustering performance of local scaling self-diffusion (LSSD) on scRNA-seq data
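The over-smoothing risk noted in the highlights can be seen with a plain random walk on an affinity graph. The sketch below is a generic illustration, not the paper's simulation experiment. The toy 4-cell affinity matrix (two tightly linked pairs joined by weak edges) and the helper names `row_spread` and `diffusion_spread` are hypothetical. The sketch tracks how different the rows of P^t remain as t grows, with P = D^{-1} W. Once all rows coincide, every cell has the same diffusion profile and the cluster structure is lost.

```python
import numpy as np


def row_spread(M):
    """Largest L1 distance between any two rows of M (0 means all rows identical)."""
    diffs = np.abs(M[:, None, :] - M[None, :, :]).sum(axis=2)
    return diffs.max()


def diffusion_spread(W, steps):
    """Row spread of P^t for each t in `steps`, where P = D^{-1} W."""
    W = np.asarray(W, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic random-walk matrix
    return {t: row_spread(np.linalg.matrix_power(P, t)) for t in steps}


# Hypothetical toy affinity: cells {0, 1} and {2, 3} form two groups,
# joined only by weak 0.01 edges.
W = np.array([
    [1.00, 0.90, 0.01, 0.01],
    [0.90, 1.00, 0.01, 0.01],
    [0.01, 0.01, 1.00, 0.90],
    [0.01, 0.01, 0.90, 1.00],
])
for t, s in diffusion_spread(W, [1, 10, 100, 1000]).items():
    print(f"t={t:>4}  row spread={s:.2e}")
```

For a connected graph with self-loops, the rows of P^t all approach the same stationary distribution, so the spread shrinks toward zero as t grows. This illustrates why the number of diffusion steps has to be chosen with care: a few steps sharpen group structure, while many steps erase it.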


Summary

Introduction

Cells are the fundamental structural units of biological systems. For centuries, biologists have recognized that multicellular tissues are composed of different cell types, which can be distinguished by their size and shape. Single-cell RNA sequencing (scRNA-seq) technologies have emerged as an attractive tool for revealing the functional diversity and heterogeneity of cells, bringing new insights into biological systems (Pelkmans, 2012; Kaur et al, 2019). Several single-cell clustering approaches have been developed recently. For instance, SIMLR (Wang et al, 2017) learns a robust cell similarity metric that best fits the data structure by combining multiple kernels. SAME clustering is a mixture model–based approach that aggregates the results of various clustering methods through a mixture-model ensemble to produce an improved ensemble solution (Huh et al, 2020). The effectiveness of these single-cell clustering methods may decrease because of low single-cell quality, biological differences, and measurement dropouts. The self-diffusion process allows the derived distances to follow the intrinsic data manifolds.
