Abstract

Single-cell DNA methylation sequencing has opened new perspectives for investigating epigenetic heterogeneity, creating a need for computational methods that cluster cells based on single-cell methylation profiles. Although several methods have been developed, most of them cluster cells using a single (dis)similarity measure, which fails to capture the full cell heterogeneity and can result in locally optimal solutions. Here, we present scMelody, which uses an enhanced consensus-based clustering model to reconstruct cell-to-cell methylation similarity patterns and identifies cell subpopulations by leveraging information from multiple basic similarity measures. In addition, the reconstructed cell-to-cell similarity measure allows scMelody to conveniently apply clustering validation criteria to determine the optimal number of clusters. Assessments on distinct real datasets showed that scMelody accurately recapitulated methylation subpopulations and outperformed existing methods in terms of both cluster partitions and the number of clusters. Moreover, when the clustering stability of scMelody was benchmarked on a variety of synthetic datasets, it achieved significant clustering performance gains over existing methods and robustly maintained its clustering accuracy over a wide range of numbers of cells, numbers of clusters, and CpG dropout proportions. Finally, real case studies demonstrated the capability of scMelody to assess known cell types and uncover novel cell clusters.
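The general idea in the abstract (combine several basic similarity measures, cluster on the combined similarity, then pick the number of clusters with a validation criterion) can be illustrated with a small sketch. This is not scMelody's actual algorithm: the two measures, the plain averaging step, the average-linkage clustering, the silhouette criterion, the toy data, and all function names are assumptions chosen only for illustration.

```python
from itertools import combinations
from math import sqrt

def shared(a, b):
    """Pairs of values at CpG sites observed (not None) in both cells."""
    return [(x, y) for x, y in zip(a, b) if x is not None and y is not None]

def pearson_sim(a, b):
    """Pearson correlation over shared sites, rescaled from [-1, 1] to [0, 1]."""
    pairs = shared(a, b)
    n = len(pairs)
    if n < 2:
        return 0.5  # assumption: treat uninformative pairs as neutral
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return 0.5
    return (cov / sqrt(vx * vy) + 1) / 2

def agreement_sim(a, b):
    """One minus the mean absolute methylation difference over shared sites."""
    pairs = shared(a, b)
    if not pairs:
        return 0.0
    return 1 - sum(abs(x - y) for x, y in pairs) / len(pairs)

def consensus_similarity(cells, measures):
    """Average several basic similarity measures into one matrix (illustrative)."""
    n = len(cells)
    S = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        S[i][j] = S[j][i] = sum(m(cells[i], cells[j]) for m in measures) / len(measures)
    return S

def average_linkage(S, k):
    """Agglomerative clustering on a similarity matrix until k clusters remain."""
    clusters = [[i] for i in range(len(S))]
    def link(p):
        a, b = clusters[p[0]], clusters[p[1]]
        return sum(S[i][j] for i in a for j in b) / (len(a) * len(b))
    while len(clusters) > k:
        a, b = max(combinations(range(len(clusters)), 2), key=link)
        clusters[a].extend(clusters.pop(b))
    labels = [0] * len(S)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def silhouette(S, labels):
    """Mean silhouette width, using 1 - similarity as the distance."""
    n = len(labels)
    scores = []
    for i in range(n):
        dist_by_cluster = {}
        for j in range(n):
            if j != i:
                dist_by_cluster.setdefault(labels[j], []).append(1.0 - S[i][j])
        own = dist_by_cluster.pop(labels[i], None)
        if not own or not dist_by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dist_by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

# Toy binary methylation calls for six cells; None marks a dropped-out CpG site.
cells = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, None, 0, 0, 0],
    [1, 1, 1, 0, None, 0],
    [0, 0, 0, 1, 1, 1],
    [0, None, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, None],
]
S = consensus_similarity(cells, [pearson_sim, agreement_sim])
best_k = max(range(2, len(cells)), key=lambda k: silhouette(S, average_linkage(S, k)))
labels = average_linkage(S, best_k)
print(best_k, labels)
```

Because every candidate number of clusters is scored on the same combined similarity matrix, the validation criterion can be evaluated without recomputing the basic measures.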

Highlights

  • As a heritable covalent chemical modification, DNA methylation is closely correlated with cell growth, differentiation, and transformation, and plays decisive roles in diseases and tumorigenesis (Aran and Hellman, 2013; Oakes et al, 2016; Koch et al, 2018)

  • We found that scMelody spent more than 99% of its running time calculating the basic cell-to-cell similarity matrices for the input single-cell methylation profiles (Supplementary Figure S4); this was also true of single-distance-based methods such as PearsonHC and PDclust

  • We propose scMelody, an enhanced consensus-based clustering model for single-cell methylation data analysis that reconstructs cell-to-cell pairwise similarity

Summary

INTRODUCTION

As a heritable covalent chemical modification, DNA methylation is closely correlated with cell growth, differentiation, and transformation, and plays decisive roles in diseases and tumorigenesis (Aran and Hellman, 2013; Oakes et al, 2016; Koch et al, 2018). Recent advancements in ensemble clustering (Ghaemi et al, 2009; Vega-Pons and Ruiz-Shulcloper, 2011; Boongoen and Iam-On, 2018) have demonstrated that integrating various basic cell partitions into a consensus matrix is an effective way to generate improved clustering solutions (Kiselev et al, 2017; Zhu et al, 2020; Cui et al, 2021; Wang et al, 2021).

GEO accessions referenced: GSE56879, GSE65196, GSE65364, GSE83882, GSE87197, GSE97179.

[...] the most advanced performance over previous methods in clustering single-cell methylation data.
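The consensus-matrix idea mentioned in the introduction can be illustrated with a co-association matrix, one common construction in ensemble clustering. The sketch below is a generic illustration and not the formulation used by scMelody or by any of the cited methods; the function name `co_association` and the toy partitions are hypothetical.

```python
def co_association(partitions):
    """Fraction of base partitions in which each pair of cells shares a cluster.

    `partitions` is a list of label lists, one per base clustering. Label
    values are arbitrary names and need not match across partitions.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]

# Three hypothetical base partitions of the same five cells.
partitions = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
consensus = co_association(partitions)
for row in consensus:
    print([round(v, 2) for v in row])
```

Cells that are grouped together by most base partitions receive entries close to 1, so the consensus matrix can itself be treated as a similarity matrix for a final clustering step.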

Datasets and Pre-Processing
Determine the Optimal Number of Clusters
Model Comparison
Clustering Performance Metrics
RESULTS
Clustering Stability and Scalability of scMelody
DISCUSSION
Findings
DATA AVAILABILITY STATEMENT