Spatial transcriptomics can quantify gene expression and its spatial distribution in tissues, thus revealing molecular mechanisms of cellular interactions underlying tissue heterogeneity, tissue regeneration, and spatially localized disease mechanisms. However, existing spatial clustering methods often fail to exploit the full potential of spatial information, resulting in inaccurate identification of spatial domains. In this paper, we develop a deep graph contrastive clustering framework, stDGCC, that accurately uncovers underlying spatial domains via explicitly modeling spatial information and gene expression profiles from spatial transcriptomics data. The stDGCC framework proposes a spatially informed graph node embedding model to preserve the topological information of spots and to learn the informative and discriminative characterization of spatial transcriptomics data through self-supervised contrastive learning. By simultaneously optimizing the contrastive learning loss, reconstruction loss, and Kullback-Leibler (KL) divergence loss, stDGCC achieves joint optimization of feature learning and topology structure preservation in an end-to-end manner. We validate the effectiveness of stDGCC on various spatial transcriptomics datasets acquired from different platforms, each with varying spatial resolutions. Our extensive experiments demonstrate the superiority of stDGCC over various state-of-the-art clustering methods in accurately identifying cellular-level biological structures. Code and data are available from https://github.com/TimE9527/stDGCC and https://figshare.com/projects/stDGCC/186525. Supplementary data are available at Bioinformatics online.
Read full abstract