Recent progress in multiplexed tissue imaging is advancing the study of tumor microenvironments and improving our understanding of treatment response and disease progression. Cellular neighborhood analysis is a popular computational approach for these complex image data. Despite its popularity, it faces significant challenges: high computational demands limit its feasibility for large-scale applications, and it lacks a principled strategy for integrative analysis across images. Without such a strategy, it is difficult to identify spatial features precisely and consistently and to track their dynamics over disease progression. To overcome these challenges, we introduce SpaTopic, a spatial topic model designed to decode high-level spatial architecture across multiplexed tissue images. The algorithm integrates cell type and spatial information within a topic modeling framework, an approach originally developed for natural language processing and later adapted for computer vision. Spatial information is incorporated through a flexible design of documents, which represent densely overlapping regions in images. The model employs an efficient collapsed Gibbs sampling algorithm for inference. We benchmarked its performance against five state-of-the-art algorithms in case studies spanning different single-cell spatial transcriptomic and proteomic imaging platforms and different tissue types. Our findings demonstrate that SpaTopic consistently identifies biologically and clinically significant spatial "topics", such as tertiary lymphoid structures (TLSs), and tracks dynamic changes in spatial features over disease progression. Its computational efficiency and broad applicability across molecular imaging platforms will enhance the analysis of large-scale tissue imaging datasets.
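To make the topic-model framing concrete, the following is a minimal sketch of collapsed Gibbs sampling for a plain latent Dirichlet allocation model in which cell types play the role of words and image regions play the role of documents. It is not the SpaTopic algorithm: it assumes each cell belongs to exactly one fixed region, whereas the abstract describes densely overlapping regions, and the function name `lda_collapsed_gibbs`, its parameters, and the hyperparameter defaults are all hypothetical choices made for illustration.

```python
import numpy as np


def lda_collapsed_gibbs(doc_ids, cell_types, n_topics,
                        n_iter=200, alpha=0.1, beta=0.1, seed=0):
    """Collapsed Gibbs sampler for plain LDA (illustrative sketch only).

    doc_ids[i]    : index of the region ("document") containing cell i
    cell_types[i] : integer cell-type label ("word") of cell i
    Assumption: each cell has one fixed region, unlike overlapping regions.
    """
    rng = np.random.default_rng(seed)
    doc_ids = np.asarray(doc_ids)
    cell_types = np.asarray(cell_types)
    n_docs = int(doc_ids.max()) + 1
    n_types = int(cell_types.max()) + 1

    # Random initial topic assignment for every cell.
    z = rng.integers(n_topics, size=len(cell_types))

    # Count tables: region-by-topic, topic-by-cell-type, and topic totals.
    n_dk = np.zeros((n_docs, n_topics))
    n_kv = np.zeros((n_topics, n_types))
    n_k = np.zeros(n_topics)
    np.add.at(n_dk, (doc_ids, z), 1)
    np.add.at(n_kv, (z, cell_types), 1)
    np.add.at(n_k, z, 1)

    for _ in range(n_iter):
        for i in range(len(cell_types)):
            d, v, k = doc_ids[i], cell_types[i], z[i]
            # Remove cell i from the counts.
            n_dk[d, k] -= 1
            n_kv[k, v] -= 1
            n_k[k] -= 1
            # Conditional topic probabilities with the Dirichlet
            # parameters integrated out (the "collapsed" step).
            p = (n_dk[d] + alpha) * (n_kv[:, v] + beta) / (n_k + n_types * beta)
            k = rng.choice(n_topics, p=p / p.sum())
            # Record the new assignment and restore the counts.
            z[i] = k
            n_dk[d, k] += 1
            n_kv[k, v] += 1
            n_k[k] += 1

    # Smoothed estimate of each topic's cell-type composition.
    topic_by_type = (n_kv + beta) / (n_k[:, None] + n_types * beta)
    return z, topic_by_type
```

A call such as `lda_collapsed_gibbs(doc_ids, cell_types, n_topics=2)` returns one topic label per cell and a topics-by-cell-types matrix whose rows each sum to one. Because the sampler only updates count tables, its cost per sweep grows linearly with the number of cells, which is one reason collapsed Gibbs sampling suits large images.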