Abstract
Clustering is concerned with coherently grouping observations without any explicit concept of true groupings. Spectral graph clustering, that is, clustering the vertices of a graph based on their spectral embedding, is commonly approached via K-means (or, more generally, Gaussian mixture model) clustering composed with either Laplacian spectral embedding (LSE) or adjacency spectral embedding (ASE). Recent theoretical results provide a deeper understanding of the problem and its solutions and lead us to a "two-truths" LSE vs. ASE spectral graph clustering phenomenon, convincingly illustrated here via a diffusion MRI connectome dataset: the different embedding methods yield different clustering results, with LSE capturing left hemisphere/right hemisphere affinity structure and ASE capturing gray matter/white matter core-periphery structure.
Highlights
Clustering is concerned with coherently grouping observations without any explicit concept of true groupings
Spectral graph clustering—clustering the vertices of a graph based on their spectral embedding—is commonly approached via K-means clustering composed with either Laplacian spectral embedding (LSE) or adjacency spectral embedding (ASE)
Our interest is to compare and contrast the two spectral embedding methods for clustering into two clusters. We demonstrate that this synthetic case exhibits the two-truths phenomenon both theoretically and in simulation—the {LG,LW,RG,RW} a priori projection of our composite connectome yields a four-block two-truths stochastic block model (SBM)
Summary
Clustering is concerned with coherently grouping observations without any explicit concept of true groupings. It is often the case that practitioners cluster the vertices of a graph (say, via K-means clustering composed with Laplacian spectral embedding) and pronounce the method as having performed either well or poorly based on whether the resulting clusters correspond well or poorly with some known or preconceived notion of “correct” clustering. Such a procedure may be used to compare two clustering methods and to pronounce that one works better (on the particular data under consideration). With respect to graph clustering, ref. 1 shows that there can be no algorithm that is optimal for all possible community detection tasks (Fig. 1).
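To make the pipeline concrete, the following is a minimal sketch of spectral graph clustering as embedding followed by K-means, written with NumPy only. It is an illustration under stated assumptions rather than the procedure used for the connectome analysis: the function names, the eigenvalue scaling, the normalization D^{-1/2} A D^{-1/2} used for LSE, the deterministic initialization of the two-cluster K-means, and the two-block probability matrix `B` are all choices made for this example.

```python
import numpy as np


def adjacency_spectral_embedding(A, d):
    """Embed vertices using the d largest-magnitude eigenpairs of symmetric A."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]      # indices of the top-d |eigenvalues|
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))


def laplacian_spectral_embedding(A, d):
    """Same embedding applied to D^{-1/2} A D^{-1/2} (assumed normalization)."""
    deg = A.sum(axis=1).astype(float)
    scale = np.zeros_like(deg)
    scale[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])  # isolated vertices are scaled by zero
    return adjacency_spectral_embedding(A * scale[:, None] * scale[None, :], d)


def two_means(X, n_iter=100):
    """Lloyd's algorithm for K = 2 with a deterministic far-apart initialization."""
    first = np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))
    second = np.argmax(np.linalg.norm(X - X[first], axis=1))
    centers = X[[first, second]].astype(float)
    labels = None
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


# Hypothetical two-block stochastic block model, used only to exercise the code.
rng = np.random.default_rng(0)
n = 100
blocks = np.repeat([0, 1], n // 2)
B = np.array([[0.5, 0.1], [0.1, 0.5]])            # illustrative block probabilities
P = B[blocks][:, blocks]                          # P[i, j] = B[blocks[i], blocks[j]]
upper = np.triu(rng.random((n, n)) < P, k=1)      # sample each edge once, no self-loops
A = (upper | upper.T).astype(float)               # symmetric 0/1 adjacency matrix

labels_ase = two_means(adjacency_spectral_embedding(A, d=2))
labels_lse = two_means(laplacian_spectral_embedding(A, d=2))
```

The two label vectors can then be compared with each other or with any candidate grouping of the vertices. On a simple affinity model like this one the two embeddings would be expected to agree closely; the two-truths phenomenon concerns graphs that carry both affinity and core-periphery structure, where they need not.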