Abstract
Nonlinear reformulations of the spectral clustering method have recently attracted considerable attention, owing to their numerical benefits and their solid mathematical foundation. We present a novel direct multiway spectral clustering algorithm in the p-norm, for p ∈ (1, 2]. The problem of computing multiple eigenvectors of the graph p-Laplacian, a nonlinear generalization of the standard graph Laplacian, is recast as an unconstrained minimization problem on a Grassmann manifold. The value of p is reduced in a pseudocontinuous manner, promoting sparser solution vectors that correspond to optimal graph cuts as p approaches one. Monitoring the monotonic decrease of the balanced graph cuts guarantees that we obtain the best available solution from the p-levels considered. We demonstrate the effectiveness and accuracy of our algorithm on various artificial test cases. Our numerical examples and comparisons with various state-of-the-art clustering methods indicate that the proposed method obtains high-quality clusters, both in terms of balanced graph cut metrics and in terms of the accuracy of the labelling assignment. Furthermore, we conduct studies on the classification of facial images and handwritten characters to demonstrate the applicability of the method to real-world datasets.
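To make the balanced graph cut metrics mentioned above concrete: one common such metric is the ratio cut, RCut(A_1, …, A_k) = Σ_i cut(A_i, Ā_i) / |A_i|, which the algorithm monitors across p-levels. Below is a minimal sketch of this metric (not code from the paper; the function name `ratio_cut` and the toy graph are our own illustration), assuming a symmetric adjacency matrix in NumPy.

```python
import numpy as np

def ratio_cut(adj, labels):
    """Ratio cut of a partition: sum over clusters of
    cut(A_i, complement of A_i) / |A_i|.

    `adj` is a symmetric (possibly weighted) adjacency matrix,
    `labels` an integer cluster assignment per node."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        mask = labels == c
        # total edge weight leaving cluster c
        cut = adj[mask][:, ~mask].sum()
        total += cut / mask.sum()
    return total

# Two triangles joined by a single edge: the natural 2-way cut
# severs one edge, so RCut = 1/3 + 1/3 = 2/3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])
print(ratio_cut(A, labels))  # → 2/3 ≈ 0.667
```

A lower ratio cut indicates a partition that is both sparse across clusters and balanced in cluster size, which is why monitoring its monotonic decrease across p-levels yields the best available solution.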
Highlights
We developed a new method for multiway p-spectral clustering that leverages recent advancements in Riemannian optimization.
Our method reduces the value of p in a pseudocontinuous manner, and ensures that the best result with respect to balanced graph cut metrics is retrieved from the p-levels considered.
Summary
The act of creating clusters by segmenting a set into several parts is ever present in every scientific domain that deals with interacting or interconnected data. As opposed to other popular clustering techniques, such as k-means (MacQueen, 1967) and the expectation-maximization algorithm (Dempster et al, 1977), spectral methods perform well in nonconvex sample spaces, as they can avoid local minima (Bichot & Siarry, 2013). They have been successfully applied in various fields of data clustering, such as computer vision (Malik et al, 2001), load balancing (Hendrickson & Leland, 1995), biological systems (Pentney & Meila, 2005) and text classification (Aggarwal & Zhai, 2012), and remain an area of active research (Ge et al, 2021; Mizutani, 2021).

In previous p-spectral approaches, the orthogonality constraint on the eigenvectors is enforced by means of a modified gradient descent minimization, which may introduce approximation errors, as demonstrated later. This mutual orthogonality constraint, combined with the fact that the final partitioning step takes place in the reduced space of p-spectral coordinates, suggests that recasting the problem as an optimization procedure over a subspace that adheres to these conditions by definition, i.e., the Grassmann manifold, would be beneficial. The Riemannian structure of the Grassmann manifold and the development of robust optimization algorithms on it have been studied extensively (Edelman et al, 1999; Absil et al, 2007; Sato & Iwai, 2014).
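As a rough illustration of why optimizing directly on the manifold sidesteps explicit orthogonality constraints: in Riemannian descent over orthonormal frames, the Euclidean gradient is projected onto the tangent space at the current iterate, and the update is mapped back to the manifold by a retraction (here, a QR factorization), so every iterate satisfies Y^T Y = I exactly. The sketch below is a generic illustration under our own assumptions, not the paper's algorithm: it minimizes the Rayleigh quotient trace(Y^T L Y) for the ordinary (p = 2) graph Laplacian as a stand-in objective, and the function name `grassmann_descent` is ours.

```python
import numpy as np

def grassmann_descent(L, k, steps=200, lr=0.05, seed=0):
    """Minimize trace(Y^T L Y) over n-by-k orthonormal frames
    via projected gradient descent with a QR retraction."""
    rng = np.random.default_rng(seed)
    # random orthonormal starting frame
    Y, _ = np.linalg.qr(rng.standard_normal((L.shape[0], k)))
    for _ in range(steps):
        G = 2.0 * L @ Y                  # Euclidean gradient of trace(Y^T L Y)
        G -= Y @ (Y.T @ G)               # project onto the tangent space at Y
        Y, _ = np.linalg.qr(Y - lr * G)  # QR retraction back to the manifold
    return Y

# Path graph on 5 nodes: Laplacian L = D - A.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
L = np.diag(A.sum(axis=1)) - A
Y = grassmann_descent(L, k=2)
# Orthonormality holds by construction at every iterate ...
print(np.allclose(Y.T @ Y, np.eye(2)))  # → True
# ... and the objective approaches the sum of the two smallest
# Laplacian eigenvalues (≈ 0.382 for this graph).
print(np.trace(Y.T @ L @ Y))
```

In contrast, enforcing orthogonality inside a modified gradient descent in ambient space only approximates this invariant, which is the source of the approximation errors alluded to above.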