Abstract
Kernel k-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Its merits are thoroughly validated on a suite of simulated datasets and real data benchmarks that feature nonlinear and multi-view separation. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from the non-convexity of the underlying objective function. In this paper, we generalize recent results leveraging a general family of means to combat sub-optimal local solutions to the kernel and multi-kernel settings. Called Kernel Power k-Means, our algorithm uses majorization-minimization (MM) to better solve this non-convex problem. We show that the method implicitly performs annealing in kernel feature space while retaining efficient, closed-form updates. We rigorously characterize its convergence properties both from computational and statistical points of view. In particular, we characterize the large sample behavior of the proposed method by establishing strong consistency guarantees as well as finite-sample bounds on the excess risk of the estimates through modern tools in learning theory. The proposal's efficacy is demonstrated through an array of simulated and real data experiments.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE Transactions on Pattern Analysis and Machine Intelligence
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.