Abstract

This article extends the expectation-maximization (EM) formulation for the Gaussian mixture model (GMM) with a novel weighted dissimilarity loss. This extension fuses two different clustering methods, namely, centroid-based clustering and graph clustering, in the same framework in order to leverage the advantages of both. The fusion results in a simple "soft" asynchronous hybrid clustering method. The proposed algorithm may start as a pure centroid-based clustering algorithm (e.g., k-means) and, as time evolves, gradually turn into a pure graph clustering algorithm [e.g., basic greedy asynchronous distributed interference avoidance (GADIA) (Babadi and Tarokh, 2010)] as the algorithm converges, and vice versa. The "hard" version of the proposed hybrid algorithm includes as special cases the standard Hopfield neural networks [and, thus, Bruck's Ln algorithm (Bruck, 1990) and the Ising model in statistical mechanics], Babadi and Tarokh's basic GADIA (2010), and the standard k-means (Steinhaus, 1956; MacQueen, 1967) [i.e., the Lloyd algorithm (Lloyd, 1957, 1982)]. We call the "hard" version of the proposed clustering method "hybrid-nongreedy asynchronous clustering (H-NAC)." We apply the H-NAC to various clustering problems using well-known benchmark datasets. The computer simulations confirm the superior performance of the H-NAC compared to k-means clustering, k-GADIA, spectral clustering, and a very recent clustering algorithm, structured graph learning (SGL) by Kang et al. (2021), which represents one of the state-of-the-art clustering algorithms.

Highlights

  • Clustering is a fundamental mechanism in data processing and machine learning applications, and it is a fundamental research area

  • The extended k-Ln clustering algorithm turns out to be equivalent to the basic version of the pioneering algorithm greedy asynchronous distributed interference avoidance (GADIA) of Babadi and Tarokh [2]

  • We chose structured graph learning (SGL) [46] as a reference algorithm because it has shown superior performance compared to many state-of-the-art clustering methods, such as the accelerated low-rank representation (ALRR) published in 2018 [72], the K-multiple-means (KMM) in 2019 [73], the efficient sparse subspace clustering (ESSC) in 2020 [74], the fast normalized cut (FNC) in 2018 [75], and sparse subspace clustering-orthogonal matching pursuit (SSC-OMP), a popular sparse subspace clustering (SSC) algorithm [76]

Summary

Introduction

Clustering is a fundamental mechanism in data processing and machine learning applications, and it is a fundamental research area. It is the task of grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups, and it helps in understanding and discovering the natural grouping in a dataset. No single clustering method solves all the different types of challenging real-life clustering problems with the best performance. Distribution-based clustering suffers from an overfitting problem, although it has a strong theoretical foundation. Another prominent method, the Gaussian mixture model (GMM), assumes Gaussian distributions, which is a rather strong assumption for various real-life datasets. For a given clustering problem and its datasets, we often end up determining the most appropriate clustering algorithm experimentally.
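To make the two ingredient families named in the abstract concrete, the following is a minimal Python sketch of (a) standard k-means with Lloyd iterations and (b) an illustrative asynchronous graph-style update in which each point, one at a time, moves to the cluster with the smallest total dissimilarity to it. The function names, parameters, and the exact form of the graph update are assumptions made for illustration; this is not the authors' H-NAC and is not claimed to reproduce GADIA exactly.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=50, seed=0):
    """Standard k-means (Lloyd iterations): hard, centroid-based clustering."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points (a simple, common choice).
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop when the assignment no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def greedy_graph_sweep(D, labels, k):
    """One asynchronous sweep over a dissimilarity matrix D (illustrative rule).

    Points are visited one at a time; each moves to the cluster whose current
    members have the smallest summed dissimilarity to it. Later points see the
    moves made by earlier points in the same sweep.
    """
    labels = np.asarray(labels).copy()
    for i in range(len(D)):
        # Cost of placing point i in cluster j, given the current labels.
        cost = [D[i, labels == j].sum() for j in range(k)]
        labels[i] = int(np.argmin(cost))
    return labels
```

A typical experiment would call `lloyd_kmeans` on a feature matrix, build a pairwise dissimilarity matrix (for example, Euclidean distances), and then apply `greedy_graph_sweep` repeatedly until the labels stop changing. Note that an empty cluster has zero cost under this simple rule, so a practical variant would need to handle that case explicitly.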
