Abstract
Clustering algorithms for multi-database mining (MDM) rely on computing pairwise similarities between n multiple databases to generate and evaluate candidate clusterings, in order to select the ideal partitioning that optimizes a predefined goodness measure. However, when these pairwise similarities are concentrated around the mean value, the clustering algorithm becomes indecisive about which database pairs are eligible to be grouped together. Consequently, a trivial result is produced by putting all the n databases in one cluster or by returning n singleton clusters. To tackle this problem, we propose a learning algorithm that reduces the fuzziness of the similarity matrix by minimizing a weighted binary entropy loss function H(·) via gradient descent and back-propagation. As a result, the learned model improves the certainty of the clustering algorithm, allowing it to correctly identify the optimal database clusters. Additionally, in contrast to gradient-based clustering algorithms, which are sensitive to the choice of the learning rate and require more iterations to converge, we propose a learning-rate-free algorithm that assesses the candidate clusterings generated on the fly in fewer, upper-bounded iterations. To achieve our goal, we use coordinate descent (CD) and back-propagation to search for the optimal clustering of the n multiple databases in a way that minimizes a convex clustering quality measure L(θ) within a bounded number of iterations. By using a max-heap data structure within our CD algorithm, we optimally choose the largest weight variable θ_i at each iteration i, such that taking the partial derivative of L(θ) with respect to θ_i allows us to attain the next steepest descent minimizing L(θ) without using a learning rate. Through a series of experiments on multiple database samples, we show that our algorithm outperforms the existing clustering algorithms for MDM.
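To make the fuzziness-reduction step concrete, the sketch below shows one possible reading of it: the pairwise similarities are treated as logits, each pair is pushed toward a 0/1 pseudo-target obtained by rounding it away from the mean, and a weighted binary cross-entropy loss is minimized by plain gradient descent. The pseudo-target construction, the uniform default weights, and the function name `sharpen_similarities` are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def sharpen_similarities(S, weights=None, lr=0.1, n_iters=500):
    """Illustrative sketch (not the paper's exact model): reduce the fuzziness
    of a pairwise similarity matrix by pushing mid-range values toward 0 or 1.

    S       : (n, n) symmetric similarity matrix with entries in [0, 1]
    weights : optional (n, n) per-pair weights; uniform by default
    """
    S = np.clip(S, 1e-6, 1 - 1e-6)            # keep the logits finite
    theta = np.log(S / (1.0 - S))             # parameterize similarities as logits
    targets = (S >= S.mean()).astype(float)   # hypothetical 0/1 pseudo-targets: round away from the mean
    W = np.ones_like(S) if weights is None else weights

    for _ in range(n_iters):
        P = 1.0 / (1.0 + np.exp(-theta))      # current de-fuzzified similarities
        grad = W * (P - targets)              # gradient of the weighted binary cross-entropy w.r.t. theta
        theta -= lr * grad                    # plain gradient-descent step
    return 1.0 / (1.0 + np.exp(-theta))
```

Under this reading, similarities above the mean drift toward 1 and those below drift toward 0 as the iterations proceed, which is what removes the indecision described above.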
Highlights
Large multi-branch companies need to analyze multiple databases to discover useful patterns for the decision-making process
To address the issues associated with clustroid initialization, preselection of a suitable number of clusters, and non-convexity of the clustering quality objectives, we proposed in [25,26] an algorithm named GDMDBClustering, which minimizes a quasi-convex loss function quantifying the quality of the multi-database clustering without a priori assumptions about how many clusters should be chosen (see the similarity-level sketch after these highlights)
An improved similarity-based clustering algorithm for multi-database mining was proposed in this paper
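As referenced above, the sketch below illustrates one way candidate clusterings can be generated from pairwise similarities without fixing the number of clusters in advance: at each similarity level, databases whose similarity reaches that level are linked, and clusters are read off as connected components. The chosen levels, the BFS labelling, and the name `candidate_clusterings` are illustrative assumptions, not the exact procedure of GDMDBClustering.

```python
import numpy as np

def candidate_clusterings(S, levels):
    """Sketch: for each similarity level alpha, link databases with
    similarity >= alpha and take clusters as connected components."""
    n = S.shape[0]
    clusterings = {}
    for alpha in levels:
        adjacency = S >= alpha
        labels = [-1] * n
        current = 0
        for start in range(n):                  # label each connected component
            if labels[start] != -1:
                continue
            queue, labels[start] = [start], current
            while queue:
                u = queue.pop()
                for v in range(n):
                    if adjacency[u, v] and labels[v] == -1:
                        labels[v] = current
                        queue.append(v)
            current += 1
        clusterings[alpha] = labels             # cluster label of each database at level alpha
    return clusterings
```

Each candidate clustering produced this way can then be scored with a goodness measure, so the number of clusters emerges from the similarity structure rather than being fixed beforehand.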
Summary
Large multi-branch companies need to analyze multiple databases to discover useful patterns for the decision-making process. When the pairwise similarities between databases are concentrated around their mean, the clustering algorithm becomes indecisive and a trivial result is produced, i.e., putting all the n databases in one cluster or returning n singleton clusters. To tackle this problem, we propose a learning algorithm that reduces the fuzziness in the pairwise similarities by minimizing a weighted binary entropy loss function H(·) via gradient descent and back-propagation. Unlike the multi-database clustering algorithms proposed in [20,21,22,23], our approach uses a convex objective function L(θ) to assess the quality of the produced clustering. This allows our algorithm to terminate just after attaining the global minimum of the objective function (i.e., after exploring fewer similarity levels). Exploring and examining individual clusters of similar local patterns helps the discovery of new and relevant patterns capable of improving the decision-making quality.
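The learning-rate-free coordinate descent over a convex objective L(θ) can be pictured with the minimal sketch below, which stands in a simple convex quadratic for L(θ). The max-heap is keyed on partial-derivative magnitudes (a Gauss-Southwell-style reading of choosing the "largest weight variable"), and each selected coordinate is minimized exactly in closed form, so no learning rate appears. The quadratic stand-in, the selection key, and the stale-key handling are assumptions for illustration only.

```python
import heapq
import numpy as np

def cd_no_learning_rate(A, b, n_iters=200, tol=1e-10):
    """Sketch only: minimize the stand-in convex objective
        L(theta) = 0.5 * theta^T A theta - b^T theta,  A symmetric positive definite,
    by coordinate descent without a learning rate."""
    n = len(b)
    theta = np.zeros(n)
    grad = A @ theta - b                           # full gradient at the starting point
    heap = [(-abs(grad[i]), i) for i in range(n)]  # max-heap via negated keys (heapq is a min-heap)
    heapq.heapify(heap)

    for _ in range(n_iters):
        key, i = heapq.heappop(heap)
        g_i = A[i] @ theta - b[i]                  # fresh partial derivative of L w.r.t. theta_i
        if abs(g_i) + tol < -key:                  # stored key overstated the gradient: refresh and retry
            heapq.heappush(heap, (-abs(g_i), i))
            continue
        theta[i] -= g_i / A[i, i]                  # exact coordinate minimizer: no step size needed
        heapq.heappush(heap, (-abs(A[i] @ theta - b[i]), i))
    return theta
```

Because the stand-in objective is quadratic in each coordinate, its exact minimizer along that coordinate is available in closed form, which is precisely what removes the learning rate from the update.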