Weighted k-Prototypes Clustering Algorithm Based on the Hybrid Dissimilarity Coefficient

Ziqi Jia, Ling Song

doi:10.1155/2020/5143797

Abstract

The k-prototypes algorithm is a hybrid clustering algorithm that can process both categorical and numerical data. In this study, the method for selecting initial cluster centers was improved and a new hybrid dissimilarity coefficient was proposed. Building on this coefficient, the study proposes a weighted k-prototypes clustering algorithm based on the hybrid dissimilarity coefficient (WKPCA). WKPCA not only improves the selection of initial cluster centers but also puts forward a new method for calculating the dissimilarity between data objects and cluster centers. Real data from the UCI repository were used to test WKPCA. Experimental results show that WKPCA is more efficient and robust than other k-prototypes algorithms.

Highlights

Cluster analysis is a form of unsupervised learning and an important research direction in machine learning [1].
In the k-modes algorithm, the modes vector combines the most frequent value of each feature in the subcluster. The dissimilarity between a data object to be clustered and a cluster is calculated with the simple Hamming distance, so only categorical data can be processed.
Researchers have carried out a series of exploratory studies. The k-prototypes algorithm [6] and its variants are mixed-type data clustering algorithms that take into account the dissimilarity of both categorical and numerical features at the same time.
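The two k-modes ingredients named in the highlights, the modes vector and the simple Hamming (matching) dissimilarity, can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `modes_vector` and `hamming` are hypothetical, and breaking ties in the mode by first occurrence is an assumption of this sketch.

```python
from collections import Counter
from typing import Hashable, Sequence

Row = Sequence[Hashable]


def modes_vector(cluster: list[Row]) -> list[Hashable]:
    """Most frequent value of each categorical feature in the cluster.

    Assumption: ties are broken by first occurrence, because
    Counter.most_common orders equal counts by insertion order.
    """
    n_features = len(cluster[0])
    return [Counter(row[j] for row in cluster).most_common(1)[0][0]
            for j in range(n_features)]


def hamming(x: Row, mode: Row) -> int:
    """Simple matching dissimilarity: number of features whose values differ."""
    return sum(a != b for a, b in zip(x, mode))


cluster = [("red", "small", "round"),
           ("red", "large", "round"),
           ("blue", "small", "square")]
mode = modes_vector(cluster)                    # ['red', 'small', 'round']
print(hamming(("blue", "small", "square"), mode))  # 2
```

Because only equality between values is tested, nothing in this measure can use numerical magnitudes, which is why k-modes alone handles categorical data only.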

Summary

Introduction

Cluster analysis is a form of unsupervised learning and an important research direction in machine learning [1]. Using binary encoding to preprocess categorical data has four disadvantages: (1) the original structure of the categorical data is destroyed, so the binary features produced by the conversion are meaningless; (2) the implicit dissimilarity information is ignored, so the encoding cannot truly reflect the structure of the dataset; (3) if a feature has many distinct values, the converted binary representation has a correspondingly high dimension; and (4) maintenance is difficult: if new values are added to a categorical feature, the encoding of every data object changes [5]. To solve these problems, researchers have carried out a series of exploratory studies. In this study, the k-prototypes algorithm and its variants are analyzed and compared, the automatic method for determining initial cluster centers is improved, and a new hybrid dissimilarity coefficient is proposed.
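Disadvantages (3) and (4) can be made concrete with a minimal sketch of binary (one-hot) encoding for a single categorical feature. The helper `one_hot` is hypothetical and written only for illustration: the encoded width equals the number of distinct values, and introducing one new value changes the encoding of every existing object.

```python
def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Binary-encode one categorical feature: one 0/1 column per distinct value."""
    categories = sorted(set(values))
    encoded = [[int(v == c) for c in categories] for v in values]
    return categories, encoded


colors = ["red", "green", "blue", "green"]
cats, enc = one_hot(colors)
# cats == ['blue', 'green', 'red'], so every object now has 3 binary features
# enc[0] == [0, 0, 1] encodes "red"

# Adding a single new value ("yellow") widens every object's encoding.
cats2, enc2 = one_hot(colors + ["yellow"])
# enc2[0] == [0, 0, 1, 0]: the first object's encoding changed as well
```

With many distinct values per feature, the number of columns grows accordingly, which is the dimensionality problem described in (3).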

The k-Prototypes Algorithm
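For orientation, a classical k-prototypes loop can be sketched as follows: the dissimilarity is the squared Euclidean distance on numerical features plus a weight γ (`gamma`) times the number of categorical mismatches, and each prototype combines per-feature means with per-feature modes. This is a hedged sketch, not WKPCA: it assumes random initial prototypes (not the improved initial-center selection proposed in this study), a fixed user-supplied `gamma`, and batch updates after each full assignment pass. The names `mixed_dissimilarity` and `k_prototypes` are hypothetical.

```python
import random
from collections import Counter


def mixed_dissimilarity(x_num, x_cat, q_num, q_cat, gamma):
    """Squared Euclidean distance on numerical features plus gamma times
    the number of mismatched categorical features (simple matching)."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, q_num))
    categorical = sum(a != b for a, b in zip(x_cat, q_cat))
    return numeric + gamma * categorical


def k_prototypes(num, cat, k, gamma=1.0, max_iter=100, seed=0):
    """Batch k-prototypes sketch.

    num[i] holds object i's numerical values and cat[i] its categorical
    values. Returns (labels, prototypes), each prototype being (means, modes).
    """
    rng = random.Random(seed)
    # Assumption: k distinct objects chosen at random as initial prototypes.
    start = rng.sample(range(len(num)), k)
    protos = [(list(num[i]), list(cat[i])) for i in start]
    labels = None

    def nearest(x_num, x_cat):
        # Index of the prototype with the smallest mixed dissimilarity.
        return min(range(k), key=lambda l: mixed_dissimilarity(
            x_num, x_cat, protos[l][0], protos[l][1], gamma))

    for _ in range(max_iter):
        new_labels = [nearest(xn, xc) for xn, xc in zip(num, cat)]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
        for l in range(k):
            members = [i for i, lab in enumerate(labels) if lab == l]
            if not members:  # keep the old prototype for an empty cluster
                continue
            # Numerical part of the prototype: per-feature mean.
            means = [sum(num[i][j] for i in members) / len(members)
                     for j in range(len(num[0]))]
            # Categorical part of the prototype: per-feature mode.
            modes = [Counter(cat[i][j] for i in members).most_common(1)[0][0]
                     for j in range(len(cat[0]))]
            protos[l] = (means, modes)
    return labels, protos
```

For example, `k_prototypes([[1.0], [1.2], [0.8], [10.0], [10.5], [9.5]], [["a"], ["a"], ["a"], ["b"], ["b"], ["b"]], k=2)` should separate the first three objects from the last three. A larger `gamma` makes categorical mismatches weigh more relative to numerical distances.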

Quantized Numerical Dissimilarity Coefficient


Weighted Hybrid Dissimilarity Coefficient

Cost Function Considering Weights

Experimental Results and Analysis


Journal: Mathematical Problems in Engineering
Publication Date: Jul 25, 2020
Citations: 9
License type: CC BY 4.0

Similar Papers

Density Peak Clustering Algorithm Based on High Density Connection with Entropy Optimization
Weiguo Yi ... Siwei Ma, 22 Jul 2022

A Fast K-prototypes Algorithm Using Partial Distance Computation
Byoungwook Kim, Symmetry, Vol. 9, 21 Apr 2017

Using Genetic Algorithm for Selection of Initial Cluster Centers for the K-Means Method
Wojciech Kwedlo ... Piotr Iwanowicz, 01 Jan 2009

A SimRank based Ensemble Method for Resolving Challenges of Partition Clustering Methods
..., Journal of Scientific & Industrial Research, Vol. 79, 01 Apr 2020