Abstract

A simple and fast k-medoids algorithm that updates medoids by minimizing the total within-cluster distance has been developed. Although it is simple and fast, as its name suggests, it neglects the local optima and empty clusters that may arise. With the distance as an input to the algorithm, a generalized distance function is developed to increase the variation of the distances, especially for mixed variable datasets. The variation of the distances is a crucial part of a partitioning algorithm because different distances produce different outcomes. In the experiments, the simple k-medoids algorithm performs consistently well in various settings of mixed variable data and achieves higher cluster accuracy than other distance-based partitioning algorithms for mixed variable data.
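The medoid-updating idea described above, choosing as medoid the cluster member with the smallest total distance to the other members, can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the function names and the use of a precomputed distance matrix are assumptions.

```python
import numpy as np

def update_medoids(dist, labels, k):
    """For each cluster, pick as the new medoid the member whose total
    distance to all other members of that cluster is smallest.
    dist is a precomputed n x n distance matrix (an assumption here)."""
    medoids = np.empty(k, dtype=int)
    for c in range(k):
        members = np.flatnonzero(labels == c)       # indices of objects in cluster c
        within = dist[np.ix_(members, members)]     # pairwise distances inside the cluster
        medoids[c] = members[within.sum(axis=1).argmin()]
    return medoids

def assign(dist, medoids):
    """Assign every object to its nearest medoid."""
    return dist[:, medoids].argmin(axis=1)
```

Because the update only needs a distance matrix, any distance, including one defined for mixed variables, can be plugged in without changing the algorithm.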

Highlights

  • Cluster analysis is a vital exploratory tool in data structure investigation

  • The most common practice for a mixed variable dataset is applying the partitioning around medoids (PAM) [2], which replaces the centroids with the medoids

  • By taking local optima and empty clusters into consideration, we propose a k-medoids algorithm that improves the performance of the simple and fast k-medoids (SFKM) algorithm

Summary

Introduction

Cluster analysis is a vital exploratory tool in data structure investigation. Each object within a group is similar (homogeneous) to the others, and objects in different groups are distinct (heterogeneous) from one another [1,2]. The k-means algorithm is unsuitable when the data are mixed variable data because "means" as the centers of the clusters (centroids) are unavailable and the Euclidean distance is not applicable. The most common practice for a mixed variable dataset is to apply partitioning around medoids (PAM) [2], which replaces the centroids with medoids. After defining a distance for the mixed variable data, either the k-prototype or the PAM algorithm is applied. As a medoid-based algorithm, PAM is more robust with respect to the definition of the cluster center than centroid-based algorithms. In the medoid updating step, on the other hand, they are very similar: like centroid updating in k-means, both algorithms operate within clusters only. We generalize a distance function that is feasible for any combination of numerical and categorical distances and their respective weights.
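A generalized distance for mixed variables typically combines a numerical and a categorical component under user-chosen weights. The paper's exact formulation is not given here, so the sketch below uses one common instance, a Gower-style combination of a range-normalized Manhattan distance and simple matching, purely for illustration; the function name, argument layout, and weights are assumptions.

```python
import numpy as np

def mixed_distance(x_num, y_num, x_cat, y_cat, ranges, w_num=1.0, w_cat=1.0):
    """Illustrative weighted mixed distance (Gower-style assumption):
    range-normalised Manhattan distance on the numerical variables plus
    simple-matching distance on the categorical variables, each scaled
    by its weight and averaged over all variables."""
    d_num = np.abs(np.asarray(x_num) - np.asarray(y_num)) / np.asarray(ranges)
    d_cat = np.asarray(x_cat) != np.asarray(y_cat)   # mismatch indicator per variable
    p = d_num.size + d_cat.size                      # total number of variables
    return (w_num * d_num.sum() + w_cat * d_cat.sum()) / p
```

Swapping in other numerical or categorical distances, or other weights, changes only this function; the partitioning algorithm itself is untouched.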

K-Medoids Algorithms
Proposed K-Medoids Algorithm
Proposed Distance Method
Demonstration on Artificial and Real Datasets
Different Variable Proportions
Different Number of Clusters
Different Numbers of Variables
Different Numbers of Objects
Iris Data
Wine Data
Vote Data
Zoo Data
Credit Approval Data
Findings
Conclusions
