An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

Kang Zhang, Xingsheng Gu

doi:10.1155/2014/486075


Open Access

https://doi.org/10.1155/2014/486075


Abstract

Clustering has been widely used in science, technology, social science, and other fields. In the real world, data objects are usually described by both numeric and categorical features, yet many clustering methods can process only datasets that are either purely numeric or purely categorical. Recently, algorithms that can handle mixed data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method that has demonstrated good performance on a wide variety of datasets. However, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm that uses it to cluster such datasets. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and pure categorical datasets.

Highlights

With the development of information technology and the wide use of computers and networks, the explosion of data in almost all fields gives data scientists a new perspective on knowledge discovery and future decision making.
Based on the affinity propagation (AP) algorithm and Ahmad and Dey’s mixed similarity measure architecture [18], this paper proposes an adaptive affinity propagation clustering method for datasets with mixed numeric and categorical attributes, using a novel similarity measure as a cost function.
Extracting knowledge and information from mixed data meets the urgent needs of real-world applications.

Summary

Introduction

With the development of information technology and the wide use of computers and networks, the explosion of data in almost all fields gives data scientists a new perspective on knowledge discovery and future decision making. Because Huang’s algorithm loses information when representing cluster centers and uses only a simple binary distance between the values of a categorical attribute, Ahmad and Dey [18] developed a modified cost function, based on a k-means-type algorithm, that alleviates these shortcomings. Based on the AP algorithm and Ahmad and Dey’s mixed similarity measure architecture [18], this paper proposes an adaptive affinity propagation clustering method for datasets with mixed numeric and categorical attributes, using a novel similarity measure as a cost function.
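
To make the "simple binary distance" baseline concrete, the following minimal Python sketch computes a Huang-style mixed dissimilarity (squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches) and negates it to obtain the kind of pairwise similarity matrix that AP takes as input. It is not the similarity measure proposed in this paper, which is not given in this section. The function names, the `gamma` weight, and the toy records are assumptions made for illustration only.

```python
def mixed_dissimilarity(x_num, x_cat, y_num, y_cat, gamma=1.0):
    """Huang-style mixed dissimilarity (illustrative baseline, not the paper's measure).

    Numeric part: squared Euclidean distance.
    Categorical part: simple binary mismatch count, weighted by gamma (assumed weight).
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    categorical_part = sum(1 for a, b in zip(x_cat, y_cat) if a != b)
    return numeric_part + gamma * categorical_part


def similarity_matrix(records, gamma=1.0):
    """Pairwise similarities as negated dissimilarities.

    Each record is a (numeric_values, categorical_values) pair.
    AP implementations usually overwrite the diagonal with a "preference"
    value; that step is left out of this sketch.
    """
    n = len(records)
    return [
        [-mixed_dissimilarity(*records[i], *records[j], gamma=gamma) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy records: two numeric and two categorical attributes each.
records = [
    ((0.0, 1.0), ("red", "small")),
    ((0.0, 2.0), ("red", "large")),
    ((3.0, 1.0), ("blue", "large")),
]
S = similarity_matrix(records)
```

Under this baseline every pair of distinct categorical values counts as equally different, regardless of how the values relate to the rest of the data; this is the limitation that motivates the more refined measures discussed above.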

Background

Method

Experimental Evaluation

Conclusion