Abstract

Notice of Retraction: After careful and considered review of the content of this paper by a duly constituted expert committee, this paper has been found to be in violation of APTIKOM's Publication Principles. We hereby retract the content of this paper. Reasonable effort should be made to remove all past references to this paper. The presenting author of this paper has the option to appeal this decision by contacting ij.aptikom@gmail.com.

Clustering mixed numerical and categorical data has become a challenge. A number of algorithms for mixed data have been proposed. Speed and simplicity are the two main features that have made K-Prototype a popular partition-based clustering algorithm. However, the algorithm requires the value of K to be supplied in advance, and predicting the optimal number of clusters beforehand is often impractical. In this paper, a new algorithm based on K-Prototype is presented for clustering mixed data that automatically generates an appropriate number of clusters.
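To make the K constraint concrete, here is a minimal sketch of the classic K-Prototype iteration: squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches, with prototypes formed from numeric means and categorical modes. The function names, the `gamma` weight, and the random initialisation are illustrative assumptions, not the paper's code. Note that `k` is a required argument.

```python
import random
from collections import Counter


def mixed_distance(x_num, x_cat, p_num, p_cat, gamma):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the number of mismatching categorical attributes."""
    num = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    cat = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return num + gamma * cat


def k_prototypes(X_num, X_cat, k, gamma=1.0, max_iter=100, seed=0):
    """Minimal K-Prototype clustering. k must be given up front."""
    rng = random.Random(seed)
    # Initialise prototypes from k distinct, randomly chosen objects.
    protos = [(list(X_num[i]), list(X_cat[i]))
              for i in rng.sample(range(len(X_num)), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest prototype.
        new_labels = [
            min(range(k), key=lambda c: mixed_distance(xn, xc, *protos[c], gamma))
            for xn, xc in zip(X_num, X_cat)
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: numeric means and categorical modes per cluster.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                continue  # keep the previous prototype for an empty cluster
            means = [sum(X_num[i][j] for i in members) / len(members)
                     for j in range(len(X_num[0]))]
            modes = [Counter(X_cat[i][j] for i in members).most_common(1)[0][0]
                     for j in range(len(X_cat[0]))]
            protos[c] = (means, modes)
    return labels, protos
```

For example, with `X_num = [[1.0], [1.2], [8.0], [8.3]]`, `X_cat = [["a"], ["a"], ["b"], ["b"]]` and `k=2`, the two low-valued "a" rows end up in one cluster and the two high-valued "b" rows in the other. Choosing a different `k` would force a different partition, which is exactly the input the proposed algorithm aims to remove.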

Highlights

  • The K-Prototype algorithm is a variant of K-Means for datasets containing both numeric and categorical attributes

  • K-Prototype extends the idea of K-Means by applying the Euclidean distance to numeric attributes and a binary (match/mismatch) distance to categorical attributes

  • The binary distance for categorical attributes does not reflect the real situation, since two categorical values may differ by some intermediate degree rather than just 0 or 1
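The limitation in the last bullet can be seen in a small sketch. `binary_distance` is the 0/1 match/mismatch measure, while `graded_distance` looks up a table of category-to-category dissimilarities. The colour categories and the values in `GRADED` are hypothetical, invented purely for illustration; they are not taken from the paper, which does not specify such a table here.

```python
def binary_distance(a, b):
    """Match/mismatch: 0.0 for identical categories, 1.0 otherwise."""
    return 0.0 if a == b else 1.0


# Hypothetical graded dissimilarities (illustrative values only).
GRADED = {
    frozenset({"red", "orange"}): 0.3,
    frozenset({"orange", "blue"}): 0.9,
    frozenset({"red", "blue"}): 1.0,
}


def graded_distance(a, b, table=GRADED):
    """Look up a dissimilarity in [0, 1]; identical categories give 0.0."""
    if a == b:
        return 0.0
    return table[frozenset({a, b})]


def mixed_distance(x_num, x_cat, y_num, y_cat, cat_dist, gamma=1.0):
    """Squared Euclidean on numeric parts plus gamma * summed categorical distance."""
    num = sum((p - q) ** 2 for p, q in zip(x_num, y_num))
    cat = sum(cat_dist(p, q) for p, q in zip(x_cat, y_cat))
    return num + gamma * cat


# Binary matching cannot tell "red vs orange" apart from "red vs blue":
print(binary_distance("red", "orange"), binary_distance("red", "blue"))  # 1.0 1.0
print(graded_distance("red", "orange"), graded_distance("red", "blue"))  # 0.3 1.0
```

Passing the categorical measure as `cat_dist` keeps the numeric part unchanged, so the only design decision is how category-to-category dissimilarities are obtained.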


Summary

Chapter 5

Earlier clustering techniques focused on a single type of attribute, either numerical or categorical. A new algorithm based on K-Prototype has been developed for clustering mixed datasets. It removes the need to supply the required number of clusters as an input and automatically generates an appropriate number of clusters. Y. et al (2013) presented a similarity metric that can be applied to categorical, numerical, and mixed attributes. Based on this similarity metric, an iterative clustering algorithm was developed. The K-Prototype algorithm is extended so that neither the number of clusters nor any other parameter has to be provided as input. The methods suggested by Ahmad et al (2007) are used to find the most significant attribute, to find the centroid of a mixed dataset, and to find the distance of an object from a centroid.
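As a rough illustration of what a centroid for mixed data can look like, the sketch below assumes a cluster centre that stores the mean of each numeric attribute and the relative frequency of each category for each categorical attribute. The distance of an object from that centre is taken, as an assumption, to be the squared numeric difference plus, per categorical attribute, one minus the share of cluster members sharing the object's category. The names `mixed_centroid` and `distance_to_centroid` are hypothetical; this sketch does not reproduce the specific methods of Ahmad et al (2007), including their attribute-significance weighting.

```python
from collections import Counter


def mixed_centroid(rows_num, rows_cat):
    """Centre of a cluster: numeric means plus per-attribute category frequencies."""
    n = len(rows_num)
    means = [sum(r[j] for r in rows_num) / n for j in range(len(rows_num[0]))]
    freqs = []
    for j in range(len(rows_cat[0])):
        counts = Counter(r[j] for r in rows_cat)
        freqs.append({value: count / n for value, count in counts.items()})
    return means, freqs


def distance_to_centroid(x_num, x_cat, centroid, gamma=1.0):
    """Squared Euclidean on numeric parts plus gamma * categorical term,
    where each categorical term is 1 - (share of members with the same value)."""
    means, freqs = centroid
    num = sum((a - m) ** 2 for a, m in zip(x_num, means))
    cat = sum(1.0 - f.get(v, 0.0) for v, f in zip(x_cat, freqs))
    return num + gamma * cat
```

For a cluster with numeric values 1.0, 2.0, 3.0, 2.0 and categories "a", "a", "a", "b", the centre is `([2.0], [{"a": 0.75, "b": 0.25}])`. An object `([2.0], ["a"])` then lies at distance 0.25, while `([4.0], ["z"])` lies at 4.0 + 1.0 = 5.0. Unlike a single categorical mode, the frequency representation lets the categorical term take values between 0 and 1.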

The Proposed Objective Function
Finding Centroids of the Clusters
The Pseudocode of the Proposed K-Prototype Algorithm
Dataset 2
Dataset 3
Comparative Analysis of the Proposed Algorithm
Results on Lymphography Dataset
FilteredFirst
Results on Post-Operative Patient Data