Abstract

The K-means algorithm is highly sensitive to the initially selected cluster centers and can produce different clusterings depending on this initialization. In this study, a new method based on the Fuzzy ART algorithm, called Improved Fuzzy ART (IFART), is used to determine the initial cluster centers. IFART yields better-quality clusters than Fuzzy ART while matching Fuzzy ART in clustering speed and in its ability to cluster large-scale data. With the proposed method, the clustering operation is completed in fewer steps, is more stable because the initialization points are fixed, and finishes with a smaller error margin than conventional K-means.

Highlights

  • Clustering is one of the important tools of knowledge discovery

  • Stability: the randomly initialized K-means algorithm was run with 100 different initialization points, and the clusters formed for each initialization were analyzed

  • The synthetic datasets Web Logs (WL), Documents_Sim (DS), Mars, and Image Extraction (IE) are taken from the databases prepared by Pei and Zaiane 19 at the University of Alberta, Canada, Department of Computer Sciences

Summary

Introduction

Clustering is one of the important tools of knowledge discovery. In the clustering process, similar data are grouped using various unsupervised algorithms. K-means is a partitioning algorithm that divides data into K groups. Selecting good initial cluster centers is an important issue for the K-means algorithm. 2 Conventional K-means generates initial cluster centers randomly. When the initial starting points are close to the final solution, K-means has a high probability of finding the cluster centers. Several methods have been proposed in the literature to solve the cluster initialization problem for K-means. 5 Bradley and Fayyad 6 proposed an algorithm that refines initial points by analyzing the probability of data density. Shehroz and Ahmad 7 proposed the Cluster Center Initialization Algorithm (CCIA) to solve the cluster initialization problem. Su and Dy 8 proposed a deterministic initialization method for K-means based on a divisive hierarchical approach. Kohei and Barakbah 9 proposed a hierarchical K-means algorithm as a new approach to determining the initial centers for K-means.
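
The sensitivity to random initialization described above can be illustrated with a minimal sketch. This is not the paper's implementation and does not include IFART; it is a plain Lloyd-style K-means written with the Python standard library. The data are three synthetic Gaussian blobs generated here for illustration (they are not the WL, DS, Mars, or IE datasets), and the helper names `sq_dist`, `mean`, `kmeans`, and `make_blobs` are hypothetical. The 100 runs mirror the stability experiment mentioned in the highlights.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, k, seed, max_iter=100):
    """Lloyd-style K-means with random initial centers.

    Returns (centers, sse, iterations), where sse is the sum of squared
    distances from each point to its nearest final center.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [mean(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse, iterations


def make_blobs(seed=0, per_blob=50):
    """Illustrative synthetic data: three 2-D Gaussian blobs (an assumption)."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in blob_centers for _ in range(per_blob)]


points = make_blobs()
# Same data, same K; only the random initial centers differ between runs.
runs = [kmeans(points, k=3, seed=s) for s in range(100)]
sses = [sse for _, sse, _ in runs]
steps = [it for _, _, it in runs]
print(f"SSE over 100 runs: min={min(sses):.2f}, max={max(sses):.2f}")
print(f"iterations over 100 runs: min={min(steps)}, max={max(steps)}")
```

Because the data and K are held constant, any spread between the minimum and maximum SSE or iteration count in this sketch comes only from the choice of initial centers. A deterministic initializer would give the same result on every run, which is the kind of stability the initialization methods cited above aim for.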

K-means Algorithm
The Proposed Algorithm
A New Method for Initializing K-means
Clustering Error Estimation Index
Experimental Results
Conclusions