Abstract

Data mining is a collection of methods used to extract useful information from large databases. Cluster analysis refers to the grouping of a set of data points into clusters. The most widely used partitioning methods are the K-means and fuzzy c-means (FCM) algorithms. However, they suffer from difficulties such as the random selection of initial centre values and the handling of outlier data points. Most existing clustering methods use the Euclidean distance metric. The modified fuzzy c-means algorithm (MFCM) is efficient in handling outlier data points. In this paper, a new hybrid algorithm is proposed to overcome the limitations of the traditional clustering methods. The hybrid K-MFCM algorithm is tested on four real-world benchmark data sets from the UCI machine learning repository with various distance metrics, including Euclidean, city block and chessboard. The cluster centroid values of the hybrid algorithm are calculated for each data set. The experimental results show that the hybrid algorithm performs well in terms of objective function value, and that the chessboard distance metric yields better fuzzy cluster validity results than the other distance metrics.

Highlights

  • There has been an explosive growth in the generation and storage of electronic information

  • Distance metrics play an important role in data clustering

  • The objective of this paper is to study the performance of the hybrid K-means and modified fuzzy c-means (K-MFCM) algorithm on data clustering problems using different distance metrics


Summary

INTRODUCTION

There has been an explosive growth in the generation and storage of electronic information. Organizations often struggle to find useful information in their databases, and extracting information and knowledge from a large database is a challenging task. Data mining is the process of extracting, or mining, knowledge from large databases. It involves the use of data analysis techniques to discover previously unknown, useful patterns and relationships in large data sets. Some general applications of clustering include medical analysis, pattern analysis, biometrics, image processing, marketing and information retrieval [1]. The Euclidean distance metric is used in most existing clustering algorithms. In this paper, a combination of K-means and the modified fuzzy c-means algorithm (K-MFCM) is proposed using the city block and chessboard distance measures.
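To make the role of the distance metric concrete, the following is a minimal Python sketch of the three metrics named in this paper, together with a membership update that accepts any of them. The membership formula is the standard fuzzy c-means rule, used here only as an illustration; it is not the paper's MFCM or K-MFCM update, which this summary does not specify. The function names, the fuzzifier default `m=2.0`, and the sample points are assumptions introduced for the example.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line (L2) distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def city_block(a: Point, b: Point) -> float:
    """City block (L1, Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chessboard(a: Point, b: Point) -> float:
    """Chessboard (L-infinity, Chebyshev) distance: largest absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def fcm_memberships(point: Point, centres: List[Point],
                    metric: Metric, m: float = 2.0) -> List[float]:
    """Standard FCM membership of one point in each cluster.

    Assumption: this is the textbook FCM rule, not the paper's MFCM variant.
    """
    dists = [metric(point, c) for c in centres]
    # If the point coincides with one or more centres, share full
    # membership among those centres to avoid division by zero.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


if __name__ == "__main__":
    p, q = (0.0, 0.0), (3.0, 4.0)
    print(euclidean(p, q), city_block(p, q), chessboard(p, q))
    # Hypothetical centres, chosen only to show how the metric changes memberships.
    centres = [(1.0, 0.0), (3.0, 4.0)]
    for metric in (euclidean, city_block, chessboard):
        print(metric.__name__, fcm_memberships(p, centres, metric))
```

Passing the metric as a function argument keeps the membership update unchanged when the distance measure is swapped, which mirrors the kind of comparison across metrics that the paper reports.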

REVIEW OF LITERATURE
Distance Metrics
Hybrid K-MFCM Algorithm
RESULTS AND DISCUSSION
CONCLUSION
