Abstract
Data mining is a collection of methods used to extract useful information from large databases. Cluster analysis refers to the grouping of a set of data points into clusters. The most widely used partitioning methods are the K-means and fuzzy c-means (FCM) algorithms. However, they suffer from difficulties such as the random selection of initial centre values and the handling of outlier data points. Most existing clustering methods use the Euclidean distance metric. The modified fuzzy c-means (MFCM) algorithm is efficient in handling outlier data points. In this paper, a new hybrid algorithm is proposed to address the limitations of the traditional clustering methods. The hybrid K-MFCM algorithm is tested on four real-world benchmark data sets from the UCI Machine Learning Repository with various distance metrics, including Euclidean, city block and chessboard. The cluster centroid values of the hybrid algorithm are calculated for the various data sets. The experimental results show that the hybrid algorithm performs well in terms of objective function value, and that the chessboard distance metric yields better fuzzy cluster validity results than the other distance metrics.
Highlights
There has been an explosive growth in the generation and storage of electronic information
Distance metrics play an important role in data clustering
The objective of this paper is to study the performance of the hybrid K-means and modified fuzzy c-means (K-MFCM) algorithm on data clustering problems using different distance metrics
Summary
There has been an explosive growth in the generation and storage of electronic information. Organizations often struggle to find useful information in their databases, and extracting information and knowledge from a large database is a challenging task. Data mining is the process of extracting knowledge from large databases. It involves the use of data analysis techniques to discover previously unknown, useful patterns and relationships in large data sets. General applications of clustering include medical analysis, pattern analysis, biometrics, image processing, marketing and information retrieval [1]. The Euclidean distance metric is used in most existing clustering algorithms. A combination of the K-means and modified fuzzy c-means algorithms (K-MFCM) is proposed, using city block and chessboard distance measures.
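To make the role of the distance metric concrete, the following is a minimal Python sketch of the three metrics named in this paper (Euclidean, city block and chessboard), plugged into the standard FCM membership update. It is an illustration under stated assumptions, not the paper's K-MFCM method: the function names `pairwise_distance` and `fcm_memberships`, the fuzzifier default `m=2.0`, and the sample points are all hypothetical, and the MFCM modifications for outlier handling are not reproduced here.

```python
import numpy as np


def pairwise_distance(X, C, metric="euclidean"):
    """Distance from each row of X (n, d) to each row of C (k, d); returns (n, k)."""
    # Absolute coordinate differences, shape (n, k, d).
    diff = np.abs(X[:, None, :] - C[None, :, :])
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "cityblock":
        # Sum of absolute coordinate differences (L1 norm).
        return diff.sum(axis=2)
    if metric == "chessboard":
        # Largest absolute coordinate difference (L-infinity norm).
        return diff.max(axis=2)
    raise ValueError(f"unknown metric: {metric}")


def fcm_memberships(D, m=2.0, eps=1e-12):
    """Standard FCM membership update from a distance matrix D (n, k).

    Assumption: this is the textbook FCM rule, not the MFCM variant.
    """
    # Clip zero distances so a point sitting on a centroid does not divide by zero.
    D = np.maximum(D, eps)
    inv = D ** (-2.0 / (m - 1.0))
    # Normalise so each point's memberships sum to 1 across clusters.
    return inv / inv.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Hypothetical data: three points and two candidate centroids.
    X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
    C = np.array([[0.0, 0.0], [3.0, 4.0]])
    for metric in ("euclidean", "cityblock", "chessboard"):
        U = fcm_memberships(pairwise_distance(X, C, metric))
        print(metric, np.round(U, 3))
```

The only thing that changes between metrics is how the coordinate differences are reduced (root of summed squares, sum, or maximum), so the same membership update can be reused with any of the three. This is why the metric can be treated as a swappable component when comparing clustering results.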
More From: International Journal of Advanced Research in Computer Science