Abstract

Medical data mining is the process of extracting hidden patterns from medical data. Among the many clustering algorithms, k-means and fuzzy k-means are two of the most widely used. The performance of both is influenced by the choice of initial cluster centers, and both may converge to a local optimum. In addition, the performance of any data mining algorithm depends on the feature subset it is given. This paper attempts to improve the performance of k-means and fuzzy k-means clustering in two stages. In the first stage, it investigates a wrapper approach to feature selection for clustering, in which a genetic algorithm (GA) serves as a random search technique for subset generation and is wrapped around k-means clustering. In the second stage of the proposed work, GA and entropy-based fuzzy clustering (EFC) are used to find the initial centroids for both k-means and fuzzy k-means clustering. Experiments were conducted on a standard medical dataset, the Pima Indians Diabetes Dataset (PIDD). The results show a marked reduction of almost 7% in the classification error of both k-means and fuzzy k-means clustering when GA-selected features and GA-identified initial centroids are used, compared with randomly selected centroids and all features.

Keywords: k-means clustering; fuzzy k-means clustering; genetic algorithm; feature selection; cluster center initialization; entropy-based fuzzy clustering; diabetic dataset
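The first stage described in the abstract, a GA searching over feature subsets with k-means as the wrapped learner, can be illustrated with a minimal sketch. This is not the authors' implementation: the fitness function (majority-class error per cluster), truncation selection, one-point crossover, bit-flip mutation, and all parameter values and function names (`kmeans`, `clustering_error`, `fitness`, `ga_select`) are assumptions chosen for illustration, and the sketch uses only the Python standard library.

```python
import random

def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means with randomly sampled initial centroids."""
    rng = rng or random.Random(0)
    centroids = [list(c) for c in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def clustering_error(labels, truth, k):
    """Assumed error measure: share of points outside their cluster's majority class."""
    wrong = 0
    for j in range(k):
        classes = [t for lab, t in zip(labels, truth) if lab == j]
        if classes:
            wrong += len(classes) - max(classes.count(c) for c in set(classes))
    return wrong / len(truth)

def fitness(mask, data, truth, k):
    """Error of k-means run on the features selected by the 0/1 mask (lower is better)."""
    if not any(mask):
        return 1.0  # an empty feature subset gets the worst possible score
    projected = [[x for x, keep in zip(row, mask) if keep] for row in data]
    labels, _ = kmeans(projected, k, rng=random.Random(0))  # fixed seed: deterministic fitness
    return clustering_error(labels, truth, k)

def ga_select(data, truth, k, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Hypothetical GA wrapper: evolve feature masks that minimise clustering error."""
    rng = random.Random(seed)
    n = len(data[0])
    # Seed the population with the all-features mask as a baseline individual.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, data, truth, k))
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda m: fitness(m, data, truth, k))
    return best, fitness(best, data, truth, k)
```

Because the best half of each generation is carried over unchanged and the all-features mask is in the initial population, the returned subset never scores worse than using every feature under this fitness measure. The second stage (GA- or EFC-based centroid initialization) would replace the random sampling inside `kmeans` and is not shown here.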

