Abstract
Clustering is an unsupervised classification method that focuses on grouping data into clusters. The objects within each cluster are highly similar to one another but differ from the objects in other clusters. Because clustering methods can handle massive amounts of information, many intelligent software agents use clustering techniques to filter, retrieve, and categorize documents on the World Wide Web. Web mining is generally classified under data mining. In data mining, one of the most significant centroid-based partitioning methods for clustering is the K-Means algorithm. One challenge of K-Means is its extreme sensitivity to the choice of initial cluster centers: if the initial centers are selected randomly, the algorithm may get stuck in a local optimum. A variant, the K-Means++ algorithm, improves performance by choosing the initial cluster centroids more carefully. Evolutionary techniques are widely used to optimize clustering methods by supplying their prerequisite parameters. The Genetic Algorithm is a stochastic, population-based technique applied to optimization problems. This paper proposes a Genetic-based K-Means (GBKM) clustering algorithm in which the clusters' centroids are encoded as chromosomes rather than chosen as random initial cluster centroids. The best cluster centers found by the Genetic Algorithm, those that maximize the fitness function, serve as the initial points of the K-Means algorithm. The results show that, compared with four other clustering algorithms, this model increases the K-Means algorithm's performance through an appropriate choice of initial cluster centroids.
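The idea of seeding K-Means with centroids evolved by a Genetic Algorithm can be illustrated with a minimal sketch. The abstract does not specify the fitness function or the genetic operators, so everything below is an assumption made for illustration: the fitness is taken as 1 / (1 + SSE), selection is roulette-wheel, crossover swaps whole centroids uniformly, mutation adds Gaussian noise, and one elite chromosome is carried over per generation. The function names (`assign`, `fitness`, `ga_initial_centers`, `kmeans`) and all parameter defaults are hypothetical and are not taken from the paper.

```python
import numpy as np

def assign(X, centers):
    # Squared Euclidean distance from every point to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Nearest-center label per point, and total within-cluster SSE.
    return d2.argmin(axis=1), d2.min(axis=1).sum()

def fitness(X, centers):
    # ASSUMED fitness: grows as the within-cluster SSE shrinks.
    _, sse = assign(X, centers)
    return 1.0 / (1.0 + sse)

def ga_initial_centers(X, k, pop_size=20, generations=30,
                       mutation_rate=0.1, rng=None):
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Each chromosome encodes k centroids, initialised from random data points.
    pop = np.stack([X[rng.choice(n, size=k, replace=False)]
                    for _ in range(pop_size)]).astype(float)
    scale = X.std(axis=0)
    for _ in range(generations):
        fit = np.array([fitness(X, c) for c in pop])
        elite = pop[fit.argmax()].copy()
        # Roulette-wheel selection proportional to fitness (assumed operator).
        parents = pop[rng.choice(pop_size, size=pop_size, p=fit / fit.sum())]
        # Uniform crossover: each centroid comes from one of two parents.
        mask = rng.random((pop_size, k, 1)) < 0.5
        children = np.where(mask, parents, np.roll(parents, 1, axis=0))
        # Gaussian mutation applied to a random subset of centroids.
        mutate = rng.random((pop_size, k, 1)) < mutation_rate
        children = children + mutate * rng.normal(0.0, 0.1, children.shape) * scale
        children[0] = elite  # elitism: never lose the best chromosome so far
        pop = children
    fit = np.array([fitness(X, c) for c in pop])
    return pop[fit.argmax()]

def kmeans(X, centers, max_iter=100):
    # Standard Lloyd iterations starting from the supplied centers.
    centers = centers.copy()
    for _ in range(max_iter):
        labels, _ = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    labels, sse = assign(X, centers)
    return centers, labels, sse

# Usage on synthetic data: evolve initial centers, then refine with K-Means.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(m, 0.3, size=(30, 2)) for m in (0.0, 5.0, 10.0)])
init = ga_initial_centers(X, k=3, rng=1)
centers, labels, sse = kmeans(X, init)
```

In this sketch the Genetic Algorithm only supplies the starting centroids; the refinement itself is ordinary K-Means, so the final SSE can be no worse than the SSE of the evolved starting point.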