An Improved Genetic Algorithm for Document Clustering on the Cloud

Ruksana Akter, Yoojin Chung

doi:10.4018/ijcac.2018100102

Abstract

This article presents a modified genetic algorithm for text document clustering on the cloud. Traditional genetic-algorithm approaches to document clustering represent chromosomes based on cluster centroids and do not divide cluster centroids during crossover operations. This limits the algorithm's ability to introduce variation into the population, so it can become trapped in local minima. In the proposed approach, a crossover point may be selected even at a position inside a cluster centroid, which allows some cluster centroids to be modified. This helps the algorithm escape local minima and find better solutions than the traditional approaches. Moreover, instead of running only one genetic algorithm, as the traditional approaches do, this article partitions the population and runs a genetic algorithm on each partition. This makes it possible to run different parts of the algorithm simultaneously on different virtual machines in cloud environments. Experimental results demonstrate that the accuracy of the proposed approach is at least 4% higher than that of the other approaches.
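
The difference between the two crossover styles described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the flat-list chromosome encoding (k centroids of `dim` values each, laid end to end), the one-point crossover form, the round-robin partitioning, and all function and parameter names are assumptions made only for illustration.

```python
import random


def one_point_crossover(parent_a, parent_b, dim, split_centroids, rng=random):
    """One-point crossover on flat chromosomes of k centroids x dim values.

    Assumed encoding (hypothetical): a chromosome is a flat list in which
    genes [i*dim, (i+1)*dim) hold the coordinates of centroid i.
    Returns both children and the cut position that was used.
    """
    n_genes = len(parent_a)
    if n_genes != len(parent_b) or n_genes % dim != 0:
        raise ValueError("parents must have equal length, a multiple of dim")
    k = n_genes // dim
    if k < 2:
        raise ValueError("need at least two centroids to cross over")

    if split_centroids:
        # Modified style: any interior gene position is allowed, so the cut
        # may fall inside a centroid and produce a new, mixed centroid.
        cut = rng.randrange(1, n_genes)
    else:
        # Traditional style: the cut is restricted to centroid boundaries,
        # so every centroid in a child is copied whole from one parent.
        cut = dim * rng.randrange(1, k)

    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b, cut


def partition_population(population, n_parts):
    """Split a population into n_parts sub-populations (round-robin).

    Each sub-population could then be evolved by its own genetic algorithm,
    for example on a separate virtual machine.
    """
    return [population[i::n_parts] for i in range(n_parts)]
```

With `split_centroids=False` a child only recombines whole centroids that already exist in its parents, whereas with `split_centroids=True` a cut inside a centroid yields a centroid that neither parent contained, which is the extra source of variation the abstract refers to.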

Full Text