KH-FC: krill herd-based fractional calculus algorithm for text document clustering using MapReduce structure

Priyanka Shivaprasad More, Baljit Singh Saini

doi:10.1504/ijcse.2022.127188

Abstract

In this paper, a krill herd-based fractional calculus (KH-FC) algorithm using the MapReduce framework is proposed for effective text document clustering. In the pre-processing step, stop word removal and stemming are applied to remove redundant information; this reduces the size of the data and in turn enhances clustering accuracy. Term frequency (TF) and inverse document frequency (IDF) are then employed to extract significant features. Finally, the KH-FC model is utilised to cluster the text documents. The KH-FC algorithm is developed by incorporating the fractional calculus (FC) concept into the krill herd (KH) technique. In this method, pre-processing and feature extraction are performed in the mapper phase, whereas the clustering process is executed in the reducer phase. The approach is evaluated using accuracy, Jaccard coefficient, and F-measure as performance metrics. The developed KH-FC approach obtained better performance, with an accuracy of 0.983, a Jaccard coefficient of 0.936, and an F-measure of 0.967.
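As a rough illustration of the pre-processing and feature-extraction stage described above (the work assigned to the mapper phase), the following Python sketch removes stop words, applies a crude stemmer, and computes TF-IDF weights. The stop-word list, the suffix-stripping rule, the function names, and the exact TF-IDF variant (relative term frequency times the natural log of N over document frequency) are all assumptions made for illustration; the abstract does not specify them, and the KH-FC clustering step itself is not shown.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "in", "to", "for"}


def crude_stem(token):
    """Rough suffix stripping, standing in for a real stemming model."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenise on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenised = [preprocess(d) for d in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

With this weighting, a term that occurs in every document receives a weight of zero, so only terms that help distinguish documents would carry weight into the clustering stage.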

Full Text