Abstract

This paper develops a framework for text document clustering based on an improved meta-heuristic algorithm. First, features are selected from the text documents by computing the Term Frequency-Inverse Document Frequency (TF-IDF) score of each word. Cluster formation then depends on centroid selection, which is performed using an improved Lion Algorithm (LA), termed the Cross over probability-based LA model (CP-LA). As a novelty, the paper introduces a new inter- and intra-cluster similarity model: centroids are selected so that the proposed adaptive weighted similarity is minimal, and the weights are adapted automatically to the similarity measure based on the characteristics of the documents. The adaptive weighted similarity function involves the inter-cluster and intra-cluster similarity of both ordered and unordered documents. Finally, the superiority of the proposed model over other models is demonstrated under different performance measures.
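
The TF-IDF feature-selection step can be illustrated with a minimal sketch. It assumes the common weighting tf × log(N / df) and a simple "keep the top-k terms by maximum score" rule; the abstract specifies neither, and the function names `tf_idf` and `select_features` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def select_features(docs, top_k):
    """Keep the top_k terms ranked by their highest TF-IDF score (assumed rule)."""
    best = {}
    for doc_weights in tf_idf(docs):
        for term, weight in doc_weights.items():
            best[term] = max(best.get(term, 0.0), weight)
    ranked = sorted(best, key=lambda term: best[term], reverse=True)
    return ranked[:top_k]


docs = [
    ["lion", "cluster", "text"],
    ["lion", "text", "text"],
    ["lion", "centroid"],
]
features = select_features(docs, top_k=2)
```

Under this weighting, a word that occurs in every document (here `lion`) receives a weight of zero, so it is never preferred as a feature over rarer, more discriminative terms.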
