Abstract

Text mining is the process of extracting high-quality information from text. It is widely used in applications such as text clustering, text categorization, and text classification. Text clustering, which groups text documents by their content, has recently become both a useful and a challenging task, because irrelevant terms and high dimensionality reduce its accuracy. This paper proposes semantic word processing combined with a novel Particle Grey Wolf Optimizer (PGWO) for automatic text clustering. First, the text documents are pre-processed to extract the useful keywords for feature extraction and clustering. Each resulting keyword is then looked up in the WordNet ontology to find its synonyms and hyponyms. Next, the frequency of every keyword is computed and used to build the text feature library. Because this library is high-dimensional, entropy is used to select the most significant features. Finally, the new PGWO algorithm, developed by integrating particle swarm optimization (PSO) into the grey wolf optimizer (GWO), assigns class labels to form the clusters of text documents. Simulations are performed to evaluate the proposed algorithm, which is compared with existing algorithms. The proposed method achieves a clustering accuracy of 80.36% on the 20 Newsgroups dataset and 79.63% on the Reuters dataset, indicating better automatic text clustering than the compared methods.
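The abstract states that PGWO integrates PSO into GWO but does not specify how. The sketch below shows one plausible hybrid, not the authors' method. It assumes a centroid-based encoding (each wolf holds k flattened cluster centroids) and a sum-of-squared-distances fitness; the names `pgwo_cluster`, `sse`, and `decode`, and all parameter values (`wolves`, `iters`, `w`, `c1`, `c2`), are hypothetical. Each wolf computes the standard GWO target (the average of the moves suggested by the alpha, beta, and delta wolves) but reaches it through a PSO-style velocity with inertia and a pull toward its own personal best.

```python
import random


def sse(centroids, docs):
    """Objective to minimise: squared distance of every document to its nearest centroid."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(doc, cen)) for cen in centroids)
        for doc in docs
    )


def decode(position, k, dim):
    """Split a flat wolf position into k centroid vectors of length dim."""
    return [position[i * dim:(i + 1) * dim] for i in range(k)]


def pgwo_cluster(docs, k, wolves=20, iters=100, w=0.5, c1=1.5, c2=1.5, seed=0):
    """Hypothetical PSO/GWO hybrid for centroid-based clustering (not the paper's exact rule)."""
    rng = random.Random(seed)
    dim = len(docs[0])
    n = k * dim
    # Search bounds per coordinate come from the range of the data in that feature.
    lo = [min(d[j % dim] for d in docs) for j in range(n)]
    hi = [max(d[j % dim] for d in docs) for j in range(n)]

    def fitness(p):
        return sse(decode(p, k, dim), docs)

    pos = [[rng.uniform(lo[j], hi[j]) for j in range(n)] for _ in range(wolves)]
    vel = [[0.0] * n for _ in range(wolves)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]

    for t in range(iters):
        # Alpha, beta and delta are the three best positions found so far.
        order = sorted(range(wolves), key=pbest_fit.__getitem__)
        leaders = [pbest[i] for i in order[:3]]
        a = 2.0 * (1 - t / iters)  # GWO control parameter, decays linearly from 2 to 0
        for i in range(wolves):
            x = pos[i]
            for j in range(n):
                # Standard GWO step: average the moves suggested by the three leaders.
                guided = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    guided += leader[j] - A * abs(C * leader[j] - x[j])
                guided /= 3
                # PSO-style velocity: inertia, pull to personal best, pull to GWO target.
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - x[j])
                             + c2 * rng.random() * (guided - x[j]))
                x[j] = min(max(x[j] + vel[i][j], lo[j]), hi[j])  # keep inside bounds
            f = fitness(x)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[:], f

    # Assign each document to its nearest centroid from the best wolf.
    best = min(range(wolves), key=pbest_fit.__getitem__)
    centroids = decode(pbest[best], k, dim)
    labels = [
        min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(doc, centroids[c])))
        for doc in docs
    ]
    return labels, pbest_fit[best]
```

For example, `pgwo_cluster([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], k=2)` should place the first three and the last three toy vectors in separate clusters. In a full pipeline, the rows of `docs` would be the entropy-selected keyword-frequency vectors, and reporting accuracy on labelled corpora such as 20 Newsgroups or Reuters would additionally require mapping cluster indices to class labels.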
