Abstract

This research analyzes document clustering using the Adaptive Jaro Winkler with Jellyfish Search Clustering (AJWJSC) algorithm and the Chimp Optimization Algorithm (COA). The main aim is to identify relevant topics in a simple way while reducing the complexity of domain analysis. The document retrieval process is analyzed for the detection and identification of recent topics. The documents are analyzed in four stages: pre-processing, feature extraction, feature knowledge establishment, and document clustering. First, the documents are pre-processed using tokenization, stop-word removal, and stemming. Next, the Relative Document-Term Frequency Difference (RD-TFD) technique is used to extract features. From the extracted feature set, the essential features are selected with the COA. The documents are then clustered using the AJWJSC algorithm. The novelty of the work lies in clustering documents so that the results can be used in different applications. The proposed method is implemented in MATLAB and evaluated on three datasets: Reuters, 20 Newsgroups, and Topic Detection and Tracking (TDT2). Performance is assessed using precision, recall, F-measure, accuracy, and efficiency. The clustered documents are validated based on their similarity, and this validation is used for document retrieval. The proposed technique is compared with k-means clustering, the Krill Herd (KH) algorithm, and the Moth Flame Optimization (MFO) algorithm.
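
To make the similarity measure behind the clustering stage concrete, the following is a minimal Python sketch of the standard Jaro-Winkler similarity between two strings. It is an illustration only: the abstract does not describe how the adaptive variant or the Jellyfish Search step modify this measure, so neither is shown here, and the function names, the prefix weight p = 0.1, and the prefix cap of 4 characters are conventional defaults assumed for this sketch rather than values taken from the paper.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1] (assumed baseline, not the paper's adaptive variant)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order in the two strings.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (p and max_prefix are assumed defaults)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)
```

For the textbook pair "MARTHA" and "MARHTA", this sketch gives a Jaro score of about 0.944 and a Jaro-Winkler score of about 0.961. In a clustering setting, such a score could be computed between terms or short document descriptors, with higher values indicating that two items are more likely to belong to the same cluster.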
