Abstract

Document retrieval matters to the research community because it returns the documents most relevant to a user query. Although many document retrieval methods have been introduced, retrieving the exact document from an index remains a challenging task. This research therefore proposes an effective document retrieval algorithm named the Rider Spider Monkey Optimization Algorithm (RSOA). First, the documents are pre-processed by stop word elimination and stemming, and Term Frequency-Inverse Document Frequency (TF-IDF) is applied to extract features that identify the keywords of each document. The selected keywords are passed to the cluster-based indexing phase, where the cluster centroids are identified using the proposed RSOA. Query matching is carried out at two levels. At the first level, the query is matched against all cluster centroids to find the appropriate centroid. At the second level, the query is matched against the records belonging to the matched centroid. Matching uses the Bhattacharyya distance as the distance measure to retrieve the documents. The performance is analyzed on the 20 Newsgroups dataset using precision, F-measure, recall, and accuracy, with values of 90.141%, 91.876%, 91.178%, and 91.202%, respectively.
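The two-level query matching described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The documents are taken to be already tokenized, stop-word filtered, and stemmed. The cluster assignment is a hand-written, hypothetical stand-in for the RSOA centroid search, which the abstract does not specify. The smoothed IDF formula, the use of raw term counts for the query, the normalization of vectors to sum to one before computing the Bhattacharyya distance, and the use of that distance at both levels are all choices made for this example. All function names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors); each vector holds TF-IDF weights over vocab."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))  # document frequency
    n = len(docs)
    vectors = []
    for d in docs:
        tf = Counter(d)
        # Smoothed IDF (an assumption) keeps the weight of present terms positive.
        vectors.append(
            [tf[t] / len(d) * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        )
    return vocab, vectors


def normalize(v):
    """Scale a non-negative vector so its entries sum to one."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else list(v)


def bhattacharyya(p, q):
    """Bhattacharyya distance between two non-negative vectors."""
    p, q = normalize(p), normalize(q)
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return float("inf") if bc <= 0 else max(0.0, -math.log(bc))


def centroid(vectors):
    """Component-wise mean; a stand-in for the RSOA-selected centroid."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_level_search(query_vec, vectors, clusters, top_k=1):
    # Level 1: pick the cluster whose centroid is closest to the query.
    centroids = {
        c: centroid([vectors[i] for i in members]) for c, members in clusters.items()
    }
    best = min(centroids, key=lambda c: bhattacharyya(query_vec, centroids[c]))
    # Level 2: rank only the documents inside the matched cluster.
    ranked = sorted(clusters[best], key=lambda i: bhattacharyya(query_vec, vectors[i]))
    return best, ranked[:top_k]


# Toy corpus, assumed to be pre-processed already.
docs = [
    ["space", "rocket", "launch"],
    ["rocket", "orbit", "space"],
    ["hockey", "goal", "team"],
    ["team", "hockey", "season"],
]
# Hypothetical cluster assignment standing in for the RSOA indexing phase.
clusters = {"sci": [0, 1], "sport": [2, 3]}

vocab, vectors = tfidf_vectors(docs)
query = ["hockey", "season"]
query_vec = [query.count(t) for t in vocab]  # raw term counts (an assumption)
best, hits = two_level_search(query_vec, vectors, clusters)
print(best, hits)
```

On this toy data the query is matched to the `sport` cluster at the first level, and document 3 is ranked first at the second level because it shares both query terms. Restricting the second-level comparison to one cluster is what keeps the search from scanning every document.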
