Optimization driven cluster based indexing and matching for the document retrieval

Mamta Kayest, Sanjay Kumar Jain

doi:10.1016/j.jksuci.2019.02.012

Abstract

Document retrieval methods aim to minimize the time a user needs to retrieve a complete document while analyzing its concepts, themes, and contents against their research goals. Exploiting repetitiveness to reduce storage usage remains a difficult challenge. This paper proposes a document retrieval mechanism using an optimization algorithm, Monarch Butterfly optimization-based FireFly (MB-FF), developed by integrating Monarch Butterfly Optimization (MBO) with the Firefly Algorithm (FA). Documents are first pre-processed using stemming and stop word removal, and keywords are then identified from the pre-processed documents. Term Frequency-Inverse Document Frequency (TF-IDF) is used to extract the keywords, and holoentropy is used to select the significant ones. The selected keywords ensure the retrieval of relevant documents. Retrieval proceeds first through cluster-based indexing using MB-FF, followed by a two-level mod-Bhattacharya distance match. The performance of the MB-FF algorithm in the document retrieval mechanism is evaluated using precision, recall, and F-measure.
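As a rough illustration of two building blocks named in the abstract, the sketch below scores keywords with a textbook TF-IDF weighting and computes precision, recall, and F-measure for a retrieved set. It is a minimal sketch under stated assumptions, not the authors' implementation: the stop-word list, the tokenizer, and the function names (`tokenize`, `tf_idf`, `top_keywords`, `precision_recall_f`) are hypothetical, stemming is omitted, and the holoentropy selection, MB-FF clustering, and mod-Bhattacharya matching steps are not reproduced.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much larger).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words.

    Stemming is deliberately left out of this sketch.
    """
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


def top_keywords(doc_scores, k=3):
    """Pick the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def precision_recall_f(retrieved, relevant):
    """Standard set-based precision, recall, and F-measure (F1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    docs = [
        "The firefly algorithm is an optimization method",
        "Monarch butterfly optimization and clustering",
        "Document retrieval in clustering",
    ]
    for i, doc_scores in enumerate(tf_idf(docs)):
        print(i, top_keywords(doc_scores))
    print(precision_recall_f(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6]))
```

In this sketch, terms that occur in fewer documents receive higher weights, so a term shared by several documents ranks below one that is unique to a single document.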

Full Text