Efficient Document Retrieval using Annotation, Searching and Ranking

Poonam Dhamal, Sonal Kutade

doi:10.5120/18904-0198

Abstract

It is always difficult to find relevant information in unstructured text documents. In this paper we study the methods of fuzzy search, instant search and proximity ranking, and how they can be used in the annotation of documents. These methods can be integrated to give better search results and to achieve efficient space and time complexity. We propose a novel alternative approach that generates structured metadata automatically using OpenNLP together with instant-fuzzy search and proximity ranking. It works by identifying documents that are likely to contain the information of interest; this information is subsequently useful for querying the database.

Fig 1: Document retrieval

If a user wants an efficient document retrieval process, then annotation, document searching methods and ranking methods play a vital role in the whole retrieval process. Here we discuss what these techniques are and how they are used in this document retrieval system.

Ranking: In the process of ranking, every query answer is ranked based on its similarity or relevance to the query. Relevance is defined on various pieces of information, such as the co-occurrence of some of the query keywords as a phrase in the record and the frequencies of the query keywords in the record. Domain-specific features can also play a vital role in ranking. E.g., for a publication, the number of citations can be used in ranking because it is a good indicator of its impact. Phrase matching in ranking gives better results. E.g., for the query q = ⟨brain, surgery⟩, a record containing the phrase "brain surgery" is more relevant than the record containing the
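
The ranking idea described above (keyword frequency combined with a phrase-match effect) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `tokenize`, `score` and `rank`, the additive scoring formula, and the `phrase_bonus` weight are all assumptions made for illustration, as is the contrasting record in which the query keywords appear apart.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(record, query_keywords, phrase_bonus=2.0):
    """Score a record against a query (hypothetical formula).

    The score is the total frequency of the query keywords in the record,
    plus an assumed bonus when the keywords occur as a contiguous phrase
    in query order.
    """
    tokens = tokenize(record)
    keywords = [k.lower() for k in query_keywords]

    # Frequency component: how often each query keyword occurs in the record.
    frequency = sum(tokens.count(k) for k in keywords)

    # Phrase component: do the keywords appear adjacent and in query order?
    n = len(keywords)
    has_phrase = n > 1 and any(
        tokens[i:i + n] == keywords for i in range(len(tokens) - n + 1)
    )
    return frequency + (phrase_bonus if has_phrase else 0.0)


def rank(records, query_keywords):
    """Return the records sorted from most to least relevant."""
    return sorted(records, key=lambda r: score(r, query_keywords), reverse=True)


records = [
    "Surgery on the brain is risky",   # keywords present, but not as a phrase
    "Brain surgery is risky",          # keywords present as a phrase
]
ranked = rank(records, ["brain", "surgery"])
```

Both records contain each keyword once, so frequency alone cannot separate them; only the phrase component places the second record first. A domain-specific signal such as a citation count could be added as a further term in `score`, with a weight chosen for the application.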
