Abstract

The Web, a vast repository of information, is available to users largely in the form of textual documents. Developing an effective information retrieval approach that eases the user's search and increases the visibility of search remains a challenge. A genetic algorithm based approach has been implemented to increase the visibility of search by expanding the query, with the Jaccard similarity function used as the fitness function. The step-by-step implementation of the genetic algorithm for one generation is explained in the paper, and the experiment was repeated for 500 generations to obtain optimum keywords, of which the best-suited keyword was used to expand the query. The effectiveness of the approach has been experimentally evaluated on manually created training data consisting of documents retrieved for formulated queries using the Google search engine.

The ultimate goal of an information retrieval system is to deliver the most similar documents, those with the potential to satisfy the user's need. The success of such a system depends on its ability to assess the relevance of the objects in its database (information units, documents, functions, commands, etc.) to the user's request (1). As the volume of information on the internet grows, it becomes difficult for users to find relevant information, because they usually type queries of only two or three words when searching the web. These short queries, together with the mismatch between the terms of the queries and the terms of the documents, reduce the relevancy of the retrieved documents. When a user enters a request in the form of a query, the matching mechanism of the search system uses similarity functions to deliver a ranked list of documents. The documentary database, the query subsystem and the matching mechanism are the three basic components of an information retrieval system (2) (3) (4).
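To make the role of the matching mechanism concrete, the following is a minimal sketch of ranking documents against a query with Jaccard similarity, the function the paper names as its fitness function. The whitespace tokenizer and the function names (`tokenize`, `jaccard`, `rank`) are illustrative assumptions, not the authors' implementation.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive, assumed tokenizer)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two term sets."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def rank(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    query_terms = tokenize(query)
    scored = [(jaccard(query_terms, tokenize(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = ["web search engine", "genetic algorithm search"]
    for score, doc in rank("genetic algorithm", docs):
        print(f"{score:.3f}  {doc}")
```

A short query shares few terms with any document, so its Jaccard scores tend to be low and poorly separated, which is the motivation for expanding the query.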
Measuring the similarity between objects is the fundamental function of any information retrieval application, and there are various ways to compute the similarity among different object representations. Textual similarity functions play a vital role in information retrieval tasks and applications such as document clustering, topic detection, question answering, machine translation and text classification. Textual similarity can be measured lexically and semantically. Words are lexically similar if they have similar character sequences; they are semantically similar if they are used in the same context or if one is a type of the other. If the user is not satisfied with the results returned by the search system, the user reformulates the query, thereby increasing the retrieval effectiveness iteratively and incrementally (3). The user evaluates the results on the basis of the retrieved documents and provides relevance feedback for expanding the terms of the initial query. This feedback can be used to increase the effectiveness of the retrieval system. Query expansion is a technique used to increase the effectiveness of information retrieval (4). It is the process of adding terms or phrases to the original query to improve the relevancy of the retrieved documents. Because the reformulated query contains more terms, the probability that they match terms in relevant documents is higher. The key problem of query expansion is selecting the additional terms with which the user's original query is enhanced. The initial query can be expanded in three ways: manually, interactively and automatically. The first two require the user's involvement, whereas automatic query expansion techniques require no user intervention.
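The idea of automatic query expansion driven by a genetic algorithm with a Jaccard fitness can be sketched as follows. Everything beyond "genetic algorithm", "Jaccard similarity as fitness" and "500 generations" is an assumption made for illustration: the bit-mask encoding over candidate terms, tournament selection, one-point crossover, bit-flip mutation, elitism, and all function and parameter names are hypothetical and may differ from the paper's procedure.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fitness(mask, query, vocab, docs):
    """Mean Jaccard similarity between the expanded query and each document."""
    expanded = query | {term for term, bit in zip(vocab, mask) if bit}
    return sum(jaccard(expanded, doc) for doc in docs) / len(docs)


def evolve(query, docs, generations=500, pop_size=20, p_mut=0.05, seed=0):
    """Evolve a bit mask over candidate terms; return the selected keywords.

    Assumes at least two candidate terms exist outside the query.
    """
    rng = random.Random(seed)
    vocab = sorted(set().union(*docs) - query)  # candidate expansion terms
    score = lambda mask: fitness(mask, query, vocab, docs)
    pop = [[rng.randint(0, 1) for _ in vocab] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: carry over the two best masks
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (assumed operator).
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, len(vocab))  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return [term for term, bit in zip(vocab, best) if bit]


def best_keyword(query, docs, keywords):
    """Pick the single keyword whose addition gives the highest mean Jaccard."""
    return max(
        keywords,
        key=lambda term: sum(jaccard(query | {term}, doc) for doc in docs),
        default=None,
    )


if __name__ == "__main__":
    query = {"genetic", "algorithm"}
    docs = [
        {"genetic", "algorithm", "query", "expansion"},
        {"genetic", "algorithm", "query", "retrieval"},
        {"algorithm", "query", "search"},
    ]
    keywords = evolve(query, docs)
    print(keywords, best_keyword(query, docs, keywords))
```

The two-stage design (evolve a set of keywords, then choose one) mirrors the abstract's description of obtaining optimum keywords and expanding the query with the best-suited one; the document sets passed in stand in for the manually collected training documents.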
This paper focuses on using a genetic algorithm to formulate appropriate query terms for text-based search, so as to increase the relevancy of the retrieved textual documents. The paper is organized as follows. The first section gives a brief introduction to the effectiveness of information retrieval systems. The second section reviews the literature on the applicability of genetic algorithms in information retrieval and related work. The third section describes in detail the experiment carried out to implement the genetic algorithm, which improves the relevancy of the retrieved documents by adding a term to the original query. The fourth section presents the conclusion.
