Efficient Methods to Generate Inverted Indexes for IR

Arun Kumar Yadav, Deepak Rai, Divakar Yadav

doi:10.1007/978-81-322-2757-1_43

Abstract

The development of information retrieval systems over the last 2–3 decades has been marked by the emergence of web search engines. These search engines now play an important role in information seeking. Their growing importance has compelled search engine companies to do their best to improve search results, so measuring search efficiency has become an important issue. Information retrieval refers to the activities that enable us to extract the required documents from a document repository. Today, information retrieval is driven by numerous textual and geographical queries, which have both textual and spatial components. The textual queries of any information retrieval system (IRS) are resolved using indexes and an inversion list. This paper concentrates mainly on the indexing part and on the analysis of the algorithms. Several data structures exist for implementing these indexes, including hash tables, B-trees, sorted arrays, and wavelet trees. Choosing an efficient data structure requires weighing several parameters; those considered in this paper are index creation time, the storage required by the inverted file, and retrieval time. This paper provides a detailed comparative study of different data structures for the implementation of inverted files.

Full Text