An improved approach to English-Hindi based Cross Language Information Retrieval system

Eva Katta, Anuja Arora

doi:10.1109/ic3.2015.7346706

Abstract

Cross Language Information Retrieval (CLIR) is a subdomain of Information Retrieval. It deals with retrieving information in a language that is different from the language of the user's query. In this paper, an improved English-Hindi CLIR system is proposed. Several areas of this broad research field have received little attention and need further work to improve the performance of English-Hindi CLIR. In particular, little research effort has gone into improving the searching and ranking aspects of CLIR systems, especially for English-Hindi CLIR. This paper focuses on applying algorithms such as Naive Bayes and particle swarm optimization to improve the ranking and searching aspects of a CLIR system. To make the system more efficient, we match terms contained in documents to the query terms in the same sequence as they appear in the search query. Our approach also uses a bilingual English-Hindi translator to convert the query into Hindi. Further, we use Hindi query extension and synonym generation, which help retrieve more relevant results than existing English-Hindi CLIR systems. Together, these two techniques give the user a chance to choose a more appropriate Hindi query rather than relying on the single translated query, thereby improving overall performance.
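The idea of matching document terms to query terms in the same sequence as the query can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function names `ordered_match_score` and `rank_documents`, the whitespace tokenization, and the scoring rule (the fraction of leading query terms found in query order) are all assumptions made for illustration, and the Naive Bayes and particle swarm optimization components are not modelled here.

```python
def ordered_match_score(query_terms, doc_terms):
    """Fraction of the query's leading terms that occur in the document
    in the same order as in the query (hypothetical scoring rule)."""
    if not query_terms:
        return 0.0
    matched = 0  # index of the next query term we are looking for
    for term in doc_terms:
        if matched < len(query_terms) and term == query_terms[matched]:
            matched += 1
    return matched / len(query_terms)


def rank_documents(query, documents):
    """Return document indices sorted by descending ordered-match score.
    Ties keep the original document order. Tokenization is a plain
    whitespace split, which is an assumption of this sketch."""
    q_terms = query.split()
    scored = [(ordered_match_score(q_terms, doc.split()), i)
              for i, doc in enumerate(documents)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    # Hypothetical translated Hindi query and a toy document collection.
    hindi_query = "भारत की राजधानी"
    docs = [
        "राजधानी भारत की",            # all terms present, but not in query order
        "भारत की राजधानी दिल्ली है",   # all terms present in query order
        "दिल्ली एक शहर है",            # no query terms
    ]
    print(rank_documents(hindi_query, docs))
```

In this sketch, a document that contains the query terms in query order scores higher than one that contains the same terms in a different order, which is the behaviour the sequence-matching step is meant to reward.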

Full Text