Abstract

The main issue in Cross Language Information Retrieval (CLIR) is its poor retrieval performance, measured by average precision, compared with monolingual retrieval. The main causes are mismatched query terms, lexical ambiguity, and untranslated query terms. These problems need to be addressed to improve the performance of a CLIR system. In this paper, we propose an algorithm for improving the performance of an English-Hindi CLIR system. We generate all possible combinations of the Hindi-translated query, using transliteration of English query terms, and choose the best query among them for document retrieval. The experiments are performed on the FIRE 2010 (Forum for Information Retrieval Evaluation) datasets. The experimental results show that the proposed approach helps overcome these problems and outperforms the existing English-Hindi CLIR system in terms of average precision.
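The combination step described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the candidate Hindi terms, the toy document-frequency table, and the scoring rule are all assumptions made for the example, since the abstract does not state how the "best" query is selected.

```python
from itertools import product

def candidate_queries(term_options):
    """Build every query that takes exactly one candidate per source term."""
    return [" ".join(combo) for combo in product(*term_options)]

def best_query(candidates, score):
    """Return the highest-scoring candidate (first one wins on ties)."""
    return max(candidates, key=score)

# Hypothetical candidates for a two-term English query: each inner list
# holds dictionary translations and/or a transliteration of one term.
term_options = [
    ["जल", "पानी"],
    ["प्रदूषण", "पॉल्यूशन"],
]

# Assumed toy document frequencies standing in for collection statistics.
toy_doc_freq = {"जल": 3, "पानी": 5, "प्रदूषण": 4}

def toy_score(query):
    """Assumed scoring rule: sum of document frequencies of the query terms."""
    return sum(toy_doc_freq.get(term, 0) for term in query.split())

candidates = candidate_queries(term_options)
selected = best_query(candidates, toy_score)
```

The number of candidates is the product of the option counts per term, so it grows quickly with query length; a real system would likely need to cap the options per term or prune low-scoring combinations.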

Highlights

  • We are rapidly constructing a broad network architecture for transferring information across national barriers, but much remains to be done before linguistic boundaries can be crossed as effectively as geographic ones [1]

  • India has the third-largest number of internet users, but in terms of penetration (users as a share of the total population) only 12.6% of people in India are internet users, which lowers India's rank to 164th according to a survey

  • The main hurdle in Cross Language Information Retrieval (CLIR) is poor performance compared with monolingual IR, caused by query term mismatch, untranslated query words, multiple representations of query terms, etc.


Summary

INTRODUCTION

We are rapidly constructing a broad network architecture for transferring information across national barriers, but much remains to be done before linguistic boundaries can be crossed as effectively as geographic ones [1]. Entering a query in another language to retrieve documents is very difficult for the user. The Internet environment addresses this issue by providing Cross Language Information Retrieval (CLIR) technology. CLIR bridges the linguistic barrier by allowing a user to search in one language and retrieve documents in another language. If a user enters a query in Hindi (such as रामचरितमानस), it gives more promising results than the English query (such as Ramcharitramaanas), because some documents are written entirely in a single language (such as Hindi), so an IR system that matches only the user's query language cannot retrieve them. CLIR increases the percentage of internet users because it provides information in their native language. After query translation in English-Hindi CLIR, if such words are replaced by their Hindi meaning, the performance of the CLIR system will decrease because of the mismatch between query terms and documents.

RELATED WORK
PROPOSED METHODOLOGY
Query Translation
EXPERIMENTS
Performance graph
Findings
CONCLUSION