Abstract

The abundance of non-English, multilingual content on the internet creates a need for information retrieval systems that can cross language boundaries. Such cross-lingual information retrieval (CLIR) systems bridge the language gap by allowing a user to pose a query in a regional language and retrieve relevant documents in a different language. Finding relevant documents in a language different from the source language is the most challenging task for any cross-lingual information retrieval system. This paper discusses the development of a complete English-to-Hindi cross-language information retrieval system, along with the contribution of individual components to the system. The main focus of the paper is the optimization of our disambiguation approach, which we call the ‘Two level Disambiguation method’. The experimental results affirm that adding an ‘Analyzer’ component to our CLIR architecture increases the efficiency of the proposed disambiguation algorithm.

Highlights

  • English content on the web has shrunk from 39% to 27% in the last decade (Narasimha Raju and Bhadri Raju, 2015)

  • The increasing number of internet users who want to access information expressed in languages other than their own has established cross-lingual information retrieval as a major issue in information retrieval

  • We propose an effective method for limiting the size of the translation candidate set for query words, in order to optimize our proposed query translation and disambiguation model

Introduction

English content on the web has shrunk from 39% to 27% in the last decade (Narasimha Raju and Bhadri Raju, 2015). On the other hand, web content in languages such as Chinese, Japanese, Hindi, and Arabic has grown. The increasing number of internet users who want to access information expressed in languages other than their own has established cross-lingual information retrieval as a major issue in information retrieval. Retrieval is bilingual if one source language (e.g., English) and one document language (e.g., Hindi) are used. A multilingual retrieval system accepts a user query in one language and returns documents in multiple languages. Sometimes an intermediate language is used as a means of translation, which makes the process transitive (Gollins and Sanderson, 2001).
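To make the idea of transitive translation concrete, the following is a minimal sketch in Python. It is not the method of this paper or of Gollins and Sanderson (2001). It assumes two toy dictionaries (English to French, French to German) with hypothetical entries, and a hypothetical helper named translate_transitive that chains them through the intermediate language.

```python
from typing import Dict, List

# Toy bilingual dictionaries. The entries are hypothetical and chosen
# only for illustration; a real system would use full lexical resources.
EN_TO_FR: Dict[str, List[str]] = {
    "bank": ["banque", "rive"],
}
FR_TO_DE: Dict[str, List[str]] = {
    "banque": ["Bank"],
    "rive": ["Ufer"],
}


def translate_transitive(
    word: str,
    source_to_pivot: Dict[str, List[str]],
    pivot_to_target: Dict[str, List[str]],
) -> List[str]:
    """Collect target-language candidates for `word` via a pivot language.

    Every pivot translation of the word is looked up in the second
    dictionary; the results are merged in order without duplicates.
    """
    candidates: List[str] = []
    for pivot_word in source_to_pivot.get(word, []):
        for target_word in pivot_to_target.get(pivot_word, []):
            if target_word not in candidates:
                candidates.append(target_word)
    return candidates


if __name__ == "__main__":
    print(translate_transitive("bank", EN_TO_FR, FR_TO_DE))
```

Note that an ambiguous source word such as "bank" yields several target candidates after chaining, which is why translation through an intermediate language tends to increase the need for disambiguation.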
