Abstract

In Cross‐Language Information Retrieval (CLIR), queries in one language retrieve relevant documents in other languages. Machine‐Readable Dictionaries (MRDs) and Machine Translation (MT) systems are important resources for query translation in CLIR. We investigate the use of MT systems and MRDs for Arabic–English and English–Arabic CLIR. The key problem is the translation ambiguity associated with these resources. We present three methods of query translation using a bilingual dictionary for Arabic–English CLIR. First, we present the Every‐Match (EM) method. This method yields ambiguous translations because many extraneous terms are added to the original query. To disambiguate query translation, we present the First‐Match (FM) method, which takes the first match in the dictionary as the candidate term. Finally, we present the Two‐Phase (TP) method. We show that good retrieval effectiveness can be achieved without complex resources using the Two‐Phase method for Arabic–English CLIR. We also empirically evaluate the effectiveness of the Arabic–English MT approach using short, medium, and long queries of TREC7 and TREC9 topics and collections. The effect of query length on the quality of MT‐based CLIR is investigated. English–Arabic CLIR is evaluated via MRD and English–Arabic MT. Post‐translation query expansion is used to de‐emphasize the extraneous terms introduced by the MRD and MT for English–Arabic CLIR.
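
The contrast between the Every‐Match and First‐Match methods can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary, the placeholder source terms, the function names, and the choice to pass out‐of‐dictionary terms through untranslated are all assumptions made for illustration. The Two‐Phase method is not sketched because the abstract does not describe its steps.

```python
from typing import Dict, List

# Hypothetical toy bilingual dictionary (not real data): each placeholder
# source term maps to an ordered list of candidate translations.
TOY_MRD: Dict[str, List[str]] = {
    "src_term_1": ["bank", "shore", "bench"],
    "src_term_2": ["rate", "price"],
}


def every_match(query: List[str], mrd: Dict[str, List[str]]) -> List[str]:
    """Every-Match style: keep every dictionary translation of each term."""
    translated: List[str] = []
    for term in query:
        # Assumption: terms missing from the dictionary are kept as-is.
        translated.extend(mrd.get(term) or [term])
    return translated


def first_match(query: List[str], mrd: Dict[str, List[str]]) -> List[str]:
    """First-Match style: keep only the first dictionary translation."""
    translated: List[str] = []
    for term in query:
        candidates = mrd.get(term)
        # Assumption: terms missing from the dictionary are kept as-is.
        translated.append(candidates[0] if candidates else term)
    return translated


if __name__ == "__main__":
    query = ["src_term_1", "src_term_2", "unknown_term"]
    print(every_match(query, TOY_MRD))
    print(first_match(query, TOY_MRD))
```

In this sketch the Every‐Match output grows with the number of dictionary senses per term, which is where the extraneous terms come from, while the First‐Match output always has one term per query word.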

