Empirical studies on the impact of lexical resources on CLIR performance

Jinxi Xu, Ralph Weischedel

doi:10.1016/j.ipm.2004.06.009

Abstract

In this paper, we compile and review several experiments measuring cross-lingual information retrieval (CLIR) performance as a function of the following resources: bilingual term lists, parallel corpora, machine translation (MT), and stemmers. Our CLIR system uses a simple probabilistic language model; the studies used TREC test corpora over Chinese, Spanish and Arabic. Our findings include:

• One can achieve an acceptable CLIR performance using only a bilingual term list (70–80% on Chinese and Arabic corpora).
• However, if a bilingual term list and parallel corpora are available, CLIR performance can rival monolingual performance.
• If no parallel corpus is available, pseudo-parallel texts produced by an MT system can partially overcome the lack of parallel text.
• While stemming is useful normally, with a very large parallel corpus for Arabic–English, stemming hurt performance in our empirical studies with Arabic, a highly inflected language.
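The abstract does not give the form of the "simple probabilistic language model", so the following is only a rough sketch of one generic way a bilingual term list can feed a query-likelihood retrieval model; it should not be read as the authors' implementation. The function names, the toy Spanish–English term list, the uniform split of probability across listed translations, the background probabilities, and the smoothing weight `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_list_translation_probs(term_list):
    """Turn a bilingual term list into P(query_term | doc_term).

    Assumption: each document-language term spreads its probability
    uniformly over its listed translations.
    """
    probs = {}
    for doc_term, translations in term_list.items():
        if not translations:
            continue  # no listed translation: term contributes nothing
        share = 1.0 / len(translations)
        probs[doc_term] = {q: share for q in translations}
    return probs


def query_log_likelihood(query, doc_tokens, trans_probs, background, alpha=0.3):
    """Return log P(query | doc) under a smoothed translation model.

    P(q | doc) = alpha * P(q | background)
               + (1 - alpha) * sum_d P(d | doc) * P(q | d)
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values()) or 1  # avoid division by zero on empty docs
    score = 0.0
    for q in query:
        translated = sum(
            (n / total) * trans_probs.get(d, {}).get(q, 0.0)
            for d, n in counts.items()
        )
        # 1e-6 is an arbitrary floor for query terms missing from the background
        p = alpha * background.get(q, 1e-6) + (1 - alpha) * translated
        score += math.log(p)
    return score


# Hypothetical toy data: Spanish documents, English query.
term_list = {
    "banco": ["bank", "bench"],
    "dinero": ["money"],
    "río": ["river"],
    "agua": ["water"],
}
background = {"bank": 0.01, "money": 0.01, "river": 0.01, "water": 0.01}
docs = {"d1": ["banco", "dinero"], "d2": ["río", "agua"]}

trans_probs = term_list_translation_probs(term_list)
query = ["bank", "money"]
scores = {
    name: query_log_likelihood(query, tokens, trans_probs, background)
    for name, tokens in docs.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy setup the document about banks and money ranks above the one about rivers and water. Under the same framing, a parallel corpus would supply estimated translation probabilities in place of the uniform split assumed here.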


Journal: Information Processing and Management
Publication Date: Aug 20, 2004
Citations: 37

Similar Papers

Construction of Large-Scale Chinese-English Bilingual Corpus and Sentence Alignment
Sun Jie
01 Jan 2023

An Approach to Construct a Named Entity Annotated English-Vietnamese Bilingual Corpus
Long H B Nguyen ... Phuoc Tran
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 16
14 Oct 2016

Translate or Transliterate? Modeling the Decision For English to Arabic Machine Translation
Mahmoud Azab
01 Jan 2013

Studying machine translation technologies for large-data CLIR tasks: a patent prior-art search case study
Walid Magdy ... Gareth J F Jones
Information Retrieval | VOL. 17
21 Nov 2013
