Corpus‐based cross‐language information retrieval in retrieval of highly relevant documents

Tuomas Talvensaari, Kalervo Järvelin, Jorma Laurikkala, Martti Juhola

doi:10.1002/asi.20495

Abstract

Information retrieval systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how corpus‐based cross‐language information retrieval (CLIR) manages in retrieving highly relevant documents. They created a Finnish–Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results, and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary‐based query translation program; the two translation methods were also combined. The results indicate that corpus‐based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by finding out reasons for poor rankings of highly relevant documents.
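The two evaluation styles mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the 0-3 grading scale, the grade thresholds assigned to the liberal, regular, and stringent levels, the linear grade-to-weight mapping, and the toy judgments in `QRELS` are all assumptions made for illustration. Generalized precision and recall are computed here in the commonly described form, where each retrieved document contributes a weight between 0 and 1 instead of a binary relevant/non-relevant vote.

```python
from typing import Dict, List

# Hypothetical graded judgments on an assumed 0-3 scale
# (0 = not relevant, 3 = highly relevant); not data from the paper.
QRELS: Dict[str, int] = {"d1": 3, "d2": 1, "d3": 0, "d4": 2, "d5": 3}

# Assumed minimum grade that counts as relevant at each criterion level.
THRESHOLDS: Dict[str, int] = {"liberal": 1, "regular": 2, "stringent": 3}


def binary_precision(ranking: List[str], qrels: Dict[str, int], level: str) -> float:
    """Precision after binarizing the grades at the given criterion level."""
    if not ranking:
        return 0.0
    cutoff = THRESHOLDS[level]
    hits = sum(1 for doc in ranking if qrels.get(doc, 0) >= cutoff)
    return hits / len(ranking)


def generalized_precision(ranking: List[str], qrels: Dict[str, int],
                          max_grade: int = 3) -> float:
    """Mean relevance weight (grade / max_grade) of the retrieved documents."""
    if not ranking:
        return 0.0
    return sum(qrels.get(doc, 0) / max_grade for doc in ranking) / len(ranking)


def generalized_recall(ranking: List[str], qrels: Dict[str, int],
                       max_grade: int = 3) -> float:
    """Retrieved relevance weight divided by the total weight in the collection."""
    total = sum(grade / max_grade for grade in qrels.values())
    if total == 0:
        return 0.0
    # Assumes the ranking contains no duplicate document ids.
    retrieved = sum(qrels.get(doc, 0) / max_grade for doc in ranking)
    return retrieved / total
```

Calling `binary_precision(["d1", "d3", "d4"], QRELS, "stringent")` counts only grade-3 documents as hits, whereas `generalized_precision` on the same ranking also gives partial credit to the grade-2 document. That difference is why the two views can rank systems differently.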

Journal: Journal of the American Society for Information Science and Technology
Publication Date: Dec 28, 2006
Citations: 17

Similar Papers

Test Collections and Evaluation Metrics Based on Graded Relevance
Kalervo Järvelin
01 Jan 2013

Query translation in Chinese-English cross-language information retrieval
Yibo Zhang ... Le Sun
01 Jan 1999

Experiments with query translation and re-ranking methods in Vietnamese-English bilingual information retrieval
Lam Tung Giang ... Vo Trung Hung
01 Jan 2013

Adapting google translate for English-Persian cross-lingual information retrieval in medical domain
Amin Rahmani
01 Oct 2017
