Abstract
Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TLIR methods and reports on comparative TLIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation and statistical-IR approaches establishing translingual associations. The results show that using bilingual corpora for automated extraction of term equivalences in context outperforms dictionary-based methods. Translingual versions of the Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) perform well, as do translingual pseudo-relevance feedback (PRF) and Example-Based Term-in-context translation (EBT). All showed relatively small performance loss between monolingual and translingual versions, ranging from 87% to 101% of monolingual IR performance. Query translation based on a general machine-readable bilingual dictionary, heretofore the most popular method, did not match the performance of other, more sophisticated methods. Also, the previous very high LSI results in the literature based on “mate-finding” were superseded by more realistic relevance-based evaluations; LSI performance proved comparable to that of other statistical corpus-based methods.
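As a rough illustration of how a corpus-based method can establish translingual associations, the following minimal sketch shows one plausible reading of translingual PRF. It assumes a sentence- or document-aligned bilingual corpus: the source-language query retrieves its top-ranked source-side documents, and the aligned target-language counterparts supply the terms of a target-language query. The function names (`translingual_prf`, `search`), the raw term-frequency weighting, and the parameters `k` and `n_terms` are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
import math


def vec(text):
    """Bag-of-words term-frequency vector (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translingual_prf(query, parallel_pairs, k=2, n_terms=5):
    """Build a target-language query from an aligned bilingual corpus.

    parallel_pairs is a list of (source_text, target_text) tuples.
    k and n_terms are hypothetical tuning parameters for this sketch.
    """
    q = vec(query)
    # Rank the source side of the corpus against the source-language query.
    ranked = sorted(parallel_pairs,
                    key=lambda pair: cosine(q, vec(pair[0])),
                    reverse=True)
    target_query = Counter()
    for source_text, target_text in ranked[:k]:
        # Only use pairs whose source side actually matches the query.
        if cosine(q, vec(source_text)) > 0:
            target_query.update(vec(target_text))
    # Keep the most frequent target-language terms as the new query.
    return Counter(dict(target_query.most_common(n_terms)))


def search(query_vec, docs):
    """Rank target-language documents by cosine similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query_vec, vec(d)), reverse=True)
```

A real system would use a proper weighting scheme (for example TF-IDF) and stopword handling; with raw counts, frequent function words can dominate the generated query.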