Abstract

Query translation is an important task in cross-language information retrieval (CLIR), which aims to determine the best translation words and weights for a query. This article presents three statistical query translation models that focus on the resolution of query translation ambiguities. All the models assume that the selection of the translation of a query term depends on the translations of other terms in the query. They differ in the way linguistic structures are detected and exploited. The co-occurrence model treats a query as a bag of words and uses all the other terms in the query as the context for translation disambiguation. The other two models exploit linguistic dependencies among terms. The noun phrase (NP) translation model detects NPs in a query, and translates each NP as a unit by assuming that the translation of a term only depends on other terms within the same NP. Similarly, the dependency translation model detects and translates dependency triples, such as verb-object, as units. The evaluations show that linguistic structures always lead to more precise translations. The experiments of CLIR on TREC Chinese collections show that all three models have a positive impact on query translation and lead to significant improvements of CLIR performance over the simple dictionary-based translation method. The best results are obtained by combining the three models.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.