Abstract

We report on Korean monolingual, Chinese-Korean bilingual (with English as pivot), and Chinese-English bilingual cross-language information retrieval (CLIR) experiments in the NTCIR-4 environment, using machine translation (MT) software augmented with Web-based entity-oriented translation as resources. Simple stemming improves bigram indexing for Korean retrieval. For word indexing, keeping nouns only is preferable. Web-based translation reduces the number of terms left untranslated by MT and substantially improves CLIR results. Translation concatenation consistently improves CLIR effectiveness, and combining retrieval lists from bigram and word indexing is also helpful. A method to disambiguate multiple MT outputs using a log likelihood ratio threshold was tested. Depending on the query type (title or description), the indexing (bigram only or a retrieval combination), and the evaluation mode (relaxed or rigid), direct bilingual CLIR returned an average precision of 71--79% (English-Korean) and 76--84% (Chinese-English) of the corresponding Korean-Korean and English-English monolingual results. Using English as a pivot in Chinese-Korean CLIR yields about 55--65% of the effectiveness of Korean monolingual retrieval. Entity/terminology translation at the pivot language stage accounts for a large portion of this deficiency. A comparatively poor Chinese-English bilingual result for a topic does not necessarily mean that the topic will continue to under-perform at the Korean retrieval level after the further transitive translation into Korean.
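
As an illustration of two of the techniques named above, the following is a minimal Python sketch of overlapping character-bigram indexing units and of merging a bigram-based and a word-based retrieval list. The abstract does not say how the lists were combined; the min-max normalization, the linear interpolation, and the names `char_bigrams`, `combine_runs`, and `weight_a` are assumptions made for this example only, not the authors' method.

```python
def char_bigrams(text):
    """Return overlapping character bigrams for each whitespace-delimited token."""
    grams = []
    for token in text.split():
        if len(token) < 2:
            grams.append(token)  # keep single-character tokens as they are
        else:
            grams.extend(token[i:i + 2] for i in range(len(token) - 1))
    return grams


def combine_runs(run_a, run_b, weight_a=0.5):
    """Merge two {doc_id: score} runs (hypothetical fusion, not from the paper).

    Scores are min-max normalized per run, then linearly interpolated.
    Documents missing from one run contribute 0.0 for that run.
    """
    def normalize(run):
        if not run:
            return {}
        lo, hi = min(run.values()), max(run.values())
        span = hi - lo
        return {d: ((s - lo) / span if span else 1.0) for d, s in run.items()}

    a, b = normalize(run_a), normalize(run_b)
    fused = {
        d: weight_a * a.get(d, 0.0) + (1.0 - weight_a) * b.get(d, 0.0)
        for d in set(a) | set(b)
    }
    # Sort by descending score, breaking ties by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(char_bigrams("정보검색 시스템"))
    bigram_run = {"d1": 2.0, "d2": 1.0}
    word_run = {"d2": 3.0, "d3": 1.0}
    print(combine_runs(bigram_run, word_run))
```

In this sketch `weight_a` controls how much the first run (here the bigram run) contributes; a value tuned on held-out topics would be a natural choice, though the abstract gives no such detail.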
