Abstract

Extracting Tibetan-Chinese comparable corpora is foundational work for Tibetan-Chinese cross-language question answering systems, information retrieval, machine translation, and other research. This paper explores ways to address the scarcity of Tibetan-Chinese comparable corpora, which will promote knowledge sharing between languages. We propose a method for extracting Tibetan-Chinese comparable corpora. The main contributions are as follows: (1) a Tibetan-Chinese comparable corpus extraction model based on multiple features of bilingual websites; (2) an extraction method based on entity linking over naturally annotated resources. Experimental results show that our approach is effective.
