Abstract
Extracting Tibetan-Chinese comparable corpora is foundational work for Tibetan-Chinese cross-language question answering, information retrieval, machine translation, and related research. This paper explores how to address the scarcity of Tibetan-Chinese comparable corpora, which can promote knowledge sharing between different languages. We propose a method for extracting a Tibetan-Chinese comparable corpus. Our main contributions are as follows: (1) a Tibetan-Chinese comparable corpus extraction model based on multiple features of bilingual websites; (2) an extraction method based on entity linking over naturally annotated resources. Experimental results show that our approach is effective.
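The abstract does not specify which website features the model uses. The sketch below only illustrates the general idea of scoring candidate Tibetan-Chinese page pairs with several features. The three features (overlap of linked entity IDs, publication-date proximity, length ratio), the weights, the 7-day window, and the 0.6 threshold are all hypothetical choices made for illustration. The entity IDs are assumed to come from some prior entity-linking step into a shared knowledge base. This is not the authors' model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    # Hypothetical page representation: entity IDs are assumed to be
    # language-independent (e.g. produced by an earlier entity-linking step).
    title_entities: set[str]
    published: date
    length: int


def entity_overlap(a: Page, b: Page) -> float:
    """Jaccard overlap of linked entity IDs (hypothetical feature)."""
    if not a.title_entities or not b.title_entities:
        return 0.0
    inter = len(a.title_entities & b.title_entities)
    union = len(a.title_entities | b.title_entities)
    return inter / union


def date_score(a: Page, b: Page, window_days: int = 7) -> float:
    """1.0 for same-day pages, decaying linearly to 0 at window_days apart."""
    gap = abs((a.published - b.published).days)
    return max(0.0, 1.0 - gap / window_days)


def length_ratio(a: Page, b: Page) -> float:
    """Ratio of shorter to longer page length, in [0, 1]."""
    longest = max(a.length, b.length)
    if longest == 0:
        return 0.0
    return min(a.length, b.length) / longest


# Hypothetical weights for a linear combination of the features.
WEIGHTS = {"entity": 0.5, "date": 0.3, "length": 0.2}


def comparability(a: Page, b: Page, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the feature scores for one candidate pair."""
    features = {
        "entity": entity_overlap(a, b),
        "date": date_score(a, b),
        "length": length_ratio(a, b),
    }
    return sum(weights[k] * features[k] for k in weights)


def extract_pairs(tibetan: list, chinese: list, threshold: float = 0.6) -> list:
    """For each Tibetan page, keep its best-scoring Chinese page if the score
    reaches the (hypothetical) threshold. Returns (tib_idx, zh_idx, score)."""
    pairs = []
    for i, t in enumerate(tibetan):
        best = max(
            ((comparability(t, c), j) for j, c in enumerate(chinese)),
            default=None,
        )
        if best is not None and best[0] >= threshold:
            pairs.append((i, best[1], best[0]))
    return pairs
```

Usage: `extract_pairs(tibetan_pages, chinese_pages)` returns index pairs with their scores. A linear weighted sum keeps the sketch simple. A real system would more likely learn the weights or use a trained classifier.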