Study on Tibetan-Chinese Comparable Corpus Extraction

Yuan Sun, Li-Li Guo

doi:10.12783/dtcse/aics2016/8212

Open Access

Journal: DEStech Transactions on Computer Science and Engineering

Publication Date: Apr 30, 2017

Keywords: Cross Language Question Answering System, Entity Link, …

Abstract

Tibetan-Chinese comparable corpus extraction is foundational work for Tibetan-Chinese cross-language question answering, information retrieval, machine translation, and related research. This paper addresses the scarcity of Tibetan-Chinese comparable corpora, and the work is intended to promote knowledge sharing between different languages. We propose a method for extracting a Tibetan-Chinese comparable corpus. The main contributions are: (1) a Tibetan-Chinese comparable corpus extraction model based on multiple features of bilingual websites; and (2) an extraction method based on entity linking over naturally annotated resources. Experimental results show that the approach is effective.
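The abstract does not say which features the multi-feature model combines. As a rough illustration only, here is a minimal sketch that scores candidate Tibetan-Chinese page pairs from a bilingual website with a weighted sum of three hypothetical features: publication-date proximity, sentence-count ratio, and overlap of linked entity IDs (the language-independent entity IDs loosely echo the entity-linking idea). The `Page` class, the feature set, the weights, the 30-day window, and the 0.5 threshold are all assumptions for illustration, not the paper's model.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Page:
    # Hypothetical page record; fields are illustrative assumptions.
    url: str
    published: date
    n_sentences: int   # sentence count, assumed more comparable across scripts than raw length
    entities: set[str]  # hypothetical language-independent IDs produced by entity linking

# Hypothetical feature weights (assumed, not taken from the paper).
WEIGHTS = {"date": 0.3, "length": 0.2, "entity": 0.5}

def date_score(a: Page, b: Page, max_days: int = 30) -> float:
    # 1.0 for same-day publication, decaying linearly to 0.0 at max_days apart.
    gap = abs((a.published - b.published).days)
    return max(0.0, 1.0 - gap / max_days)

def length_score(a: Page, b: Page) -> float:
    # Ratio of the shorter to the longer document, in sentences.
    if a.n_sentences == 0 or b.n_sentences == 0:
        return 0.0
    return min(a.n_sentences, b.n_sentences) / max(a.n_sentences, b.n_sentences)

def entity_score(a: Page, b: Page) -> float:
    # Jaccard overlap of linked entity IDs.
    union = a.entities | b.entities
    return len(a.entities & b.entities) / len(union) if union else 0.0

def pair_score(bo: Page, zh: Page) -> float:
    # Weighted sum of the three features.
    return (WEIGHTS["date"] * date_score(bo, zh)
            + WEIGHTS["length"] * length_score(bo, zh)
            + WEIGHTS["entity"] * entity_score(bo, zh))

def extract_pairs(tibetan_pages, chinese_pages, threshold=0.5):
    # For each Tibetan page, keep its best-scoring Chinese page if it clears the threshold.
    pairs = []
    for bo in tibetan_pages:
        scored = [(pair_score(bo, zh), zh) for zh in chinese_pages]
        if not scored:
            continue
        score, best = max(scored, key=lambda t: t[0])
        if score >= threshold:
            pairs.append((bo.url, best.url, round(score, 3)))
    return pairs
```

Because each Tibetan page keeps only its single best Chinese candidate, several Tibetan pages may map to the same Chinese page; a practical system would probably enforce one-to-one matching and learn the weights from annotated pairs.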

Full Text