Abstract
With the globalization of the Web, monolingual text categorization (TC) can be reformulated as a cross-language text categorization (CLTC) task. To build a practical English-Chinese CLTC system, this paper proposes a feature translation method and a fast text categorization algorithm based on a novel Category Vector Space Model. Given a Chinese-English bilingual dictionary covering scientific and technological fields, parallel corpora were used to attach a translation probability to each dictionary entry so that ambiguous translations can be resolved. Experimental results show that the CLTC system built with the proposed method is practical and useful. In these experiments, the cross-language text categorization system outperformed the monolingual text categorization system.
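The abstract does not say how the translation probabilities are estimated, so the following is only a minimal sketch under one simple assumption: a dictionary candidate is counted each time it co-occurs with its source term in an aligned sentence pair, and the counts are normalized per term. The function names `estimate_translation_probs` and `translate_feature`, the data layout, and the uniform fallback for unseen terms are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def estimate_translation_probs(dictionary, parallel_pairs):
    """Attach a probability to each dictionary translation candidate.

    dictionary: maps a Chinese term to a list of English candidates.
    parallel_pairs: iterable of (chinese_tokens, english_tokens) pairs,
        assumed to be sentence-aligned and already tokenized.
    Assumption (not from the paper): probability is the relative
    co-occurrence frequency of the candidate with the source term.
    """
    counts = defaultdict(Counter)
    for zh_tokens, en_tokens in parallel_pairs:
        en_set = set(en_tokens)
        for zh in set(zh_tokens):
            for candidate in dictionary.get(zh, ()):
                if candidate in en_set:
                    counts[zh][candidate] += 1

    probs = {}
    for zh, candidates in dictionary.items():
        if not candidates:
            continue
        total = sum(counts[zh][c] for c in candidates)
        if total == 0:
            # No corpus evidence: fall back to a uniform distribution.
            probs[zh] = {c: 1.0 / len(candidates) for c in candidates}
        else:
            probs[zh] = {c: counts[zh][c] / total for c in candidates}
    return probs


def translate_feature(term, probs):
    """Return the most probable translation of a feature term, or None."""
    candidates = probs.get(term)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy data, invented purely for illustration.
toy_dictionary = {"网络": ["network", "net"], "模型": ["model"]}
toy_pairs = [
    (["网络", "模型"], ["network", "model"]),
    (["网络"], ["network"]),
    (["网络"], ["net"]),
]
toy_probs = estimate_translation_probs(toy_dictionary, toy_pairs)
best = translate_feature("网络", toy_probs)
```

In this sketch an ambiguous feature is resolved by picking its highest-probability candidate; a real system might instead keep all candidates and weight the translated feature vector by these probabilities.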