Abstract

A treebank is one of the most important resources in natural language processing. Whereas Chinese has rich and mature corpora, Vietnamese syntactic analysis is much more difficult. This paper presents a new approach that uses a Chinese-Vietnamese bilingual word-alignment corpus to build a Vietnamese dependency treebank. First, word alignment is performed on Chinese-Vietnamese aligned sentences. Second, the Chinese sentences are dependency-parsed. Finally, the Vietnamese dependency treebank is generated from the Chinese-Vietnamese alignment relations and the Chinese dependency trees. In addition, converting Vietnamese phrase-structure trees into dependency trees significantly improves the accuracy of the dependency analysis. Experimental results show that this approach simplifies the manual collection and annotation of a Vietnamese treebank, saving manpower and time, and that its accuracy is significantly higher than that of machine learning methods.
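The final step, generating Vietnamese dependencies from the Chinese tree and the word alignment, is a form of cross-lingual dependency projection. The sketch below is a minimal illustration of that general idea and not the authors' implementation. It assumes one-to-one word links and the "direct correspondence" rule: if Chinese word a depends on Chinese word b, the Vietnamese word aligned to a depends on the Vietnamese word aligned to b. The function name project_dependencies, the 1-based index layout, and the example sentence are hypothetical. Vietnamese words whose Chinese counterpart is unaligned, or whose Chinese head is unaligned, are left unattached (None) for a later repair step.

```python
from typing import Dict, List, Optional


def project_dependencies(
    zh_heads: List[int],
    alignment: Dict[int, int],
    vi_len: int,
) -> List[Optional[int]]:
    """Project a Chinese dependency tree onto a Vietnamese sentence (sketch).

    zh_heads[i] is the 1-based head of Chinese token i+1 (0 means root).
    alignment maps a 1-based Vietnamese token index to the 1-based index of
    the Chinese token it is aligned with (assumed one-to-one here).
    Returns vi_heads, where vi_heads[j] is the projected 1-based head of
    Vietnamese token j+1, 0 for the root, or None if it cannot be projected.
    """
    # Invert the alignment so a Chinese head can be mapped back to Vietnamese.
    zh_to_vi = {zh: vi for vi, zh in alignment.items()}
    vi_heads: List[Optional[int]] = [None] * vi_len
    for vi, zh in alignment.items():
        zh_head = zh_heads[zh - 1]
        if zh_head == 0:
            # The aligned Chinese word is the root, so this word is the root.
            vi_heads[vi - 1] = 0
        elif zh_head in zh_to_vi:
            # Copy the dependency edge through the alignment.
            vi_heads[vi - 1] = zh_to_vi[zh_head]
        # Otherwise the Chinese head has no Vietnamese counterpart: leave None.
    return vi_heads


if __name__ == "__main__":
    # Hypothetical example: 我 喜欢 音乐 ("I like music"), heads [2, 0, 2].
    # Vietnamese "Tôi rất thích âm_nhạc": "rất" (very) has no Chinese link.
    zh_heads = [2, 0, 2]
    alignment = {1: 1, 3: 2, 4: 3}
    print(project_dependencies(zh_heads, alignment, 4))
```

Running the example prints [3, None, 0, 3]: "Tôi" and "âm_nhạc" attach to "thích", which becomes the root, while the unaligned "rất" is left unattached. Many-to-one links, word-order differences, and unaligned words are where a real projection pipeline needs additional heuristics or manual correction.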
