Abstract
This paper presents the development of a syntactic component for the Vietnamese language. We first discuss the construction of a lexicalized tree-adjoining grammar using an automatic extraction approach. We then present the construction and evaluation of a deep syntactic parser based on the extracted grammar. The parser is a complete system that integrates the tools needed to process Vietnamese text: it takes raw text as input and produces syntactic structures. We also propose a dependency annotation scheme for Vietnamese and an algorithm for extracting dependency structures from derivation trees. At present, this is the first Vietnamese parsing system capable of producing both constituency and dependency analyses, with encouraging accuracy: 69.33% for constituency analysis and 73.21% for dependency analysis. The parser also compares favourably with a statistical parser trained and tested on the same data sets.
Highlights
[...] grammar extracted from the training corpus does not contain the syntactic structure of a given sentence to be parsed
We evaluated a syntactic analysis system based on LTAG for Vietnamese
The most complete report on parser performance for Vietnamese is an empirical study that applies probabilistic CFG parsing models by Collins (2003); its best result on constituency analysis is 78% T -accuracy on a test corpus, and no result is reported for dependency analysis
Summary
ConjGroups returns k groups of components separated by k − 1 conjunctions c1, ..., ck−1, which carry a special POS tag (CC) in the treebank. The functions ArgNodes and ModNodes each return a list of nodes that are arguments and modifiers, respectively, of a given node H. The derived tree of the sentence once processed by Algorithm 2 is shown, wherein the inserted nodes are marked by the quotation mark symbol (’). At this step, each derived tree is decomposed into a set of elementary trees. The extraction process yields three sets of elementary trees: one of spine trees, one of modifier trees and one of conjunction trees
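The grouping step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the name `conj_groups`, the representation of a node as a `(tag, word)` pair, and the example words are assumptions made for illustration. The only detail taken from the text is that conjunctions carry the POS tag CC and that k groups are separated by k − 1 conjunctions.

```python
from typing import List, Tuple

# Hypothetical node representation: (POS tag or phrase label, surface word).
Node = Tuple[str, str]


def conj_groups(children: List[Node], conj_tag: str = "CC") -> Tuple[List[List[Node]], List[Node]]:
    """Split a sequence of sibling nodes into groups separated by conjunctions.

    Returns (groups, conjunctions). By construction there is always exactly
    one more group than there are conjunctions (k groups, k - 1 conjunctions).
    """
    groups: List[List[Node]] = [[]]
    conjunctions: List[Node] = []
    for node in children:
        if node[0] == conj_tag:
            # A conjunction closes the current group and opens a new one.
            conjunctions.append(node)
            groups.append([])
        else:
            groups[-1].append(node)
    return groups, conjunctions


# Illustrative input: three noun phrases joined by two conjunctions.
example = [("NP", "sách"), ("CC", "và"), ("NP", "vở"), ("CC", "hoặc"), ("NP", "bút")]
groups, conjunctions = conj_groups(example)
print(len(groups), len(conjunctions))
```

In this sketch a leading or trailing conjunction produces an empty group; whether the original algorithm allows that case is not stated in the text.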