Abstract

We present a detailed error analysis of a transition-based dependency parser trained on a Hindi dependency treebank. Parser error analysis has not previously been examined systematically from the point of view of treebanking, and this work intends to contribute in this area.
We address two main questions in this paper:

Can the parsing of certain structures be made easier by using alternative analyses for these structures?
Are there certain linguistic cues implicit (or missing) in the current treebank that can be made explicit (or added) in order to make the parsing of complex constructions easier?

These questions guide us in examining the potential benefits of parser error analysis during treebanking. Through our experiments and analysis, we shed light on the causes of errors and subsequently improved the performance of the parser.

Highlights

  • Since the availability of the Penn Treebank (Marcus et al., 1993), treebanks have played a crucial role in our attempt to build automatic natural language processing tools for various languages

  • The error analysis helps us formulate the questions that we address in this work

  • The results show that the lexical information for conjunctions is in itself sufficient to disambiguate coordination vs. subordination structures correctly, and the added valency information seems to be redundant

Summary

Introduction

Since the availability of the Penn Treebank (Marcus et al., 1993), treebanks have played a crucial role in our attempt to build automatic natural language processing tools for various languages. While the analysis of parser errors is used to improve parser performance (by discovering new learning features or re-designing parsing algorithms), it is rarely used to inform guideline decisions. We carry out a detailed error analysis of a transition-based dependency parser trained on a Hindi dependency treebank. The obvious benefit of such an exercise is a potential improvement in parser accuracy. More importantly, it can help the treebank developer validate various guideline choices by reinforcing decisions that were correct and pointing towards possible revisions.

Hindi Dependency Treebank and Parsing
Experimental setup
Error Classification
Edge Type and Non-projective Edge
Edge Length
Edge Depth
Error Analysis
Intra-clausal errors: Verbal complements and adjuncts
Inter-clausal errors
Experiments
TABLE Experiment I
Experiment I
TABLE Experiment II
Experiment II
Effect of Experiments I and II on parsing accuracy
Discussion
Findings
19 References
