Abstract
In this paper, we present a new dependency parsing method for languages which have very small annotated corpus and for which methods of segmentation and morphological analysis producing a unique (automatically disambiguated) result are very unreliable. Our method works on a morphosyntactic lattice factorizing all possible segmentation and part-of-speech tagging results. The quality of the input to syntactic analysis is hence much better than that of an unreliable unique sequence of lemmatized and tagged words. We propose an adaptation of Eisner's algorithm for finding the k-best dependency trees in a morphosyntactic lattice structure encoding multiple results of morphosyntactic analysis. Moreover, we present how to use Dependency Insertion Grammar in order to adjust the scores and filter out invalid trees, the use of language model to rescore the parse trees and the k-best extension of our parsing model. The highest parsing accuracy reported in this paper is 74.32% which represents a 6.31% improvement compared to the model taking the input from the unreliable morphosyntactic analysis tools.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.