Abstract

In this paper, we present a new dependency parsing method for languages which have very small annotated corpus and for which methods of segmentation and morphological analysis producing a unique (automatically disambiguated) result are very unreliable. Our method works on a morphosyntactic lattice factorizing all possible segmentation and part-of-speech tagging results. The quality of the input to syntactic analysis is hence much better than that of an unreliable unique sequence of lemmatized and tagged words. We propose an adaptation of Eisner's algorithm for finding the k-best dependency trees in a morphosyntactic lattice structure encoding multiple results of morphosyntactic analysis. Moreover, we present how to use Dependency Insertion Grammar in order to adjust the scores and filter out invalid trees, the use of language model to rescore the parse trees and the k-best extension of our parsing model. The highest parsing accuracy reported in this paper is 74.32% which represents a 6.31% improvement compared to the model taking the input from the unreliable morphosyntactic analysis tools.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call