Abstract

In standard NLP pipelines, morphological analysis and disambiguation (MA&D) precedes syntactic and semantic downstream tasks. However, for languages with complex and ambiguous word-internal structure, known as morphologically rich languages (MRLs), it has been hypothesized that syntactic context may be crucial for accurate MA&D, and vice versa. In this work we empirically confirm this hypothesis for Modern Hebrew, an MRL with complex morphology and severe word-level ambiguity, in a novel transition-based framework. Specifically, we propose a joint morphosyntactic transition-based framework which formally unifies two distinct transition systems, morphological and syntactic, into a single transition-based system with joint training and joint inference. We empirically show that MA&D results obtained in the joint settings outperform MA&D results obtained by the respective standalone components, and that end-to-end parsing results obtained by our joint system present a new state of the art for Hebrew dependency parsing.

Highlights

  • NLP research in recent years has shown increasing interest in parsing typologically different languages, as evidenced, for instance, by the Universal Dependencies initiative (Nivre et al., 2016)

  • We report F1 scores throughout: MD Full and MD POS for morphological disambiguation (MD), and unlabeled and labeled scores for the dependency trees (Dep)

  • We present a novel joint transition-based framework for morpho-syntactic parsing, designed to solve end-to-end dependency parsing in realistic scenarios

Summary

Introduction

NLP research in recent years has shown increasing interest in parsing typologically different languages, as evidenced, for instance, by the Universal Dependencies initiative (Nivre et al., 2016). Much attention has been drawn to parsing morphologically rich languages (MRLs), which differ significantly from English in their structure and characteristics (Tsarfaty et al., 2010). In MRLs, grammatical information that English typically expresses through word order is often manifested in the internally complex structure of words. In Modern Hebrew, for example, the inflected verb "ahbtih" (loved + 1pers.singular.past + 3pers.feminine.singular) corresponds to three different grammatical functions: the subject "I," the predicate "loved," and the direct object "her." Spanish damelo corresponds to a predicate, an indirect object, and a direct object, as in "give it to me." In MRLs, morphological analysis (MA), which translates raw space-delimited tokens into syntactically relevant "word" units, is a necessary condition for any syntactic or semantic downstream task. The Hebrew token "fmn," for instance, may be read as the noun "oil," the adjective "fat," the verb "lubricated," the sequence
