Abstract

Morpho-syntactic lexicons provide information about the morphological and syntactic roles of words in a language. Such lexicons are not available for all languages, and even when they are available, their coverage can be limited. We present a graph-based semi-supervised learning method that uses the morphological, syntactic and semantic relations between words to automatically construct wide-coverage lexicons from small seed sets. Our method is language-independent, and we show that we can expand a 1000-word seed lexicon to more than 100 times its size with high quality for 11 languages. In addition, the automatically created lexicons provide features that improve performance in two downstream tasks: morphological tagging and dependency parsing.

Highlights

  • Morpho-syntactic lexicons contain information about the morphological attributes and syntactic roles of words in a given language

  • Because these lexicons contain rich linguistic information, they are useful as features in downstream NLP tasks such as machine translation (Nießen and Ney, 2004; Minkov et al., 2007; Green and DeNero, 2012), part-of-speech tagging (Schmid, 1994; Denis and Sagot, 2009; Moore, 2015), dependency parsing (Goldberg et al., 2009), language modeling (Arisoy et al., 2010) and morphological tagging (Muller and Schuetze, 2015), among others

  • We present a method that takes as input a small seed lexicon, containing a few thousand annotated words, and outputs an automatically constructed lexicon which contains morpho-syntactic attributes for a large number of words of a given language
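
As a rough illustration of how a graph-based approach can grow a small seed lexicon, the sketch below runs plain iterative label propagation over a toy word graph. It is a generic textbook technique, not the authors' model: the function name `propagate`, the edge weights, the attribute strings and the four-word graph are all hypothetical and chosen only to make the idea concrete.

```python
from collections import defaultdict


def propagate(edges, seeds, iterations=20):
    """Spread attribute scores from seed words to unlabeled neighbours.

    edges: iterable of (word_a, word_b, weight) undirected edges with
           positive weights (hypothetical toy graph, not from the paper).
    seeds: dict mapping a word to the set of attributes known to hold.
    Returns a dict mapping each word to {attribute: score in [0, 1]}.
    """
    neighbours = defaultdict(list)
    for a, b, w in edges:
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))

    attributes = sorted({attr for attrs in seeds.values() for attr in attrs})
    words = set(neighbours) | set(seeds)
    # Seed words start at 1.0 for their known attributes; everything else at 0.0.
    scores = {
        word: {attr: 1.0 if attr in seeds.get(word, ()) else 0.0
               for attr in attributes}
        for word in words
    }

    for _ in range(iterations):
        updated = {}
        for word in words:
            if word in seeds:
                # Seed labels are clamped and never overwritten.
                updated[word] = scores[word]
                continue
            nbrs = neighbours[word]
            total = sum(w for _, w in nbrs)
            # A non-seed word takes the weighted average of its neighbours.
            updated[word] = {
                attr: sum(w * scores[n][attr] for n, w in nbrs) / total
                for attr in attributes
            }
        scores = updated
    return scores


# Toy graph: strong edges link words sharing a suffix, a weaker edge links
# words sharing a stem. Weights are illustrative assumptions.
EDGES = [
    ("played", "walked", 1.0),
    ("playing", "walking", 1.0),
    ("walked", "walking", 0.5),
]
SEEDS = {
    "played": {"POS=VERB", "Tense=Past"},
    "playing": {"POS=VERB", "VerbForm=Ger"},
}

lexicon = propagate(EDGES, SEEDS)
for word in ("walked", "walking"):
    print(word, {attr: round(s, 2) for attr, s in lexicon[word].items()})
```

In this toy setting the unlabeled word "walked" ends up with a higher `Tense=Past` score than "walking", because it is more strongly connected to the seed "played". Turning scores into lexicon entries would still require a decision rule such as a threshold, which this sketch leaves out.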


Summary

Introduction

Morpho-syntactic lexicons contain information about the morphological attributes and syntactic roles of words in a given language. Because these lexicons contain rich linguistic information, they are useful as features in downstream NLP tasks such as machine translation (Nießen and Ney, 2004; Minkov et al., 2007; Green and DeNero, 2012), part-of-speech tagging (Schmid, 1994; Denis and Sagot, 2009; Moore, 2015), dependency parsing (Goldberg et al., 2009), language modeling (Arisoy et al., 2010) and morphological tagging (Muller and Schuetze, 2015), among others. We perform intrinsic evaluation of the quality of generated lexicons obtained from either the universal dependency treebank or created manually by humans (§4). We show that these automatically created lexicons provide useful features in two extrinsic NLP tasks that require identifying the contextually plausible morphological and syntactic roles: morphological tagging (Hajic and Hladka, 1998; Hajic, 2000) and syntactic dependency parsing (Kubler et al., 2009). We anticipate that the lexicons created will be useful in a variety of NLP problems.

Graph Construction
Graph-based Label Propagation
Model Estimation
Label Propagation
Paradigm Projection
Dependency Treebank Lexicons
Manually Curated Lexicons
Morphological Tagging
Dependency Parsing
Further Analysis
Related Work
Future Work
Findings
Conclusion