Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning

Manaal Faruqui, Ryan McDonald, Radu Soricut

doi:10.1162/tacl_a_00079

Abstract

Morpho-syntactic lexicons provide information about the morphological and syntactic roles of words in a language. Such lexicons are not available for all languages, and even when they are available, their coverage can be limited. We present a graph-based semi-supervised learning method that uses the morphological, syntactic and semantic relations between words to automatically construct wide-coverage lexicons from small seed sets. Our method is language-independent, and we show that we can expand a 1000-word seed lexicon to more than 100 times its size with high quality for 11 languages. In addition, the automatically created lexicons provide features that improve performance in two downstream tasks: morphological tagging and dependency parsing.

Highlights

Morpho-syntactic lexicons contain information about the morphological attributes and syntactic roles of words in a given language.
As these lexicons contain rich linguistic information, they are useful as features in downstream NLP tasks such as machine translation (Nießen and Ney, 2004; Minkov et al., 2007; Green and DeNero, 2012), part-of-speech tagging (Schmid, 1994; Denis and Sagot, 2009; Moore, 2015), dependency parsing (Goldberg et al., 2009), language modeling (Arisoy et al., 2010), and morphological tagging (Müller and Schütze, 2015), among others.
We present a method that takes as input a small seed lexicon, containing a few thousand annotated words, and outputs an automatically constructed lexicon that contains morpho-syntactic attributes for a large number of words of a given language.
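
The input and output described above can be illustrated with a small sketch. This is not the authors' model, whose details are not given on this page; it is a generic label-propagation scheme in which each unlabeled word repeatedly takes the weighted average of its neighbours' attribute scores while seed words stay fixed. The example words, the edge weights, the attribute names, and the function names `propagate` and `to_lexicon` are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate(edges, seed, attributes, iterations=10):
    """Spread attribute scores from seed words to the rest of a word graph.

    edges: iterable of (word_a, word_b, weight), undirected, weight > 0.
    seed: dict mapping a word to the set of attributes it is known to have.
    Returns a dict mapping each word in the graph to {attribute: score}.
    """
    neighbours = defaultdict(list)
    for a, b, weight in edges:
        neighbours[a].append((b, weight))
        neighbours[b].append((a, weight))

    # Seed words start with 1.0 for their known attributes; everything else is 0.0.
    scores = {
        word: {attr: 1.0 if attr in seed.get(word, ()) else 0.0 for attr in attributes}
        for word in neighbours
    }
    for _ in range(iterations):
        updated = {}
        for word, adjacent in neighbours.items():
            if word in seed:
                updated[word] = scores[word]  # seed labels stay clamped
                continue
            total = sum(weight for _, weight in adjacent)
            # Weighted average of the neighbours' scores from the previous round.
            updated[word] = {
                attr: sum(weight * scores[other][attr] for other, weight in adjacent) / total
                for attr in attributes
            }
        scores = updated
    return scores


def to_lexicon(scores, threshold=0.5):
    """Keep every attribute whose propagated score exceeds the threshold."""
    return {
        word: {attr for attr, score in attrs.items() if score > threshold}
        for word, attrs in scores.items()
    }


# Hypothetical toy graph: strong edges link words sharing a suffix,
# a weak edge links two forms of the same stem.
EDGES = [("played", "walked", 1.0), ("plays", "walks", 1.0), ("walked", "walks", 0.2)]
SEED = {"played": {"POS:Verb", "Tense:Past"}, "plays": {"POS:Verb", "Tense:Pres"}}
ATTRIBUTES = ["POS:Verb", "Tense:Past", "Tense:Pres"]

lexicon = to_lexicon(propagate(EDGES, SEED, ATTRIBUTES))
```

In this toy graph the two unlabeled words agree with both neighbours on part of speech, while the tense attribute is dominated by the stronger suffix edge. A real system would need a principled way to construct edges and set their weights, which this sketch simply assumes as given.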

Summary

Introduction

Morpho-syntactic lexicons contain information about the morphological attributes and syntactic roles of words in a given language. As these lexicons contain rich linguistic information, they are useful as features in downstream NLP tasks such as machine translation (Nießen and Ney, 2004; Minkov et al., 2007; Green and DeNero, 2012), part-of-speech tagging (Schmid, 1994; Denis and Sagot, 2009; Moore, 2015), dependency parsing (Goldberg et al., 2009), language modeling (Arisoy et al., 2010), and morphological tagging (Müller and Schütze, 2015), among others. We perform an intrinsic evaluation of the quality of the generated lexicons, using lexicons either obtained from the universal dependency treebank or created manually by humans (§4). We show that these automatically created lexicons provide useful features in two extrinsic NLP tasks that require identifying the contextually plausible morphological and syntactic roles: morphological tagging (Hajič and Hladká, 1998; Hajič, 2000) and syntactic dependency parsing (Kübler et al., 2009). We anticipate that the lexicons created will be useful in a variety of NLP problems.

Graph Construction

Graph-based Label Propagation

Model Estimation

Label Propagation

Paradigm Projection

Dependency Treebank Lexicons

Manually Curated Lexicons

Morphological Tagging

Dependency Parsing

Further Analysis

Related Work

Future Work

Findings

Conclusion

Journal: Transactions of the Association for Computational Linguistics
Publication Date: Dec 1, 2016
Citations: 89
License type: cc-by
