Fine-Grained Prediction of Syntactic Typology: Discovering Latent Structure with Supervised Learning

Dingquan Wang, Jason Eisner

doi:10.1162/tacl_a_00052

Open Access

https://doi.org/10.1162/tacl_a_00052

Abstract

We show how to predict the basic word-order facts of a novel language given only a corpus of part-of-speech (POS) sequences. We predict how often direct objects follow their verbs, how often adjectives follow their nouns, and in general the directionalities of all dependency relations. Such typological properties could be helpful in grammar induction. While such a problem is usually regarded as unsupervised learning, our innovation is to treat it as supervised learning, using a large collection of realistic synthetic languages as training data. The supervised learner must identify surface features of a language’s POS sequence (hand-engineered or neural features) that correlate with the language’s deeper structure (latent trees). In the experiment, we show: 1) Given a small set of real languages, it helps to add many synthetic languages to the training data. 2) Our system is robust even when the POS sequences include noise. 3) Our system on this task outperforms a grammar induction baseline by a large margin.
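To make the idea of a "surface feature of a POS sequence" concrete, here is a minimal sketch of one hypothetical hand-engineered feature. It is not claimed to be a feature the authors use. The function name `right_bias`, the `window` parameter, and the toy corpus are assumptions for illustration. The feature asks: among all occurrences of tag `b` near tag `a`, what fraction appear to the right of `a`? It needs only POS sequences, not trees.

```python
def right_bias(corpus, a, b, window=3):
    """Fraction of nearby (a, b) tag pairs in which b follows a.

    corpus: list of sentences, each a list of POS tag strings.
    window: how many positions to look on each side of every `a` token
            (an illustrative choice, not taken from the paper).
    Returns None if the two tags never co-occur within the window.
    """
    right = 0
    total = 0
    for sent in corpus:
        for i, tag in enumerate(sent):
            if tag != a:
                continue
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] == b:
                    total += 1
                    if j > i:
                        right += 1
    return right / total if total else None


# Toy POS-tagged corpus (invented for this example).
toy_corpus = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["NOUN", "VERB", "ADJ", "NOUN"],
    ["VERB", "NOUN"],
]

verb_noun_bias = right_bias(toy_corpus, "VERB", "NOUN")
print(verb_noun_bias)
```

A supervised learner in the spirit of the abstract would take many such features as input and be trained to output the directionality of each dependency relation, with training languages supplying the gold values.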

Highlights

Descriptive linguists often characterize a human language by its typological properties.
The dobj relation points from a verb to its direct object, so a directionality of 0.9 (meaning that 90% of dobj dependencies are right-directed) indicates a dominant verb-object order. (See Table 1 for more such examples.) Our system is trained to predict the relative frequency of rightward dependencies for each of 57 dependency types from the Universal Dependencies project (UD).
We assume that all languages draw on the same set of POS tags and dependency relations proposed by the UD project, so that our predictor works across languages.

Summary

Introduction

Descriptive linguists often characterize a human language by its typological properties. Predicting these properties is challenging because the language’s true word-order statistics are computed from syntax trees, whereas our method has access only to a POS-tagged corpus. Based on these POS sequences alone, we predict the directionality of each type of dependency relation. For example, the dobj relation points from a verb to its direct object (if any), so a directionality of 0.9, meaning that 90% of dobj dependencies are right-directed, indicates a dominant verb-object order. We assume that all languages draw on the same set of POS tags and dependency relations proposed by the UD project (see §3), so that our predictor works across languages.
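The directionality defined above can be computed directly when gold dependency trees are available, which is how training targets would be obtained. The following is a minimal sketch under stated assumptions: each dependency is represented as a hypothetical `(head_position, dependent_position, relation)` triple, and the toy treebank is invented for illustration.

```python
from collections import Counter


def directionality(treebank):
    """Relative frequency of right-directed arcs for each relation.

    treebank: iterable of sentences, each a list of
              (head_position, dependent_position, relation) triples.
    An arc is right-directed when the dependent follows its head.
    """
    right = Counter()
    total = Counter()
    for sentence in treebank:
        for head, dep, rel in sentence:
            total[rel] += 1
            if dep > head:
                right[rel] += 1
    return {rel: right[rel] / total[rel] for rel in total}


# Toy treebank: nine verb-object arcs and one object-verb arc,
# plus one adjective that precedes its noun.
toy_treebank = [[(2, 3, "dobj")] for _ in range(9)]
toy_treebank.append([(3, 2, "dobj"), (2, 1, "amod")])

result = directionality(toy_treebank)
print(result)
```

Here nine of the ten dobj arcs point rightward, which gives the directionality of 0.9 used as the example in the text. The prediction task is to estimate these same numbers without access to the trees.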

Journal: Transactions of the Association for Computational Linguistics
Publication Date: Dec 1, 2017
Citations: 35
License type: CC BY
