Abstract

We show how to predict the basic word-order facts of a novel language given only a corpus of part-of-speech (POS) sequences. We predict how often direct objects follow their verbs, how often adjectives follow their nouns, and in general the directionalities of all dependency relations. Such typological properties could be helpful in grammar induction. While such a problem is usually regarded as unsupervised learning, our innovation is to treat it as supervised learning, using a large collection of realistic synthetic languages as training data. The supervised learner must identify surface features of a language’s POS sequence (hand-engineered or neural features) that correlate with the language’s deeper structure (latent trees). In the experiment, we show: 1) Given a small set of real languages, it helps to add many synthetic languages to the training data. 2) Our system is robust even when the POS sequences include noise. 3) Our system on this task outperforms a grammar induction baseline by a large margin.

Highlights

  • Descriptive linguists often characterize a human language by its typological properties

  • The dobj relation points from a verb to its direct object, so a directionality of 0.9—meaning that 90% of dobj dependencies are right-directed—indicates a dominant verb-object order. (See Table 1 for more such examples.) Our system is trained to predict the relative frequency of rightward dependencies for each of 57 dependency types from the Universal Dependencies project (UD)

  • We assume that all languages draw on the same set of POS tags and dependency relations that is proposed by the UD project, so that our predictor works across languages

Read more

Summary

Introduction

Descriptive linguists often characterize a human language by its typological properties. The problem is challenging because the language’s true word order statistics are computed from syntax trees, whereas our method has access only to a POS-tagged corpus. Based on these POS sequences alone, we predict the directionality of each type of dependency relation. The dobj relation points from a verb to its direct object (if any), so a directionality of 0.9—meaning that 90% of dobj dependencies are right-directed—indicates a dominant verb-object order. We assume that all languages draw on the same set of POS tags and dependency relations that is proposed by the UD project (see §3), so that our predictor works across languages

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.