A Generative Model of Phonotactics

Richard Futrell, Adam Albright, Timothy J. O’Donnell, Peter Graff

doi:10.1162/tacl_a_00047

Abstract

We present a probabilistic model of phonotactics, the set of well-formed phoneme sequences in a language. Unlike most computational models of phonotactics (Hayes and Wilson, 2008; Goldsmith and Riggle, 2012), we take a fully generative approach, modeling a process where forms are built up out of subparts by phonologically-informed structure building operations. We learn an inventory of subparts by applying stochastic memoization (Johnson et al., 2007; Goodman et al., 2008) to a generative process for phonemes structured as an and-or graph, based on concepts of feature hierarchy from generative phonology (Clements, 1985; Dresher, 2009). Subparts are combined in a way that allows tier-based feature interactions. We evaluate our models’ ability to capture phonotactic distributions in the lexicons of 14 languages drawn from the WOLEX corpus (Graff, 2012). Our full model robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model for all languages. We also present novel analyses that probe model behavior in more detail.
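To make the idea of stochastic memoization concrete, here is a minimal sketch in Python. It assumes a Chinese-restaurant-process reuse rule, which is one common way to realize stochastic memoization; this excerpt does not state the exact scheme the model uses. The class name `StochasticMemoizer`, the toy inventory `PHONEMES`, and the base sampler `random_onset` are hypothetical names introduced only for illustration.

```python
import random

# Toy phoneme inventory (an assumption for illustration, not from the paper).
PHONEMES = ["b", "l", "n", "i", "k"]


def random_onset(rng):
    """Base generative process: draw a fresh two-phoneme chunk uniformly."""
    return rng.choice(PHONEMES) + rng.choice(PHONEMES)


class StochasticMemoizer:
    """Reuse a previously generated value with probability total / (total + alpha),
    choosing among stored values in proportion to how often each was used;
    otherwise call the base sampler and store the new value."""

    def __init__(self, base_sampler, alpha=1.0, rng=None):
        self.base_sampler = base_sampler
        self.alpha = alpha              # larger alpha means more fresh draws
        self.rng = rng or random.Random()
        self.tables = []                # list of [value, use_count] entries
        self.total = 0                  # total number of draws so far

    def sample(self):
        # Pick a point in [0, total + alpha); the first `total` units of mass
        # belong to stored values, the remaining `alpha` to a fresh draw.
        r = self.rng.random() * (self.total + self.alpha)
        for entry in self.tables:
            r -= entry[1]
            if r < 0:
                entry[1] += 1
                self.total += 1
                return entry[0]
        value = self.base_sampler(self.rng)
        self.tables.append([value, 1])
        self.total += 1
        return value


memo = StochasticMemoizer(random_onset, alpha=1.0, rng=random.Random(0))
draws = [memo.sample() for _ in range(50)]
```

Because frequently used values are increasingly likely to be drawn again, a small inventory of reusable subparts emerges from the draws. Setting `alpha` close to zero makes reuse nearly certain, and a large `alpha` approaches independent draws from the base process.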

Highlights

People have systematic intuitions about which sequences of sounds would constitute likely or unlikely words in their language: blick is not an English word, but it sounds like it could be, while bnick does not (Chomsky and Halle, 1965).
It is widely accepted that phonotactic judgments may be gradient: the nonsense word blick is better as a hypothetical English word than bwick, which is better than bnick (Hayes and Wilson, 2008; Albright, 2009; Daland et al., 2011).
To evaluate the contribution of feature dependency graphs, we compare our models with a baseline N-gram model, which represents phonemes as atomic units.
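An N-gram baseline over atomic phonemes, of the kind mentioned in the last highlight, can be sketched as follows. This is a deliberately simplified bigram model with add-k smoothing, not the more sophisticated N-gram model the paper evaluates against. The toy lexicon, the one-character-per-phoneme encoding, and the function names `train_bigram` and `log_prob` are assumptions made for illustration.

```python
import math
from collections import Counter

BOUNDARY = "#"  # word-edge symbol; each character is treated as one phoneme


def train_bigram(lexicon):
    """Count phoneme bigrams, including word boundaries, over a list of forms."""
    bigrams, contexts, symbols = Counter(), Counter(), {BOUNDARY}
    for form in lexicon:
        padded = [BOUNDARY] + list(form) + [BOUNDARY]
        symbols.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, symbols


def log_prob(form, model, k=1.0):
    """Add-k smoothed log probability of a form under the bigram model."""
    bigrams, contexts, symbols = model
    vocab_size = len(symbols)
    padded = [BOUNDARY] + list(form) + [BOUNDARY]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = bigrams[(prev, cur)] + k
        denominator = contexts[prev] + k * vocab_size
        total += math.log(numerator / denominator)
    return total


# Hypothetical training lexicon, invented for this sketch.
toy_lexicon = ["blak", "blis", "brik", "klik", "snik", "nik", "bik"]
model = train_bigram(toy_lexicon)

# Held-out forms are scored by their log probability; higher is better.
blik_score = log_prob("blik", model)
bnik_score = log_prob("bnik", model)
```

On this toy lexicon the attested onset bl- outscores the unattested bn-, so `blik_score` is higher than `bnik_score`. Because the model treats phonemes as unanalyzed symbols, it cannot generalize across phonemes that share features, which is the gap a feature-based model is meant to address.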

Summary

Introduction

People have systematic intuitions about which sequences of sounds would constitute likely or unlikely words in their language: blick is not an English word, but it sounds like it could be, while bnick does not (Chomsky and Halle, 1965). Such intuitions reveal that speakers are aware of the restrictions on sound sequences that can make up possible morphemes in their language, that is, the phonotactics of the language. It is widely accepted that phonotactic judgments may be gradient: the nonsense word blick is better as a hypothetical English word than bwick, which is better than bnick (Hayes and Wilson, 2008; Albright, 2009; Daland et al., 2011). Inspired by optimality-theoretic approaches to phonology, the most linguistically informed and successful models of these judgments have been constraint-based, formulating the problem of phonotactic generalization in terms of restrictions that penalize illicit combinations of sounds (e.g., ruling out *bn-).
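The constraint-based view described above can be illustrated with a small Python sketch. It scores a form by summing the weights of the constraints it violates and converting that penalty into an unnormalized score with an exponential, as maximum-entropy phonotactic models do. The two constraints, their weights, and the names `CONSTRAINTS`, `penalty`, and `maxent_score` are invented for illustration; in models of this family the weights are typically learned from data rather than set by hand.

```python
import math

# Hypothetical constraints: (label, weight, function counting violations).
# The weights are made up; a real model would estimate them from a lexicon.
CONSTRAINTS = [
    ("*#bn", 4.0, lambda form: 1 if form.startswith("bn") else 0),
    ("*#bw", 1.5, lambda form: 1 if form.startswith("bw") else 0),
]


def penalty(form):
    """Weighted sum of constraint violations for a form."""
    return sum(weight * violations(form) for _, weight, violations in CONSTRAINTS)


def maxent_score(form):
    """Unnormalized well-formedness score: exp(-penalty)."""
    return math.exp(-penalty(form))


scores = {form: maxent_score(form) for form in ["blick", "bwick", "bnick"]}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these illustrative weights, `ranking` orders the forms blick, bwick, bnick, matching the gradient judgments described in the text. A form that violates no constraint receives the maximum score, and each additional violation lowers the score multiplicatively.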



Journal: Transactions of the Association for Computational Linguistics
Publication Date: Dec 1, 2017
Citations: 48
License type: CC BY
