Abstract

We introduce a simple and highly general phonotactic learner which induces a probabilistic finite-state automaton from word-form data. We describe the learner and show how to parameterize it to induce unrestricted regular languages, as well as how to restrict it to certain subregular classes such as Strictly k-Local and Strictly k-Piecewise languages. We evaluate the learner on its ability to learn phonotactic constraints in toy examples and in datasets of Quechua and Navajo. We find that an unrestricted learner is the most accurate overall when modeling attested forms not seen in training; however, only the learner restricted to the Strictly Piecewise language class successfully captures certain nonlocal phonotactic constraints. Our learner serves as a baseline for more sophisticated methods.

Highlights

  • Natural language phonotactics is argued to fall in the class of regular languages, or even in a smaller class of subregular languages (Rogers et al., 2013)

  • We find that an unrestricted probabilistic finite-state automaton (PFA) learner is the most accurate at predicting attested held-out forms, while a learner restricted to Strictly Piecewise (SP) languages is the most effective at learning certain nonlocal constraints

  • We introduce a framework for phonotactic learning based on simple induction of probabilistic finite-state automata by stochastic gradient descent

Summary

Introduction

Natural language phonotactics is argued to fall in the class of regular languages, or even in a smaller class of subregular languages (Rogers et al., 2013). This observation has motivated the study of probabilistic finite-state automata (PFAs) that generate these languages as models of phonotactics. We implement a simple method for inducing PFAs for phonotactics from data, which can induce general regular languages as well as languages in certain more restricted subclasses, such as Strictly k-Local and Strictly k-Piecewise languages (Heinz, 2018; Heinz and Rogers, 2010). We evaluate our learner on corpus data from Quechua and Navajo, with particular emphasis on its ability to learn nonlocal constraints. We make both theoretical and empirical contributions. We demonstrate how Strictly Local and Strictly Piecewise constraints can be encoded within our framework, and show how information-theoretic regularization can be applied to produce deterministic automata.
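To make the distinction between the two subregular classes concrete, the following is a minimal sketch of their standard non-probabilistic definitions, not the learner described here. A Strictly k-Local grammar bans contiguous substrings of length k, while a Strictly k-Piecewise grammar bans subsequences of length k, whose symbols need not be adjacent. The function names, the `#` boundary symbol, and the toy alphabet (with `S` standing in for a second sibilant) are all assumptions made for illustration.

```python
from itertools import combinations


def k_factors(word, k):
    """Contiguous substrings of length k, with '#' marking word edges (assumed symbol)."""
    padded = "#" * (k - 1) + word + "#" * (k - 1)
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def k_subsequences(word, k):
    """Length-k subsequences: symbols in order, not necessarily adjacent."""
    return {"".join(chars) for chars in combinations(word, k)}


def sl_accepts(word, forbidden, k=2):
    """Strictly k-Local check: no forbidden contiguous k-factor occurs."""
    return k_factors(word, k).isdisjoint(forbidden)


def sp_accepts(word, forbidden, k=2):
    """Strictly k-Piecewise check: no forbidden k-subsequence occurs."""
    return k_subsequences(word, k).isdisjoint(forbidden)


# Toy nonlocal constraint (hypothetical): 's' and 'S' may not co-occur in a word.
banned = {"sS", "Ss"}

print(sl_accepts("sataS", banned))  # the two sibilants are never adjacent
print(sp_accepts("sataS", banned))  # 's' ... 'S' is found as a subsequence
```

In this toy case the banned pair never occurs as adjacent symbols, so the k=2 local check has nothing to object to, whereas the piecewise check sees the pair at any distance. This is the kind of nonlocal constraint on which the restricted SP learner is reported to succeed.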
