Abstract

An important question in phonology is to what degree the learner uses distributional information rather than substantive properties of speech sounds when learning phonological structure. This paper presents an algorithm that learns phonological classes from only distributional information: the contexts in which sounds occur. The input is a segmental corpus, and the output is a set of phonological classes. The algorithm is first tested on an artificial language, with both overlapping and nested classes reflected in the distribution, and retrieves the expected classes, performing well as distributional noise is added. It is then tested on four natural languages. It distinguishes between consonants and vowels in all cases, and finds more detailed, language-specific structure. These results improve on past approaches, and are encouraging, given the paucity of the input. More refined models may provide additional insight into which phonological classes are apparent from the distributions of sounds in natural languages.
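
The abstract describes the algorithm only in terms of its input (a segmental corpus) and its output (a set of phonological classes). As a rough illustration of what a distributional representation of that input might look like, the Python sketch below counts the segments that immediately precede and follow each sound and weights those counts with positive pointwise mutual information (PPMI), echoing the "vector embedding of sounds", "normalised counts", and "PPMI vector embeddings" steps named in the summary below. The function name ppmi_embeddings, the use of word boundaries as contexts, and the restriction to immediately adjacent segments are illustrative assumptions, not the paper's exact formulation.

    # Minimal sketch (assumptions flagged above): embed each segment as a
    # PPMI-weighted vector of counts over its immediately adjacent contexts.
    import math
    from collections import defaultdict

    def ppmi_embeddings(corpus):
        """corpus: a list of words, each a list of segment strings, e.g. [['p', 'a', 'r', 'u'], ...]."""
        counts = defaultdict(lambda: defaultdict(float))
        for word in corpus:
            padded = ['#'] + list(word) + ['#']      # treat word boundaries as contexts too
            for i in range(1, len(padded) - 1):
                seg = padded[i]
                counts[seg][('prev', padded[i - 1])] += 1
                counts[seg][('next', padded[i + 1])] += 1

        total = sum(c for ctxs in counts.values() for c in ctxs.values())
        seg_totals = {s: sum(ctxs.values()) for s, ctxs in counts.items()}
        ctx_totals = defaultdict(float)
        for ctxs in counts.values():
            for ctx, c in ctxs.items():
                ctx_totals[ctx] += c

        # PPMI: max(0, log [p(seg, ctx) / (p(seg) * p(ctx))]), estimated from the counts.
        contexts = sorted(ctx_totals)
        embeddings = {}
        for seg, ctxs in counts.items():
            row = []
            for ctx in contexts:
                c = ctxs.get(ctx, 0.0)
                if c > 0:
                    pmi = math.log((c / total) /
                                   ((seg_totals[seg] / total) * (ctx_totals[ctx] / total)))
                    row.append(max(0.0, pmi))
                else:
                    row.append(0.0)
            embeddings[seg] = row
        return embeddings, contexts

Each segment's row can then be compared with the rows of other segments: sounds that occur in similar environments receive similar vectors, which is the distributional signal the algorithm relies on.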

Highlights

  • An area of interest in linguistics is how much of human language is innate and how much is learned from data (e.g. Chomsky 1988, Elman et al. 1996, Pullum & Scholz 2002, Tomasello 2003)

  • This paper investigates the learning of phonological classes when only distributional information is available

  • Because it is not clear a priori what classes might be apparent in the distribution of a natural language, it is useful to begin with a case where the target classes are known in advance, a practice adopted in past work (Goldsmith & Xanthos 2009, Nazarov 2014, 2016)

Summary

Introduction

An area of interest in linguistics is how much of human language is innate and how much is learned from data (e.g. Chomsky 1988, Elman et al. 1996, Pullum & Scholz 2002, Tomasello 2003). To what extent are phonological classes apparent in the distribution of sounds in a language, and to what extent do learners make use of this information? This paper investigates the learning of phonological classes when only distributional information is available. That is, it deals with the question of what classes are apparent, rather than what the learner might use. It does so by constructing an algorithm that attempts to learn solely from the contexts in which sounds occur, building on past work. The algorithm successfully distinguishes consonants from vowels in every case, and retrieves interpretable classes within those categories for each language. §7 compares these results against past work, and §8 offers discussion and proposals for future research.
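
To give a concrete sense of how the class-finding stage listed in the outline below (Principal Component Analysis, k-means clustering, and recursive traversal of the set of classes) might be organised, here is a hypothetical sketch: project each sound's PPMI vector onto its leading principal components, split the sounds with 2-means clustering, and recurse into each resulting class. The stopping criteria, the fixed choice of two clusters at each step, and the use of scikit-learn are assumptions made for illustration, not the paper's exact procedure.

    # Illustrative sketch (not the paper's exact procedure): recursively split
    # segments into candidate classes using PCA followed by 2-means clustering.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def find_classes(segments, vectors, min_size=3, depth=0, max_depth=4):
        """segments: list of segment labels; vectors: 2-D numpy array of their PPMI rows."""
        classes = [set(segments)]            # every node of the recursion is a candidate class
        if len(segments) < 2 * min_size or depth >= max_depth:
            return classes                   # assumed stopping criterion: class too small or recursion too deep

        # Reduce the PPMI vectors to their leading principal components.
        n_comp = min(2, len(segments) - 1, vectors.shape[1])
        reduced = PCA(n_components=n_comp).fit_transform(vectors)

        # Split the class in two with k-means and recurse into each half.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
        for k in (0, 1):
            idx = [i for i, lab in enumerate(labels) if lab == k]
            if len(idx) >= min_size:
                classes += find_classes([segments[i] for i in idx], vectors[idx],
                                        min_size, depth + 1, max_depth)
        return classes

Fed the embeddings from the earlier sketch, for example via find_classes(list(emb), np.array([emb[s] for s in emb])), the recursion returns a nested set of candidate classes; on the account given above, the top-level split would be expected to separate consonants from vowels where that contrast is distributionally salient.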

Previous work
Parupa: an artificial language
Quantifying similarity: vector space models
A simple vector embedding of sounds
What do we count when we count sounds?
Normalised counts
Finding classes using Principal Component Analysis and k-means clustering
Principal Component Analysis
Visualising PPMI vector embeddings of Parupa segments
Recursively traversing the set of classes
Putting it all together
Simplifying assumptions
Running the algorithm on Parupa
Evaluating the robustness of the algorithm on Noisy Parupa
Testing the algorithm on real language data
Samoan
English
French
Finnish
Comparison with past work
Findings
Discussion and conclusion