Abstract

An important question in phonology is to what degree the learner uses distributional information rather than substantive properties of speech sounds when learning phonological structure. This paper presents an algorithm that learns phonological classes from only distributional information: the contexts in which sounds occur. The input is a segmental corpus, and the output is a set of phonological classes. The algorithm is first tested on an artificial language, with both overlapping and nested classes reflected in the distribution, and retrieves the expected classes, performing well as distributional noise is added. It is then tested on four natural languages. It distinguishes between consonants and vowels in all cases, and finds more detailed, language-specific structure. These results improve on past approaches, and are encouraging, given the paucity of the input. More refined models may provide additional insight into which phonological classes are apparent from the distributions of sounds in natural languages.
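
The abstract describes the algorithm only in terms of its input (a segmental corpus) and its output (a set of phonological classes). As a rough illustration of what a distributional representation of that input might look like, the Python sketch below counts the segments that immediately precede and follow each sound and weights those counts with positive pointwise mutual information (PPMI), echoing the "vector embedding of sounds", "normalised counts", and "PPMI vector embeddings" steps named in the summary below. The function name ppmi_embeddings, the use of word boundaries as contexts, and the restriction to immediately adjacent segments are illustrative assumptions, not the paper's exact formulation.

    # Minimal sketch (assumptions flagged above): embed each segment as a
    # PPMI-weighted vector of counts over its immediately adjacent contexts.
    import math
    from collections import defaultdict

    def ppmi_embeddings(corpus):
        """corpus: a list of words, each a list of segment strings, e.g. [['p', 'a', 'r', 'u'], ...]."""
        counts = defaultdict(lambda: defaultdict(float))
        for word in corpus:
            padded = ['#'] + list(word) + ['#']      # treat word boundaries as contexts too
            for i in range(1, len(padded) - 1):
                seg = padded[i]
                counts[seg][('prev', padded[i - 1])] += 1
                counts[seg][('next', padded[i + 1])] += 1

        total = sum(c for ctxs in counts.values() for c in ctxs.values())
        seg_totals = {s: sum(ctxs.values()) for s, ctxs in counts.items()}
        ctx_totals = defaultdict(float)
        for ctxs in counts.values():
            for ctx, c in ctxs.items():
                ctx_totals[ctx] += c

        # PPMI: max(0, log [p(seg, ctx) / (p(seg) * p(ctx))]), estimated from the counts.
        contexts = sorted(ctx_totals)
        embeddings = {}
        for seg, ctxs in counts.items():
            row = []
            for ctx in contexts:
                c = ctxs.get(ctx, 0.0)
                if c > 0:
                    pmi = math.log((c / total) /
                                   ((seg_totals[seg] / total) * (ctx_totals[ctx] / total)))
                    row.append(max(0.0, pmi))
                else:
                    row.append(0.0)
            embeddings[seg] = row
        return embeddings, contexts

Each segment's row can then be compared with the rows of other segments: sounds that occur in similar environments receive similar vectors, which is the distributional signal the algorithm relies on.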

Highlights

  • An area of interest in linguistics is how much of human language is innate and how much is learned from data (e.g. Chomsky 1988, Elman et al. 1996, Pullum & Scholz 2002, Tomasello 2003)

  • This paper investigates the learning of phonological classes when only distributional information is available

  • Because it is not clear a priori what classes might be apparent in the distribution of a natural language, it is useful to begin with a case where the target classes are known in advance, a practice adopted in past work (Goldsmith & Xanthos 2009, Nazarov 2014, 2016)

Summary

Introduction

An area of interest in linguistics is how much of human language is innate and how much is learned from data (e.g. Chomsky 1988, Elman et al. 1996, Pullum & Scholz 2002, Tomasello 2003). To what extent are phonological classes apparent in the distribution of sounds in a language, and to what extent do learners make use of this information? This paper investigates the learning of phonological classes when only distributional information is available. That is, it deals with the question of what classes are apparent, rather than what the learner might use. It does so by constructing an algorithm that attempts to learn solely from the contexts in which sounds occur, building on past work. The algorithm successfully distinguishes consonants from vowels in every case, and retrieves interpretable classes within those categories for each language. §7 compares these results against past work, and §8 offers discussion and proposals for future research.
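
To give a concrete sense of how the class-finding stage listed in the outline below (Principal Component Analysis, k-means clustering, and recursive traversal of the set of classes) might be organised, here is a hypothetical sketch: project each sound's PPMI vector onto its leading principal components, split the sounds with 2-means clustering, and recurse into each resulting class. The stopping criteria, the fixed choice of two clusters at each step, and the use of scikit-learn are assumptions made for illustration, not the paper's exact procedure.

    # Illustrative sketch (not the paper's exact procedure): recursively split
    # segments into candidate classes using PCA followed by 2-means clustering.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def find_classes(segments, vectors, min_size=3, depth=0, max_depth=4):
        """segments: list of segment labels; vectors: 2-D numpy array of their PPMI rows."""
        classes = [set(segments)]            # every node of the recursion is a candidate class
        if len(segments) < 2 * min_size or depth >= max_depth:
            return classes                   # assumed stopping criterion: class too small or recursion too deep

        # Reduce the PPMI vectors to their leading principal components.
        n_comp = min(2, len(segments) - 1, vectors.shape[1])
        reduced = PCA(n_components=n_comp).fit_transform(vectors)

        # Split the class in two with k-means and recurse into each half.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
        for k in (0, 1):
            idx = [i for i, lab in enumerate(labels) if lab == k]
            if len(idx) >= min_size:
                classes += find_classes([segments[i] for i in idx], vectors[idx],
                                        min_size, depth + 1, max_depth)
        return classes

Fed the embeddings from the earlier sketch, for example via find_classes(list(emb), np.array([emb[s] for s in emb])), the recursion returns a nested set of candidate classes; on the account given above, the top-level split would be expected to separate consonants from vowels where that contrast is distributionally salient.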

Previous work
Parupa: an artificial language
Quantifying similarity: vector space models
A simple vector embedding of sounds
What do we count when we count sounds?
Normalised counts
Finding classes using Principal Component Analysis and k-means clustering
Principal Component Analysis
Visualising PPMI vector embeddings of Parupa segments
Recursively traversing the set of classes
Putting it all together
Simplifying assumptions
Running the algorithm on Parupa
Evaluating the robustness of the algorithm on Noisy Parupa
Testing the algorithm on real language data
Samoan
English
French
Finnish
Comparison with past work
Findings
Discussion and conclusion