Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech.

Usha Goswami, Victoria Leong

doi:10.1371/journal.pone.0144411

Abstract

When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) of stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory.
AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across languages. The S-AMPH model reveals a crucial developmental role for stress feet (AMs ~2 Hz). Stress feet underpin different linguistic rhythm typologies, and speech rhythm underpins language acquisition by infants in all languages.
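As a rough illustration of what an AM hierarchy looks like computationally, the sketch below extracts the amplitude envelope of a signal and splits it into three modulation bands centred near the ~2 Hz, ~5 Hz and ~20 Hz timescales named in the abstract. It is not the S-AMPH implementation: the band edges in `AM_BANDS`, the brick-wall FFT filter, the function names and the synthetic test signal are all assumptions made for the example, and it uses only NumPy.

```python
import numpy as np

def amplitude_envelope(signal):
    """Envelope as the magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(signal)
    spectrum = np.fft.fft(signal)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

def band_limit(x, fs, low_hz, high_hz):
    """Ideal (brick-wall) band-pass: zero every FFT bin outside [low_hz, high_hz)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs >= high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest-magnitude FFT bin."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(np.abs(np.fft.rfft(x)))]

# Hypothetical band edges (Hz), chosen only to bracket ~2, ~5 and ~20 Hz.
AM_BANDS = {"stress": (0.9, 2.5), "syllable": (2.5, 12.0), "phoneme": (12.0, 40.0)}

def am_hierarchy(signal, fs):
    """Split the amplitude envelope of `signal` into the three assumed AM bands."""
    env = amplitude_envelope(signal)
    return {name: band_limit(env, fs, lo, hi) for name, (lo, hi) in AM_BANDS.items()}

# Synthetic stand-in for speech: a 200 Hz carrier with 2 Hz and 5 Hz modulation.
fs = 1000
t = np.arange(4000) / fs
toy_envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t) + 0.3 * np.sin(2 * np.pi * 5 * t)
toy_signal = toy_envelope * np.sin(2 * np.pi * 200 * t)

bands = am_hierarchy(toy_signal, fs)
for name in ("stress", "syllable"):
    print(name, dominant_frequency(bands[name], fs))
```

On this toy signal the stress band is dominated by the 2 Hz modulation and the syllable band by the 5 Hz modulation, while the phoneme band is essentially empty because no fast modulation was put in. Real speech would need a proper filterbank and per-channel envelopes rather than a single broadband envelope.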

Highlights

A necessary step in language acquisition is the development of a phonological system, that is, implicit knowledge about the inventory of the sound system of a language.
In a preliminary spectral Principal Component Analysis (PCA), we found that the loading patterns we obtained were broadly similar whether we used all datapoints, or only a subset of datapoints that were separated by 250 ms or by 500 ms.
We reasoned that, by analysing the patterns of component loading across the channels in a high-dimensional representation, one should be able to identify groups of adjacent channels that belong to the same core band of spectral or temporal modulation.
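The idea of reading channel groupings off PCA loadings can be sketched on synthetic data. The example below is a minimal illustration, not the procedure used in the paper: six hypothetical channels are generated so that channels 0 to 3 share one latent source and channels 4 to 5 share another, PCA is run on the channel correlation matrix, and each channel is assigned to the component on which it loads most strongly. The function names, channel count, noise level and the "every 10th sample" subsampling (a stand-in for taking datapoints separated in time) are all assumptions.

```python
import numpy as np

def pca_loadings(channels):
    """PCA on the channel correlation matrix.

    channels: array of shape (n_samples, n_channels).
    Returns (eigenvalues, loadings), components sorted by descending variance;
    loadings[:, k] holds the loading of every channel on component k.
    """
    corr = np.corrcoef(channels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def group_channels(loadings, n_components):
    """Assign each channel to the component with its largest absolute loading."""
    return np.argmax(np.abs(loadings[:, :n_components]), axis=1)

# Synthetic data: two independent latent sources plus per-channel noise.
rng = np.random.default_rng(0)
n_samples = 6000
source_a = rng.standard_normal(n_samples)
source_b = rng.standard_normal(n_samples)
channels = 0.5 * rng.standard_normal((n_samples, 6))
channels[:, :4] += source_a[:, None]   # channels 0-3 share source A
channels[:, 4:] += source_b[:, None]   # channels 4-5 share source B

# Loadings from all datapoints versus a temporally thinned subset.
_, load_all = pca_loadings(channels)
_, load_sub = pca_loadings(channels[::10])

groups_all = group_channels(load_all, n_components=2)
groups_sub = group_channels(load_sub, n_components=2)
print(groups_all, groups_sub)
```

Because the two groups have different sizes, the first two components separate cleanly, and adjacent channels driven by the same source end up with the same label whether all samples or only the thinned subset are used. With real envelope data the number of components to keep and the grouping rule would need to be justified from the loading patterns themselves.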

Summary

Introduction

A necessary step in language acquisition is the development of a phonological system, that is, implicit knowledge about the inventory of the sound system of a language. By 2–3 years of age, children can count the number of syllables in a word, and say whether two words rhyme [2]. These are deceptively complex feats of speech engineering, in principle requiring the child to utilise the overt acoustic spectro-temporal structure of the speech signal in order to discover the latent phonological building blocks of the English language. These building blocks can be viewed as a nested hierarchy of prosodic stress patterns, syllables and onset-rime units.

Objectives

Methods

Results

Discussion

Conclusion