Abstract
The present paper explores the degree to which the perception of CVC syllables can be factored into the perception of their constituent phonemes. Categorization experiments with synthetic speech typically manifest such factorability to a remarkable degree (regardless of the lexical status of the syllables). Such factorability is also compatible with intelligibility studies of English CVCs [e.g., A. Boothroyd and S. Nittrouer, J. Acoust. Soc. Am. 84, 101–114 (1988)]. These show that the identification of nonsense syllables can be predicted as the product of the probabilities of identification of their phoneme parts, while word identification is systematically higher. Simulation studies are reported here involving a ‘‘factored’’ perceptual model that consists of a set of phoneme-likelihood estimators whose outputs are modulated by prior probabilities related to lexical status. This model can approximate the patterns observed in human perception. Simulations were also run with nonfactorable models, in which syllables and words carry information about unique stimulus properties that cannot be predicted from their constituent phonemes. Consistent with a conjecture of Allen [J. Allen, IEEE Trans. Speech Audio Process. 2, 567–577 (1994)], such syllable template models do not produce behavior compatible with human performance. [Work supported by SSHRC.]
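The factored prediction described above can be sketched in a few lines. The following Python is an illustrative toy, not the simulation model reported in the paper: the function names, the multiplicative form of the lexical boost, and all probability values are assumptions chosen for exposition.

```python
# Sketch of the factored prediction: a whole-CVC identification probability is
# modeled as the product of independent per-phoneme identification
# probabilities, with lexical status entering only as a prior that favors
# word candidates. All numbers below are hypothetical.

def p_nonsense_cvc(p_c1: float, p_v: float, p_c2: float) -> float:
    """Factored prediction for a nonsense CVC: product of its phoneme probabilities."""
    return p_c1 * p_v * p_c2


def p_word_cvc(p_c1: float, p_v: float, p_c2: float, lexical_prior: float = 1.5) -> float:
    """Same factored core, modulated by a prior favoring lexical items.

    The specific modulation used here (a multiplicative boost capped at 1.0)
    is an illustrative assumption, not the mechanism reported in the paper.
    """
    return min(1.0, lexical_prior * p_c1 * p_v * p_c2)


if __name__ == "__main__":
    # Hypothetical per-phoneme identification probabilities at some noise level.
    p_c1, p_v, p_c2 = 0.7, 0.8, 0.7
    print(f"nonsense CVC: {p_nonsense_cvc(p_c1, p_v, p_c2):.3f}")  # 0.392
    print(f"word CVC:     {p_word_cvc(p_c1, p_v, p_c2):.3f}")      # 0.588, systematically higher
```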