Mutual predictability among speech sounds for talker adaptation and recognition

Eleanor Chodroff, Colin Wilson

doi:10.1121/1.5101193

Abstract

Processes of talker recognition and adaptation rely on a high degree of inter-talker phonetic variability and systematicity, respectively. Although these demands seem superficially opposed, talker recognition depends in part on adaptation to the talker at hand. In this talk, we present evidence that talker variability is simultaneously extensive and structured within natural classes of speech sounds. In American English, talker mean peak frequencies for [s] span more than 3000 Hz, but the variation in [s] is not independent of that in [z]: strong correlations in talker mean peak frequency, among other phonetic dimensions, are observed between the sibilant fricatives. Covariation among speech sounds implies mutual predictability: evidence from one speech sound can be used to refine estimates of, or make predictions about, a second. Listeners indeed demonstrate perceptual knowledge of this covariation in generalized adaptation to novel talkers. After exposure to a talker whose [z] had a relatively high or low peak frequency, listeners adjusted their [s]-[ʃ] boundary in accordance with the empirical covariation. Because talker recognition entails estimating a talker's phonetic parameters, prior perceptual knowledge of covariation could be used to refine estimates for multiple speech sounds from minimal exposure, thus accelerating talker adaptation and recognition.
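One way to make "mutual predictability" concrete is a conditional-Gaussian prediction. This is a minimal sketch under our own simplifying assumption, not necessarily the model used by the authors: talker mean peak frequencies for [z] and [s] are treated as jointly Gaussian across talkers. Observing a new talker's [z] mean then yields a predicted [s] mean and a narrower uncertainty than the population spread alone. The class name `BivariateGaussian` and all numeric parameters are hypothetical and for illustration only; they are not estimates from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class BivariateGaussian:
    """Hypothetical population model of talker mean peak frequencies (Hz).

    x is the talker's [z] mean and y is the talker's [s] mean. The Gaussian
    form and every parameter value are illustrative assumptions.
    """
    mean_x: float
    mean_y: float
    sd_x: float
    sd_y: float
    rho: float  # correlation between talker means across talkers

    def predict_y(self, x: float) -> tuple[float, float]:
        """Return the conditional mean and SD of y given an observed x."""
        # Regression slope of y on x for a bivariate normal.
        slope = self.rho * self.sd_y / self.sd_x
        cond_mean = self.mean_y + slope * (x - self.mean_x)
        # Residual spread shrinks by sqrt(1 - rho^2) relative to sd_y.
        cond_sd = self.sd_y * math.sqrt(1.0 - self.rho ** 2)
        return cond_mean, cond_sd


# Made-up parameters, chosen only to keep the arithmetic easy to follow.
model = BivariateGaussian(mean_x=6000.0, mean_y=7000.0,
                          sd_x=600.0, sd_y=700.0, rho=0.8)

# A new talker whose [z] mean lies one SD above the population mean.
pred_mean, pred_sd = model.predict_y(6600.0)
print(f"predicted [s] mean: {pred_mean:.0f} Hz, conditional SD: {pred_sd:.0f} Hz")
```

With these hypothetical numbers, a [z] one SD above average shifts the predicted [s] mean up by rho (0.8) SDs, and the uncertainty about the talker's [s] drops from 700 Hz to about 420 Hz. A listener holding such a prior could, in principle, shift an [s]-[ʃ] category boundary after hearing only [z].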
