Effects of Training on the Acoustic–Phonetic Representation of Synthetic Speech

Alexander L Francis, Kimberly Fenn, Howard C Nusbaum

doi:10.1044/1092-4388(2007/100)

Abstract

This study investigated training-related changes in the acoustic-phonetic representation of consonants produced by a text-to-speech (TTS) computer speech synthesizer. Forty-eight adult listeners were trained to better recognize words produced by a TTS system. Nine additional untrained participants served as controls. Before and after training, participants were tested on consonant recognition and made pairwise judgments of consonant dissimilarity for subsequent multidimensional scaling (MDS) analysis. Word recognition training significantly improved performance on consonant identification, although listeners never received specific training on phoneme recognition. Data from 31 participants showing clear evidence of learning (improvement ≥ 10 percentage points) were further investigated using MDS and analysis of confusion matrices. Results show that training altered listeners' treatment of particular acoustic cues, resulting in both increased within-class similarity and between-class distinctiveness. Some changes were consistent with current models of perceptual learning, but others were not. Training caused listeners to interpret the acoustic properties of synthetic speech more like those of natural speech, in a manner consistent with a flexible-feature model of perceptual learning. Further research is necessary to refine these conclusions and to investigate their applicability to other training-related changes in intelligibility (e.g., associated with learning to better understand dysarthric speech or foreign accents).
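
The learning criterion mentioned in the abstract (an improvement of at least 10 percentage points in consonant identification) can be illustrated with a minimal sketch. It assumes that identification accuracy is scored as the share of trials on the diagonal of a confusion matrix and that improvement is the simple difference between posttest and pretest accuracy; the abstract does not specify the exact scoring procedure. The function names `percent_correct` and `is_learner` and the two-consonant counts are hypothetical and are not data from the study.

```python
def percent_correct(confusion):
    """Percent of trials on the diagonal of a square confusion matrix.

    Rows are the presented consonant, columns are the listener's response.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix has no trials")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total


def is_learner(pre, post, threshold=10.0):
    """True if accuracy rose by at least `threshold` percentage points.

    Assumption: improvement is posttest accuracy minus pretest accuracy.
    """
    return percent_correct(post) - percent_correct(pre) >= threshold


# Hypothetical counts for one listener and two consonants (illustration only).
pre_test = [[6, 4],
            [5, 5]]
post_test = [[9, 1],
             [2, 8]]

print(percent_correct(pre_test), percent_correct(post_test))
print(is_learner(pre_test, post_test))
```

In this toy case the listener would be retained for the follow-up analyses, since accuracy rises by more than the 10-point threshold. The off-diagonal cells are the consonant confusions that a confusion-matrix analysis would examine.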

Full Text