Abstract

Models of voice perception propose that identities are encoded relative to an abstracted average or prototype. While there is some evidence for norm-based coding when learning to discriminate different voices, little is known about how the representation of an individual's voice identity is formed through variable exposure to that voice. In two experiments, we show evidence that participants form abstracted representations of individual voice identities based on averages, despite having never been exposed to these averages during learning. We created three perceptually distinct voice identities, fully controlling their within-person variability. Listeners first learned to recognise these identities based on ring-shaped distributions located around the perimeter of within-person voice spaces – crucially, these distributions were missing their centres. At test, listeners’ accuracy for old/new judgements was higher for stimuli located on an untrained distribution nested around the centre of each ring-shaped distribution than for stimuli on the trained ring-shaped distribution.

Highlights

  • Models of voice perception propose that identities are encoded relative to an abstracted average or prototype

  • The voice space was defined by variation in glottal pulse rate (GPR) in one dimension, and variation in apparent vocal tract length (VTL) in the other dimension (Fig. 1)

  • We further explored whether there was a relationship between accuracy and acoustic distance to the centre of each distractor identity’s voice space in Experiment 1, mirroring the generalised linear mixed model (GLMM) specified for the learned identities
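
As a rough illustration of the design described above, the following Python sketch places stimuli on a trained outer ring and an untrained inner ring in a two-dimensional voice space and computes each stimulus's distance to the centre. The coordinates, radii, stimulus counts, and function names are illustrative assumptions, not values or procedures from the study, and plain Euclidean distance in arbitrary units stands in for the acoustic distance measure.

```python
import math


def ring_points(centre, radius, n):
    """Return n points evenly spaced on a circle of the given radius around centre."""
    cx, cy = centre
    return [
        (cx + radius * math.cos(2 * math.pi * k / n),
         cy + radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def distance_to_centre(point, centre):
    """Euclidean distance between a stimulus and the centre of the voice space."""
    return math.hypot(point[0] - centre[0], point[1] - centre[1])


# Hypothetical centre of one identity's voice space, in arbitrary (GPR, VTL) units.
centre = (0.0, 0.0)

# Assumed radii and stimulus counts: an outer ring heard during learning,
# and an inner ring nested around the centre that is heard only at test.
trained = ring_points(centre, radius=2.0, n=8)
untrained = ring_points(centre, radius=0.5, n=8)

trained_dist = [distance_to_centre(p, centre) for p in trained]
untrained_dist = [distance_to_centre(p, centre) for p in untrained]

# The average of the trained stimuli coincides with the centre,
# even though no trained stimulus is located there.
mean_trained = (
    sum(p[0] for p in trained) / len(trained),
    sum(p[1] for p in trained) / len(trained),
)
```

In this toy layout every trained stimulus lies at the same distance from the centre, while the mean of the trained stimuli falls on the centre itself, which is the property the ring-shaped design relies on.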

Summary

Introduction

Models of voice perception propose that identities are encoded relative to an abstracted average or prototype. While norm-based coding and the extraction of summary statistics are at the centre of many models of identity perception, only a relatively small number of empirical studies have provided direct evidence of such a mechanism for faces and voices. In these studies, participants were presented with the voices or faces of familiar (or familiarised) identities: some of the stimuli were unmanipulated voice recordings/images of faces, while others were averages derived from different numbers of original stimuli. No study has directly probed whether summary statistics are extracted for voice identities.
