Abstract
On the one hand, talker variability is one of the fundamental challenges for speech recognition: each talker has their own mapping from linguistic units to sounds, which means that an effective listener must use a different recognition function for each talker. On the other hand, talker variability makes speech a rich source of information about who the talker is. This dual nature of talker variability means that speech and talker recognition are inextricably linked: knowing something about who is talking makes it easier to understand what they are saying, and knowing something about how someone talks unlocks the rich social meaning of speech. I argue that the concept of a talker's generative model, that is, the probability distributions of sounds associated with each phonetic/linguistic category, is a useful general-purpose conceptual tool for understanding the link between talker variability, speech recognition, and social identity. With such phonetic cue distributions, we can use information-theoretic tools to quantify both the extent and structure of talker variability across different phonetic systems and establish in-principle consequences of talker variability for both speech recognition and socio-indexical inferences from speech.
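As a rough illustration of what quantifying talker variability with information-theoretic tools could look like, the sketch below computes the Kullback-Leibler divergence between two talkers' cue distributions for a single category. It assumes each distribution is a univariate Gaussian, which is a simplification made for this sketch rather than a claim of the abstract. The function name `gaussian_kl`, the choice of voice onset time as the cue, and all numeric values are hypothetical.

```python
import math


def gaussian_kl(mean_p, sd_p, mean_q, sd_q):
    """KL(P || Q) in bits between two univariate Gaussians P and Q."""
    nats = (
        math.log(sd_q / sd_p)
        + (sd_p ** 2 + (mean_p - mean_q) ** 2) / (2 * sd_q ** 2)
        - 0.5
    )
    return nats / math.log(2)  # convert nats to bits


# Hypothetical (mean, sd) of voice onset time in ms for one category.
# These values are invented for illustration, not taken from any data set.
talker_a = (60.0, 10.0)
talker_b = (80.0, 10.0)

kl_same = gaussian_kl(*talker_a, *talker_a)  # a talker compared with itself
kl_diff = gaussian_kl(*talker_a, *talker_b)  # two talkers with shifted means

print(f"KL(A || A) = {kl_same:.3f} bits")
print(f"KL(A || B) = {kl_diff:.3f} bits")
```

Under these assumptions, a larger divergence means that a listener who applies talker B's distribution to talker A's speech incurs a larger expected cost, and, conversely, that the cue carries more information about which talker is speaking.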