Abstract
Voice or speaker recognition is critical in a wide variety of social contexts. In this study, we investigated the contributions of acoustic, phonological, lexical, and semantic information toward voice recognition. Native English-speaking participants were trained to recognize five speakers in five conditions: non-speech, Mandarin, German, pseudo-English, and English. We showed that voice recognition significantly improved as more information became available, from purely acoustic features in non-speech to additional phonological information varying in familiarity. Moreover, we found that recognition performance transferred between training and testing in phonologically familiar conditions (German, pseudo-English, and English), but not in unfamiliar (Mandarin) or non-speech conditions. These results provide evidence suggesting that bottom-up acoustic analysis and top-down influence from phonological processing collaboratively govern voice recognition.
Highlights
Although such evidence suggests that both linguistic and paralinguistic characteristics underlie voice recognition, it is based mostly on studies that isolate only one class of characteristics.
While it can be argued that word strings reduce the prosodic cues that may help participants distinguish between languages, we used word strings for all speech conditions to isolate the features of interest – acoustic, phonological, lexical, and semantic cues – rather than focusing on additional cues, such as prosody.
While previous work could have introduced performance bias because of partial overlaps between the stimuli used in training and testing, the stimuli used for training in our study differed completely from those used for testing. We could therefore examine the generalization of voice recognition performance in a less biased fashion and assess the contributions of learned acoustic, phonological, lexical, and semantic cues towards recognizing voices in new stimuli.
Summary
Although such evidence suggests that both linguistic and paralinguistic characteristics underlie voice recognition, it is based mostly on studies that isolate only one class of characteristics. In an experimental design modeled closely on Perrachione’s (2007, 2011) studies of voice recognition, monolingual English speakers were trained to associate five voices with avatars in five conditions (non-speech, Mandarin, German, pseudo-English, and English), rather than just the two conditions (Mandarin and English) used in previous work. Three factors distinguish this study from Perrachione’s earlier design. If each type of information makes a distinctive contribution, voice recognition performance should improve systematically as a function of the amount of information available: from acoustic features (non-speech) to the availability of unknown/unfamiliar phonological information (Mandarin) to increasingly familiar phonological content (from German to pseudo-English) to full lexical and semantic access (English). If lexical-semantic access further contributes to voice recognition in the current experimental context, we might see voice recognition improve progressively from German to pseudo-English to English; otherwise, we would expect similar performance among these three conditions.