Abstract

Speech sounds carry information about speaker size because the vocal tract and the vocal cords both grow as a child develops into an adult. Specifically, average pitch and mean formant frequency decrease as speaker size increases. Nevertheless, human speech recognition is effectively size invariant across the full range of sizes in the normal population of speakers and well beyond. Listeners can also discriminate speaker size with great accuracy; indeed, with greater accuracy than they can discriminate the loudness of sound or the brightness of light. The first part of this talk describes how the peripheral auditory system normalizes speech sounds automatically to produce a size-invariant representation for speech recognition. The second part presents a model of how the central auditory system transforms information in the cochlea into our perception of who is speaking and what they are saying. The model suggests that the system combines information about vocal resonator size with a small amount of contextual information to determine what the person is saying (at the phonological level), and then adds voice pitch information to determine who is speaking (in the sense of the sex and size of the speaker).
