Abstract

Subglottal resonances (SGRs) have recently been used in automatic speaker normalization (SN), leading to improvements in children’s speech recognition [Wang et al. (2009)]. It is hypothesized that human listeners use SGRs for SN as well. However, the suitability of SGRs for SN has not been adequately investigated. SGRs and formants from adult speakers of American English and Mexican Spanish were measured using a new speech corpus with simultaneous (subglottal) accelerometer recordings [Lulich et al. (2010)]. The corpus has been analyzed at a broad level to understand the relations among SGRs, speaker height, native language, gender, and formant frequencies, as well as the variation of SGRs across vowels and speakers. It is shown that SGRs are roughly constant for a given speaker, regardless of the speaker's native language, but differ from speaker to speaker. SGRs are therefore well suited for use in SN and perhaps in speaker identification. Preliminary analyses also show that SGRs are correlated with each other and can be used to develop and validate simple models of subglottal acoustics, robust algorithms for SGR estimation, and models of human SN. SGRs are also correlated with height, and there are gender-specific differences in SGR frequencies. [Work supported in part by the NSF.]

