Abstract
Vowel formant data is traditionally normalized across speakers by transforming a set of ‘raw’ measurements into ‘standardized’ ones in one of two ways. With a speaker-extrinsic method, data from each individual is normalized with respect to external baseline measures calculated across the population of all speakers in a corpus, whereas a speaker-intrinsic method normalizes entirely with respect to speaker-dependent variables. The present study reports on implementations of both these methods in terms of hierarchical statistical models whereby probability distributions for various model parameters can be obtained using Bayesian analysis (rather than merely ‘converting’ the measurements). In this new framework, a speaker-extrinsic approach can estimate (1) the size and shape of each speaker’s vowel space, (2) the locations of vowel categories across a speech community within a normalized space, and (3) individual speakers’ deviations from the community norms. However, this process relies on a number of assumptions that are not needed with a speaker-intrinsic approach, which instead makes many low-level discrete ‘decisions’ on a speaker-by-speaker basis. By testing multiple models on the same dataset (a large corpus of vowel data collected from 132 speakers of American English), the present study explores the comparative merits of speaker-extrinsic and speaker-intrinsic Bayesian models.
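For illustration only, the sketch below shows one way a speaker-extrinsic hierarchical model of the kind described above could be written down, using PyMC on synthetic data. The speaker counts, priors, and variable names here are assumptions made for the example, not the models or data used in the study: each speaker's raw F1 measurements are treated as a speaker-specific affine transform of community-level vowel locations, so that the size and shape of each speaker's vowel space and the normalized category locations are estimated jointly rather than obtained by a deterministic conversion.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

# Toy synthetic data standing in for a formant corpus (hypothetical sizes;
# the study itself uses a corpus from 132 speakers of American English).
n_speakers, n_vowels, tokens_per_speaker = 8, 5, 40
n_obs = n_speakers * tokens_per_speaker
speaker = np.repeat(np.arange(n_speakers), tokens_per_speaker)
vowel = rng.integers(0, n_vowels, size=n_obs)

# Simulate raw F1 (Hz): each speaker has an idiosyncratic location and scale.
true_loc = rng.normal(500, 60, n_speakers)
true_scale = np.abs(rng.normal(150, 20, n_speakers))
vowel_target = np.linspace(-1.2, 1.2, n_vowels)   # category positions in normalized space
f1 = (true_loc[speaker]
      + true_scale[speaker] * (vowel_target[vowel] + rng.normal(0, 0.2, n_obs)))

with pm.Model() as model:
    # Community-level locations of vowel categories in the normalized space
    mu_vowel = pm.Normal("mu_vowel", 0.0, 1.0, shape=n_vowels)
    # Speaker-specific vowel-space location and size
    spk_loc = pm.Normal("spk_loc", 500.0, 100.0, shape=n_speakers)
    spk_scale = pm.HalfNormal("spk_scale", 200.0, shape=n_speakers)
    sigma = pm.HalfNormal("sigma", 50.0)
    # Raw F1 is modeled as a speaker-specific affine transform of the
    # normalized vowel location; "normalization" is recovered as part of
    # posterior inference over the speaker and community parameters.
    pm.Normal("f1_obs",
              mu=spk_loc[speaker] + spk_scale[speaker] * mu_vowel[vowel],
              sigma=sigma,
              observed=f1)
    idata = pm.sample(500, tune=500, chains=2, target_accept=0.9, random_seed=1)
```

A speaker-intrinsic analogue would instead define the location and scale terms entirely from each speaker's own measurements, without reference to a shared community-level baseline; the abstract's comparison concerns the assumptions and decisions these two framings require.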