Abstract

The increasing role of large speech databases in phonetic and psycholinguistic research has placed a premium on the efficiency and objectivity with which continuous acoustic signals are encoded into discrete parameters that may then be used in recognition models. One simple and common way speech has been encoded and compared in this work is by (1) converting the acoustic signal into a matrix of MFCCs, and (2) measuring the distance between two or more matrices following the application of a temporal alignment algorithm (e.g., dynamic time warping, cross-correlation) to account for differences in utterance rate. The specification of both the MFCCs and the alignment algorithm requires control over a wide array of variables, many of which have both theoretical and experimental implications, and yet most of these variables are set to default values specified in toolkits such as HTK and the Phonological CorpusTools software (Young et al., 2002; Hall et al., 2017). This project uses a large multispeaker database of nonword syllable production and perception (Woods et al., 2010) to test the performance of this acoustic distance procedure over the full parameter space, and to evaluate how well such measurements accord with phonological theory and perceptual data.
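To make the second step of the procedure concrete, the following is a minimal sketch of dynamic time warping applied to two precomputed MFCC matrices. It does not reproduce the specific parameter settings of HTK or Phonological CorpusTools; the frame-level Euclidean metric and the standard three-move DTW recurrence shown here are illustrative assumptions, and in practice the MFCC matrices would come from a feature-extraction toolkit with its own configurable parameters (number of coefficients, window length, etc.).

```python
import math

def frame_dist(a, b):
    # Euclidean distance between two MFCC frames
    # (each frame is a list of cepstral coefficients).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(A, B):
    """Accumulated DTW cost between two MFCC matrices, each a list of
    frames. Warping absorbs differences in utterance rate, so stretched
    or compressed versions of the same token score as similar."""
    n, m = len(A), len(B)
    INF = float("inf")
    # D[i][j] = minimal cost of aligning A[:i] with B[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_dist(A[i - 1], B[j - 1])
            # Standard three moves: insertion, deletion, match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A slowed-down utterance (one frame held longer) still aligns at zero cost,
# which is the rate-normalization the alignment step is meant to provide.
fast = [[0.0], [1.0], [2.0]]
slow = [[0.0], [0.0], [1.0], [2.0]]
print(dtw_distance(fast, slow))  # → 0.0
```

Every choice in this sketch (the local distance metric, the allowed warping moves, the absence of slope constraints or path normalization) corresponds to one of the variables whose default settings the abstract argues deserve systematic evaluation.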
