Abstract
An algorithm was developed for detecting glides (/w/, /j/, /r/, /l/, or /h/) in spoken English and determining their place of articulation through an analysis of acoustic landmarks [Stevens 2002]. The system uses Gaussian mixture models (GMMs) trained on a subset of the TIMIT speech database annotated with acoustic landmarks. To characterize the glide tokens extracted from the speech samples, the following speech-related measurements were calculated: energy in four spectral bands (E1-E4), formant frequencies (F1-F4), and the time derivatives of E1-E4 (E1’-E4’); the fundamental frequency (F0) and the harmonic magnitude differences (H1-H2, H1-H4) were also included. GMMs were then trained on a subset of the tokens to learn the characteristics of each category for two distinct tasks: distinguishing glide landmarks from the set of all landmark types (identification task), and determining the place of articulation given a glide landmark (categorization task). The classifier selected the category with the maximum posterior probability, computed from the likelihood of a speech sample under each of the trained GMMs. The performance of the algorithm was evaluated with median F-scores, and the results suggest that measurements at acoustic landmarks provide salient cues for glide detection and categorization.
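The maximum-posterior decision described above can be illustrated with a minimal sketch. It is not the authors' implementation: the diagonal covariances, the two-dimensional feature vectors, the component weights, the equal class priors, and the names `MODELS`, `PRIORS`, and `classify` are all assumptions made for illustration. The abstract specifies neither the covariance structure nor the number of mixture components.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, components):
    """Log likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    peak = max(terms)
    # Log-sum-exp keeps the sum stable when the likelihoods are very small.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify(x, models, priors):
    """Return the label that maximizes log p(x | label) + log P(label)."""
    scores = {
        label: log_gmm(x, components) + math.log(priors[label])
        for label, components in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical per-class GMMs over two illustrative features.
# A real system would estimate these parameters from training tokens.
MODELS = {
    "glide": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, 1.0], [1.0, 1.0])],
    "other": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
PRIORS = {"glide": 0.5, "other": 0.5}  # assumed equal priors

print(classify([0.2, 0.1], MODELS, PRIORS))
print(classify([4.8, 5.1], MODELS, PRIORS))
```

Because the evidence term p(x) is shared by every class, comparing log likelihood plus log prior selects the same label as comparing the full posteriors. The same rule would apply to both the identification task and the categorization task, with a different set of class models for each.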