Abstract

In recent years, speech recognition systems have improved dramatically through the development of general machine learning techniques. However, the mapping from the signal to the detected category is not always straightforward to interpret. In the present work, we focus on transparency, specifying the processing steps that lead to robust modeling of vowel place using simple, descriptive Gaussian mixture models (GMMs). We present a pre-processing and detection framework for vowel place that involves formant measurement, smoothing, and GMM-based classification. The aim is to classify vowel place as part of a larger speech recognition system. Vowel place was divided into 8 groups based on tongue advancement [Front/Back], height [High/Mid/Low], and tongue root [Atr/Ctr]. Studies were performed using ∼700 vowel-consonant-vowel (VCV) utterances from the LAFF VCV database.
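The pipeline described above (formant measurement, smoothing, then Gaussian mixture scoring) can be illustrated with a minimal sketch. This is not the authors' implementation: the median smoother, the diagonal covariances, the two class labels, and every mixture parameter and formant value below are hypothetical placeholders chosen only to show how each step stays inspectable.

```python
import math
from statistics import median

def median_smooth(track, width=3):
    """Sliding-median smoothing of one formant track (window shrinks at the edges)."""
    half = width // 2
    return [median(track[max(0, i - half): i + half + 1]) for i in range(len(track))]

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, components):
    """Log likelihood of x under a mixture given as (weight, mean, variance) tuples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in components]
    top = max(logs)
    # log-sum-exp keeps the sum stable when component densities are tiny
    return top + math.log(sum(math.exp(l - top) for l in logs))

def classify(x, models):
    """Return the label whose mixture assigns x the highest log likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(x, models[label]))

# Hypothetical two-component mixtures over (F1, F2) in Hz for two of the
# vowel-place groups. These numbers are illustrative, not fitted to LAFF data.
MODELS = {
    "front-high": [(0.6, (280.0, 2250.0), (40.0 ** 2, 200.0 ** 2)),
                   (0.4, (350.0, 2050.0), (50.0 ** 2, 200.0 ** 2))],
    "back-low":   [(0.5, (720.0, 1100.0), (60.0 ** 2, 150.0 ** 2)),
                   (0.5, (650.0, 1000.0), (60.0 ** 2, 150.0 ** 2))],
}

# Made-up formant tracks; the 900 Hz value in F1 mimics a tracking error.
f1_track = [300.0, 290.0, 900.0, 285.0, 295.0]
f2_track = [2200.0, 2250.0, 2230.0, 2240.0, 2210.0]

f1 = median_smooth(f1_track)
f2 = median_smooth(f2_track)
mid = len(f1) // 2                      # sample the smoothed track at its midpoint
label = classify((f1[mid], f2[mid]), MODELS)
print(label)
```

Because each class is a small mixture of Gaussians over named acoustic measurements, the decision can be traced back to specific means and variances, which is the kind of transparency the abstract emphasizes.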
