A Gaussian-mixture-model-based approach to classifying vowel place in speech signals

Ritik Patnaik, Stefanie Shattuck-Hufnagel, Jeung-Yoon Choi

doi:10.1121/10.0008274

Abstract

In recent years, the performance of speech recognition systems has improved dramatically through the development of general machine learning techniques. However, it is not always straightforward to interpret how such systems map the signal to the detected category. In the present work, we focus on transparency, specifying the processing steps that lead to robust modeling of vowel place using simple, descriptive Gaussian mixture models (GMMs). We present a pre-processing and detection framework for vowel place that involves formant measurement, smoothing, and GMM-based classification. The aim is to classify vowel place as one part of a larger speech recognition system. Vowel place was divided into 8 groups based on tongue advancement [Front/Back], height [High/Mid/Low], and root [Atr/Ctr]. Studies were performed using ∼700 vowel-consonant-vowel (VCV) utterances from the LAFF VCV database.
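The abstract names three stages (formant measurement, smoothing, GMM) without giving their details, so the following is only a minimal sketch of how such a pipeline could fit together, not the authors' implementation. Everything specific in it is an assumption: the running-median smoother, the two-dimensional (F1, F2) feature, diagonal covariances, two mixture components per class, the variance floor, the hypothetical class labels `front_high` and `back_low`, and the synthetic training points, which are illustrative values and not measurements from the LAFF VCV database.

```python
import math
import random
from statistics import median


def median_smooth(track, width=5):
    """Running-median smoothing of one formant track (Hz).

    Windows are truncated at the edges rather than padded.
    """
    half = width // 2
    return [median(track[max(0, i - half): i + half + 1])
            for i in range(len(track))]


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(points, k=2, iters=30, seed=0, var_floor=100.0):
    """Fit a k-component diagonal GMM by EM; returns [(weight, mean, var), ...]."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    center = [sum(p[d] for p in points) / n for d in range(dim)]
    spread = [max(var_floor, sum((p[d] - center[d]) ** 2 for p in points) / n)
              for d in range(dim)]
    # Start from k randomly chosen points, all sharing the global variance.
    comps = [(1.0 / k, list(p), list(spread)) for p in rng.sample(points, k)]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for p in points:
            logs = [math.log(w) + log_gauss(p, m, v) for w, m, v in comps]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and floored variances.
        comps = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-9
            mean = [sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                    for d in range(dim)]
            var = [max(var_floor,
                       sum(r[j] * (p[d] - mean[d]) ** 2
                           for r, p in zip(resp, points)) / nj)
                   for d in range(dim)]
            comps.append((max(nj / n, 1e-12), mean, var))
    return comps


def gmm_loglik(x, comps):
    """Log-likelihood of point x under a fitted mixture."""
    return logsumexp([math.log(w) + log_gauss(x, m, v) for w, m, v in comps])


def classify(x, models):
    """Return the label whose class-conditional GMM scores x highest."""
    return max(models, key=lambda label: gmm_loglik(x, models[label]))


# Synthetic (F1, F2) training points in Hz; illustrative values only,
# not measurements from the LAFF VCV database.
rng = random.Random(1)
train = {
    "front_high": [(rng.gauss(300, 40), rng.gauss(2300, 150)) for _ in range(60)],
    "back_low": [(rng.gauss(700, 60), rng.gauss(1100, 120)) for _ in range(60)],
}
models = {label: fit_gmm(points) for label, points in train.items()}

# Smooth a short F1 track containing one tracking spike, then classify.
f1_track = median_smooth([300, 310, 2000, 305, 295], width=3)
print(f1_track)
print(classify((f1_track[2], 2250.0), models))
```

Fitting one small mixture per class and choosing the class with the highest likelihood keeps every step inspectable: each component's mean and variance can be read directly in Hz, which fits the transparency goal stated in the abstract. A full system along these lines would need one model for each of the 8 vowel-place groups and real formant measurements.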

Full Text