Speech recognition using hidden Markov models based on segmental statistics

Kazumasa Yamamoto, Seiichi Nakagawa

doi:10.1002/(sici)1520-684x(19970630)28:7<31::aid-scj4>3.0.co;2-k

Abstract

It is well known that standard hidden Markov models (HMMs) cannot adequately represent time-varying features while remaining in a single state. To capture the dynamic characteristics of speech, various methods have been proposed and widely studied, including the use of linear regression coefficients over time and the introduction of conditional density HMMs. In this paper, segmental unit input HMMs are described. In this modeling, several successive frames are combined and treated as a single input vector. When segmental statistics are used, the increased dimension of the parameters degrades the precision with which covariance matrices can be estimated. Therefore, the Karhunen-Loève expansion was used to reduce the dimension. The segmental unit input HMMs were compared with two other methods for modeling the dynamics of speech: conditional density HMMs and the use of regression coefficients. The comparisons were carried out through evaluation experiments on three tasks: continuous syllable recognition, sentence recognition, and isolated word recognition. The results show that using segmental features as input vectors to basic HMMs yielded higher recognition rates than the traditional methods. Furthermore, integrating regression coefficients into the segmental unit HMMs yielded the best results. © 1997 Scripta Technica, Inc. Syst Comp Jpn, 28(7): 31–38, 1997
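The feature construction outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the segment width of 4 frames, the regression window of ±2 frames, and the output dimension are all assumptions chosen for illustration, and the Karhunen-Loève expansion is realized here as a projection onto the leading eigenvectors of the sample covariance matrix.

```python
import numpy as np


def stack_frames(frames, width=4):
    """Concatenate `width` successive frames into one segment vector.

    frames: array of shape (T, d). Returns shape (T - width + 1, width * d).
    The width is a hypothetical choice, not a value taken from the paper.
    """
    n_segments = frames.shape[0] - width + 1
    return np.stack([frames[t:t + width].reshape(-1) for t in range(n_segments)])


def kl_transform(segments, out_dim):
    """Reduce dimension by projecting onto the top `out_dim` eigenvectors
    of the sample covariance matrix (Karhunen-Loeve expansion)."""
    mean = segments.mean(axis=0)
    centered = segments - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:out_dim]
    basis = eigvecs[:, order]
    return centered @ basis, basis, mean


def regression_coefficients(frames, k=2):
    """Slope of a least-squares line fitted over +-k frames around each frame.

    Edges are handled by repeating the first and last frame (an assumption).
    """
    n_frames = frames.shape[0]
    padded = np.pad(frames, ((k, k), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, k + 1))
    delta = np.zeros_like(frames, dtype=float)
    for i in range(1, k + 1):
        delta += i * (padded[k + i:k + i + n_frames] - padded[k - i:k - i + n_frames])
    return delta / denom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((50, 10))      # synthetic stand-in for cepstral frames
    segments = stack_frames(frames, width=4)    # 40-dimensional segment vectors
    reduced, basis, mean = kl_transform(segments, out_dim=16)
    deltas = regression_coefficients(frames, k=2)
    print(segments.shape, reduced.shape, deltas.shape)
```

The reduced segment vectors, optionally combined with the regression coefficients, would then serve as the observation vectors of an ordinary HMM. Stacking multiplies the feature dimension by the segment width, which is why the covariance estimate becomes less reliable and a dimension reduction step is applied before modeling.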
