An HMM-Like Dynamic Time Warping Scheme for Automatic Speech Recognition

Ing-Jr Ding, Yen-Ming Hsu

doi:10.1155/2014/898729

Abstract

In the past, the kernel of automatic speech recognition (ASR) was dynamic time warping (DTW), a feature-based template matching technique that belongs to the category of dynamic programming (DP). Although DTW is an early ASR technique, it remains popular in many applications and now plays an important role in the well-known Kinect-based gesture recognition application. This paper proposes an intelligent speech recognition system that uses an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, is essentially a hidden Markov model- (HMM-) like method in which the concept of the typical HMM statistical model is brought into the design of DTW. By transforming feature-based DTW recognition into model-based DTW recognition, the developed HMM-like DTW method can behave like the HMM recognition technique; with its HMM-like recognition model, it therefore gains the capability to perform model adaptation (also known as speaker adaptation). A series of experimental results in home automation-based multimedia access service environments demonstrated the superiority and effectiveness of the developed smart speech recognition system based on HMM-like DTW.

Highlights

Multimedia and home automation services have become popular and necessary in everyday home life
The proposed hidden Markov model- (HMM-) like dynamic time warping (DTW) speech recognition is applied to multimedia and home automation services
The HMM-like DTW speech recognition system adopts a voice command operation mechanism in which a set of DTW reference template models for the keywords is established in advance

Summary

Introduction

Multimedia and home automation services have become popular and necessary in everyday home life. Although many ASR-related studies focus on HMM and ANN techniques, DTW still holds its technical position owing to its low-complexity recognition calculations and high recognition accuracy, which are necessary factors in multimedia and home automation applications [10]. With proper modifications, the popular HMM speaker adaptation techniques [14, 15] can be extended to the proposed HMM-like DTW, which can effectively solve the learning restriction of the DTW machine learning developed in [5]. A further feature of the proposed approach is (ii) a statistical HMM-like classification model with the ability to make model adjustments for recognition performance improvements, in contrast to enhanced DTW methods that aim only at the dynamic programming design of template matching of acoustic features (e.g., [12, 13]).

Speech Recognition by DTW
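
As a minimal sketch of conventional feature-based DTW template matching (the classic dynamic programming baseline, not the HMM-like DTW proposed in this paper), the following Python code aligns an input feature sequence against each keyword's reference template and returns the keyword with the smallest accumulated distance. The function names `dtw_distance` and `recognize`, the Euclidean local distance, the unconstrained three-way step pattern, and the toy one-dimensional features are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW distance between two sequences of feature vectors.

    Assumption: Euclidean local distance and the basic step pattern
    (insertion, deletion, match) with no slope or window constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the keyword whose reference template is closest under DTW.

    `templates` is a hypothetical dict mapping keyword -> feature sequence.
    """
    return min(templates, key=lambda k: dtw_distance(utterance, templates[k]))


# Toy one-dimensional "features" standing in for real acoustic frames.
templates = {
    "on": [(0.0,), (1.0,), (2.0,)],
    "off": [(5.0,), (5.0,), (4.0,)],
}
utterance = [(0.0,), (0.0,), (1.0,), (2.0,)]  # a time-stretched "on"
result = recognize(utterance, templates)
```

In this sketch the decision depends only on frame-level distances to fixed templates, which is the feature-based behaviour that the paper contrasts with model-based recognition.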

The Proposed HMM-Like DTW Approach for Speech Recognition

Experiments and Results

Conclusions