Confidence index dynamic time warping for language-independent embedded speech recognition

Xianglilan Zhang, Ming Li, Jiping Sun, Zhigang Luo

doi:10.1109/icassp.2013.6639236

Abstract

Language-independent (LI) embedded speech recognition is a necessary and important application. Because of personal privacy concerns, the difficulty of collecting all the reference words, and the limited storage space of mobile devices, LI embedded speech recognition should be treated as a lightweight speaker-dependent (SD) task. Dynamic time warping (DTW) is the state-of-the-art algorithm for small-footprint SD automatic speech recognition. To reduce the high computational complexity of DTW while avoiding the coarse approximation and inaccuracy induced by constraints, we introduce a novel confidence index dynamic time warping (CIDTW) approach. CIDTW defines a new cost function, the confidence index cost function (CICF), to measure the similarity between merged speech training and testing data, while following the same DTW process. In extensive experiments on three representative SD datasets, CIDTW achieves better accuracy than DTW and an overall sixfold speedup.
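For reference, the baseline that CIDTW builds on is classic DTW template matching. In SD recognition, each word has one stored reference template, and a test utterance is labeled with the word whose template has the smallest DTW distance. The abstract does not define the CICF, so the following is only a minimal sketch of plain DTW under stated assumptions. Utterances are assumed to be lists of per-frame feature vectors, the local cost is assumed to be Euclidean distance, and the names `dtw_distance` and `recognize` are hypothetical. The nested loop fills an n × m table, which is the O(nm) cost that the abstract calls DTW's high computational complexity. Per the abstract, CIDTW keeps this same process but uses the CICF in place of the local cost.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # assumed: one feature vector per speech frame


def euclidean(a: Frame, b: Frame) -> float:
    """Assumed local frame cost; CIDTW would substitute its CICF here."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(ref: List[Frame], test: List[Frame]) -> float:
    """Classic unconstrained DTW: O(len(ref) * len(test)) time and space."""
    n, m = len(ref), len(test)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost aligning ref[:i] with test[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(ref[i - 1], test[j - 1])
            # Allowed steps: insertion, deletion, or diagonal match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(templates: Dict[str, List[Frame]], test: List[Frame]) -> str:
    """Hypothetical SD recognizer: pick the word with the nearest template."""
    return min(templates, key=lambda word: dtw_distance(templates[word], test))


if __name__ == "__main__":
    # Toy 1-D "features": the test is a time-stretched copy of "yes"
    templates = {"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0], [4.0]]}
    test = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    print(recognize(templates, test))  # prints "yes"
```

The stretched test sequence aligns with the "yes" template at zero cost, which shows how DTW absorbs differences in speaking rate. No windowing or slope constraint is applied here. Adding one would cut computation at the risk of the constraint-induced coarse approximation that the abstract says CIDTW avoids.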
