Abstract

Language-independent embedded speech recognition is a necessary and important application. Considering personal privacy, the difficulty of collecting all reference words, and the limited storage space of mobile devices, language-independent (LI) embedded speech recognition should be treated as a lightweight speaker-dependent (SD) task. Dynamic time warping (DTW) is the state-of-the-art algorithm for small-footprint SD automatic speech recognition. To reduce the high computational complexity of DTW, and to avoid the coarse approximation and inaccuracy that constraints induce, we propose a novel confidence index dynamic time warping (CIDTW) approach. CIDTW defines a new cost function, called the confidence index cost function (CICF), to measure the similarity between merged speech training and testing data, while otherwise following the same DTW process. In extensive experiments on three representative SD datasets, CIDTW achieves better accuracy than DTW and is six times faster overall.
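For orientation, the baseline DTW process that CIDTW builds on can be sketched as below. This is a minimal illustration of classic, unconstrained DTW only; it is not the CIDTW method. The abstract does not define the CICF, so the `cost` parameter here defaults to a plain absolute difference, and the function name `dtw_distance` and the use of 1-D numeric sequences (rather than speech feature vectors) are assumptions made for the example.

```python
import math


def dtw_distance(a, b, cost=lambda x, y: abs(x - y)):
    """Classic DTW distance between two sequences.

    `cost` is a placeholder local cost (absolute difference by default);
    it is NOT the confidence index cost function from the abstract.
    Runs in O(len(a) * len(b)) time, the complexity CIDTW aims to reduce.
    """
    n, m = len(a), len(b)
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

Because the local cost is passed in as a parameter, a different cost function could in principle be substituted without changing the recursion, which matches the abstract's description of CIDTW as a new cost function on top of the same DTW process.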
