A novel Wake-Up-Word speech recognition system, Wake-Up-Word recognition task, technology and evaluation

V.Z Këpuska,T.B Klein

doi:10.1016/j.na.2009.06.089

Abstract

Wake-Up-Word (WUW) is a new paradigm in speech recognition (SR) that is not yet widely recognized. This paper defines and investigates WUW speech recognition, describes details of this novel solution and the technology that implements it. WUW SR is defined as detection of a single word or phrase when spoken in the alerting context of requesting attention, while rejecting all other words, phrases, sounds, noises and other acoustic events and the same word or phrase spoken in non-alerting context with virtually 100% accuracy. In order to achieve this accuracy, the following innovations were accomplished: (1) Hidden Markov Model triple scoring with Support Vector Machine classification, (2) Combining multiple speech feature streams: Mel-scale Filtered Cepstral Coefficients (MFCCs), Linear Prediction Coefficients (LPC)-smoothed MFCCs, and Enhanced MFCC, and (3) Improved Voice Activity Detector with Support Vector Machines. WUW detection and recognition performance is 2514%, or 26 times better than HTK for the same training & testing data, and 2271%, or 24 times better than Microsoft SAPI 5.1 recognizer. The out-of-vocabulary rejection performance is over 65,233%, or 653 times better than HTK, and 5900% to 42,900%, or 60 to 430 times better than the Microsoft SAPI 5.1 recognizer. This solution that utilizes a new recognition paradigm applies not only to WUW task but also to any general Speech Recognition tasks.

Full Text