Practical speaker‐independent voice recognition using segmental features

Tatsuya Kimura, Katsuyuki Niyada, Akira Ashida

doi:10.1002/ecjb.10217

Abstract

This paper reports a practical method for speaker‐independent, large‐vocabulary voice recognition that achieves high accuracy and high noise immunity at low computational cost. The method has three main features. First, high recognition accuracy is obtained by using an acoustic model whose input consists of segmental features formed from the analysis parameters of multiple frames. Second, the likelihood corresponding to the output probability of each state of the acoustic model is calculated by a linear expression with the input parameter vector as the variable. This linear expression is derived from the equal‐covariance assumption, and it reduces both the computational complexity and the required memory capacity, even when segmental features are used, without degrading recognition performance. Third, a stable word spotting function detects correct solution candidates in input that contains noise intervals by incorporating the idea of the a posteriori probability into the likelihood calculation. This word spotting function makes it possible to realize voice recognition that is robust to noise. The effectiveness of the proposed system was demonstrated in a recognition experiment with the interior noise of a running vehicle superimposed on the speech.

© 2004 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 87(2): 73–81, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.10217
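
The abstract does not give the exact form of the linear likelihood expression, so the following is only a minimal sketch of the general idea, assuming Gaussian output densities that share one covariance matrix across all states. Under that assumption, the quadratic term in the input vector and the log-determinant term are identical for every state, so dropping them leaves a score of the form a_k · x + b_k that ranks states exactly as the full Gaussian exponent does. The function names, the 2-dimensional toy vectors, and the numeric values are hypothetical illustrations and are not taken from the paper; real segmental feature vectors would have many more dimensions.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def inv2x2(m: Matrix) -> List[List[float]]:
    """Invert a 2x2 matrix (sufficient for this toy example)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def linear_params(mean: Vector, shared_cov: Matrix) -> Tuple[List[float], float]:
    """Precompute (a_k, b_k) for one state, assuming a shared covariance.

    a_k = Sigma^-1 mu_k,  b_k = -1/2 mu_k^T Sigma^-1 mu_k
    """
    precision = inv2x2(shared_cov)
    a = matvec(precision, mean)
    b = -0.5 * dot(mean, a)
    return a, b


def linear_score(x: Vector, a: Vector, b: float) -> float:
    """Per-state score: one inner product plus a constant."""
    return dot(a, x) + b


def quadratic_score(x: Vector, mean: Vector, shared_cov: Matrix) -> float:
    """Full Gaussian exponent -1/2 (x - mu)^T Sigma^-1 (x - mu), for comparison."""
    precision = inv2x2(shared_cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return -0.5 * dot(diff, matvec(precision, diff))


if __name__ == "__main__":
    # Hypothetical toy values: two states, one shared covariance matrix.
    cov = [[2.0, 0.5], [0.5, 1.0]]
    means = [[0.0, 0.0], [1.0, 2.0]]
    x = [0.5, 1.5]
    for k, mu in enumerate(means):
        a_k, b_k = linear_params(mu, cov)
        print(k, linear_score(x, a_k, b_k), quadratic_score(x, mu, cov))
```

In this sketch each state stores only a weight vector and a scalar, and scoring an input costs a single inner product, which is consistent with the savings in computation and memory that the abstract attributes to the linear expression.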
