Abstract
Over the last few decades, Hidden Markov Models (HMMs) have become the dominant technology in automatic speech recognition (ASR) systems. Contemporary HMM-based solutions use Gaussian mixture models (GMMs) to model the acoustic variability of speech. ASR algorithms whose acoustic models are built with deep neural networks (DNNs) outperform GMMs in large-vocabulary speech recognition. However, these algorithms have extremely high computational complexity, which prevents their use in voice control systems with moderate computational resources. This paper considers an approach to developing an isolated word recognition algorithm with low computational complexity. All components of the isolated word recognition engine are described. A sequence of quantized Mel-frequency cepstral coefficients (MFCCs) is used as the feature description of the speech signal. A fast isolated word recognition algorithm based on the stationary distribution of a Hidden Markov Model is described. The proposed algorithm has linear complexity with respect to the length of the observed sequence and requires significantly less memory than algorithms based on GMM or DNN models. The algorithm's recognition performance is evaluated on the TIMIT isolated words dataset and on a database of Russian words compiled by the authors. The results show that the proposed algorithm's recognition performance is only slightly inferior to that of GMMs and superior to that of self-adjustment neural networks.
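The abstract does not spell out how the stationary distribution enters the scoring, so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes that each word has a discrete HMM over quantized MFCC codebook indices, that the state occupancy is replaced by the chain's stationary distribution, and that a sequence is scored by summing log-probabilities of its symbols under the resulting per-symbol marginals. All function names, the toy matrices, and the word labels are hypothetical.

```python
import math

def stationary_distribution(A, iters=1000, tol=1e-12):
    """Power iteration for pi = pi * A, where A is a row-stochastic matrix."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

def symbol_marginals(pi, B):
    """p(k) = sum_i pi[i] * B[i][k]: symbol probabilities under stationary states."""
    return [sum(pi[i] * B[i][k] for i in range(len(pi))) for k in range(len(B[0]))]

def log_score(obs, marginals, floor=1e-12):
    """One table lookup per frame, so the cost is linear in len(obs)."""
    return sum(math.log(max(marginals[o], floor)) for o in obs)

def recognize(obs, models):
    """Return the word whose marginal table gives the highest log score."""
    return max(models, key=lambda word: log_score(obs, models[word]))

# Toy two-state models over a three-symbol codebook (illustrative values only).
A_yes = [[0.9, 0.1], [0.2, 0.8]]
B_yes = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
A_no = [[0.5, 0.5], [0.5, 0.5]]
B_no = [[0.1, 0.8, 0.1], [0.1, 0.6, 0.3]]

# Only one vector of codebook size is stored per word at recognition time.
models = {
    "yes": symbol_marginals(stationary_distribution(A_yes), B_yes),
    "no": symbol_marginals(stationary_distribution(A_no), B_no),
}

print(recognize([0, 0, 2, 0, 1], models))
print(recognize([1, 1, 1, 2], models))
```

Under these assumptions, the per-word marginal table is computed once offline, which is consistent with the linear running time and small memory footprint the abstract claims, although the actual method may differ in how it uses the stationary distribution.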