Speech Recognition Based on the Grid Method and Image Similarity

Janusz Dulas

doi:10.5772/17897

Abstract

The problem of communication between a man and a machine is very old. The first constructors had to decide how to transmit information from a man to a machine and vice versa. This problem still exists, and every engineer who designs a new device must decide how communication between the operator and the machine will be carried out. Simple devices use buttons and light-emitting diodes (LEDs); more complicated ones use keyboards and screens. Fast technical development and extensive scientific research have made it possible to use voice for this purpose as well. Here there are two different problems: voice production and voice recognition. The first one is not very difficult: in the simplest case the machine could record a set of words to be used for communication. Nowadays there are specialised integrated circuits (speech processors) which enable recording and reproducing whole words and sentences. The second problem, voice recognition, is more complicated. First of all, people are different and say the same words in different ways. Secondly, the way of speaking depends on many factors, such as the health of the speaker, his mood or his emotions. Thirdly, we live in a noisy environment, so the speech signal usually arrives together with various noises.
There are many different methods used for automatic speech recognition. The most popular is the Hidden Markov Model (HMM) (Junho & Hanseok, 2006; Wydra, 2007; Kumar & Sreenivas, 2005; Ketabdar et al., 2005), which uses sequences of events (states), where the probability of being in each state and the probability of transition to the other states are calculated. Each state is described by many parameters, most of them spectral. There are also other, less popular methods used for automatic speech recognition, such as the Neural Network Method (Vali et al., 2006; Holmberg et al., 2005; Togneri & Deng, 2007), the Audio-visual Method (Seymour et al., 2007; Hueber et al., 2007) and many others (Nishida et al., 2005).
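The HMM idea mentioned above (state probabilities combined with transition probabilities) can be illustrated with a minimal sketch of the standard forward algorithm. This is not taken from the chapter or from the cited works: the function name `forward_likelihood`, the two-state model and all of its probabilities are hypothetical, and the observations are simplified to discrete symbols, whereas a real recognizer would describe each state with spectral parameters.

```python
def forward_likelihood(start_p, trans_p, emit_p, observations):
    """Return P(observations | model) for a discrete HMM (forward algorithm).

    start_p[i]    : probability of starting in state i
    trans_p[i][j] : probability of moving from state i to state j
    emit_p[i][k]  : probability that state i emits symbol k
    """
    n_states = len(start_p)
    # alpha[i]: probability of the observations seen so far, ending in state i
    alpha = [start_p[i] * emit_p[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans_p[i][j] for i in range(n_states)) * emit_p[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model with two observation symbols (0 and 1).
start_p = [0.6, 0.4]
trans_p = [[0.7, 0.3], [0.4, 0.6]]
emit_p = [[0.9, 0.1], [0.2, 0.8]]

likelihood = forward_likelihood(start_p, trans_p, emit_p, [0, 1, 0])
print(likelihood)
```

In an HMM-based recognizer, one such model would typically be trained per word or phoneme, and the model giving the highest likelihood for the observed sequence would be chosen. Even in this toy form, every observation requires a sum over all state pairs, which hints at the computational cost of the approach.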
Nowadays it is possible to achieve more than 90% accuracy in automatic speech recognition. The HMM method, although the most popular, is very complicated. Each state is described by a matrix with many spectral, cepstral and linear prediction parameters, which means that many analyses and calculations are needed during the automatic recognition process. In this chapter the author presents a new approach to this problem. The new method described here is simpler and faster than the HMM method and gives similar or better speech recognition accuracy. Although it was tested on Polish, its rules can be adapted to other languages.

Full Text