Abstract

The difficulty of comparing spectral sequences for speech arises because different acoustic renditions, or tokens, of the same utterance are seldom produced at the same speed across the entire utterance. This paper introduces a simple and effective time alignment algorithm for spoken Arabic digit recognition systems. By simple we mean not only that it requires little computational power, but also that it is easy to understand, to implement, and to explain to others. Although high-powered computers are available today, time alignment techniques such as dynamic time warping and hidden Markov models require relatively high CPU time, which is better reserved for other complicated tasks. The algorithm achieves a high accuracy rate considering the very limited number of frames taken from each input utterance for training or testing. To evaluate it, an artificial neural network based speech recognition system was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer implemented in multi-speaker mode (i.e., the same set of speakers was used in both the training and testing phases). During recognition, the digitized speech was cleaned of noise, pre-emphasized, and blocked into frames with a Hamming window; the time alignment algorithm was then used to compensate for differences in utterance length and misalignments between phonemes. Features were extracted from each frame as mel-frequency cepstral coefficients (MFCCs) to reduce the amount of information in the input signal. Finally, the neural network classified the unknown digit. The recognition system achieved 99.48% correct digit recognition while using only seven frames in the time alignment algorithm.
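The front end described above (pre-emphasis, Hamming windowing, and reduction of every utterance to a fixed number of frames) can be illustrated with a minimal sketch. The abstract does not specify how the seven frames are chosen, so the evenly spaced selection in the hypothetical `fixed_frames` function is an assumption, not the authors' algorithm. The pre-emphasis coefficient of 0.97 and the frame length of 256 samples are likewise illustrative values, and MFCC extraction and the neural network are omitted.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a common textbook value, assumed here.
    """
    if not signal:
        return []
    rest = [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]
    return [signal[0]] + rest


def hamming(length):
    """Hamming window: w[k] = 0.54 - 0.46 * cos(2*pi*k / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]


def fixed_frames(signal, num_frames=7, frame_len=256):
    """Return exactly num_frames Hamming-windowed frames per utterance.

    ASSUMPTION: frame start positions are spread evenly from the first
    sample to the last full frame, so utterances of any duration map to
    the same number of frames. The paper's actual selection rule is not
    given in the abstract.
    """
    samples = list(signal)
    if len(samples) < frame_len:
        # Zero-pad very short utterances to one full frame.
        samples += [0.0] * (frame_len - len(samples))
    window = hamming(frame_len)
    last_start = len(samples) - frame_len
    frames = []
    for i in range(num_frames):
        start = round(i * last_start / (num_frames - 1)) if num_frames > 1 else 0
        frames.append([samples[start + k] * window[k] for k in range(frame_len)])
    return frames


# Two synthetic "utterances" of different duration yield the same shape.
short_utt = [math.sin(0.1 * n) for n in range(1000)]
long_utt = [math.sin(0.1 * n) for n in range(3000)]
short_frames = fixed_frames(pre_emphasize(short_utt))
long_frames = fixed_frames(pre_emphasize(long_utt))
```

Because every utterance is reduced to the same number of equal-length frames, the feature vector passed to a classifier has a fixed size without any dynamic-programming alignment step, which is the kind of saving in CPU time the abstract refers to.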
