Spoken Arabic digits recognizer using recurrent neural networks

Y. A. Alotaibi

doi:10.1109/isspit.2004.1433720

Abstract

Arabic is a Semitic language that differs in many respects from European languages such as English. One such difference is the pronunciation of the ten digits, zero through nine. All Arabic digits are polysyllabic words, except zero, which is monosyllabic, and most of them contain phonemes unique to Arabic, namely the pharyngeal and emphatic subsets. In this paper, Arabic digits are investigated from the speech recognition point of view. A speech recognition system based on recurrent neural networks was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer, implemented in both a multispeaker mode (i.e., the same set of speakers is used in both the training and testing phases) and a speaker-independent mode (i.e., the speakers used for training differ from those used for testing). During recognition, the digitized speech is first cleaned of noise by band-pass filters and preemphasized. It is then blocked into frames and windowed with a Hamming window. A time alignment algorithm compensates for differences in utterance length and misalignments between phonemes. Frame features are extracted as mel-frequency cepstral coefficients (MFCCs) to reduce the amount of information in the input signal, and finally the neural network classifies the unknown digit. The system achieved 99.5% correct digit recognition in multispeaker mode and 94.5% in speaker-independent mode.
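
The first front-end steps named in the abstract, preemphasis and Hamming-window framing, can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the preemphasis coefficient of 0.97, and the frame length and hop size are all assumptions chosen for illustration, since the abstract does not state them.

```python
import math


def preemphasize(signal, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1].

    coeff=0.97 is a commonly used value and an assumption here;
    the abstract does not give the coefficient.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out


def hamming(length):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N-1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_signal(signal, frame_len, hop):
    """Block the signal into overlapping frames and apply a Hamming window.

    Trailing samples that do not fill a whole frame are dropped.
    frame_len and hop are hypothetical parameters, in samples.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(
            [signal[start + i] * window[i] for i in range(frame_len)]
        )
    return frames


if __name__ == "__main__":
    # Synthetic 440 Hz tone standing in for a digitized utterance
    # (assumed 8 kHz sampling rate, 25 ms frames, 10 ms hop).
    sample_rate = 8000
    tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
    frames = frame_signal(preemphasize(tone), frame_len=200, hop=80)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

In a pipeline like the one described, each windowed frame would then be passed to the MFCC feature extractor, with time alignment and the recurrent network following; those stages are omitted here because the abstract does not specify them in enough detail to sketch.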

Full Text