Abstract

Speech recognition is a continually developing field. In this study, the authors present a new speech recognition system for the Arabic language. The proposed system is based on the harmonic plus noise model (HNM), which is more commonly used in speech synthesis, where it is known for producing high-quality speech. The authors' contribution therefore lies in replacing the conventional mel-frequency cepstral coefficients (MFCCs) with a set of acoustic parameters extracted through the HNM estimation process. The HNM allows better-adapted processing by distinguishing voiced from unvoiced speech frames and by characterising the harmonic structure of speech. As is usual, the system comprises a training phase and a recognition phase. A hybrid deep neural network and hidden Markov model (DNN–HMM) is used to model the voiced frames, which correspond to the harmonic part; the DNN is trained on both static and dynamic parameters. The unvoiced frames, which represent the noise part of the HNM, are clustered with an HMM. The performance of the proposed system is evaluated on spoken Arabic digits and compared with that of an MFCC-based approach.
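Two generic steps in such a pipeline are splitting frames into voiced and unvoiced sets and augmenting static features with dynamic (delta) coefficients before DNN training. The sketch below illustrates both under stated assumptions. The names `is_voiced` and `add_deltas` are hypothetical. The voicing test is a simple autocorrelation-peak check standing in for the HNM voicing decision, not the authors' method, and the threshold and pitch range are illustrative. The deltas use the standard regression formula over ±`n` frames. HNM parameter extraction itself is not shown.

```python
import numpy as np


def is_voiced(frame, fs, f0_min=60.0, f0_max=400.0, threshold=0.5):
    """Crude voicing decision from the normalised autocorrelation peak.

    Assumption: a simple periodicity test used only as a stand-in for the
    voicing decision made inside HNM analysis; parameters are illustrative.
    """
    x = frame - np.mean(frame)
    energy = np.dot(x, x)
    if energy == 0.0:
        return False  # silent frame: treat as unvoiced
    # Autocorrelation for non-negative lags 0 .. len(x)-1 (r[0] == energy)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Search only lags that correspond to plausible pitch periods
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(x) - 1)
    peak = np.max(r[lag_min:lag_max + 1]) / energy
    return bool(peak >= threshold)


def add_deltas(static, n=2):
    """Append first-order regression (delta) coefficients to static features.

    static: array of shape (num_frames, num_coeffs).
    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), for k = 1..n,
    with the first and last frames repeated at the edges.
    """
    num_frames = static.shape[0]
    padded = np.pad(static, ((n, n), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    delta = np.zeros_like(static, dtype=float)
    for k in range(1, n + 1):
        # padded[n + t] holds static[t], so these slices give c_{t+k} and c_{t-k}
        ahead = padded[n + k:n + k + num_frames]
        behind = padded[n - k:n - k + num_frames]
        delta += k * (ahead - behind)
    # Static and dynamic parameters side by side, one row per frame
    return np.hstack([static, delta / denom])
```

In this sketch, voiced frames would be passed through `add_deltas` to form DNN input, while unvoiced frames would go to a separate model. For a linearly increasing feature, interior delta values come out as the slope (1.0 for a unit ramp).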
