Abstract

In this paper, the authors propose an efficient and effective method for speaker-independent continuous Arabic speech recognition, based on a phonetically rich speech corpus. The corpus contains two datasets: (1) the Spoken Arabic Digits (SAD) dataset, recorded by 66 speakers (33 male and 33 female), which holds 6,600 digit words, each speaker pronouncing each digit from zero to nine ten times; and (2) Quranic sentences from three famous reciters, following Tajweed rules (a form of melodic recitation), covering the last 30 chapters (Surat) of the Holy Quran. The co-articulation problem was addressed in general with triphones as the acoustic model and a bigram language model, which was the most appropriate for this text; the problem of adjusting sound duration in particular was also addressed, with triphones expanded to Gaussian mixture models (GMMs). The proposed Arabic speech recognition system is built with the Cambridge Hidden Markov Model (HMM) Toolkit (HTK). Experimental tests show that the digit dataset, using 3 emitting states per phone, achieves an excellent word recognition (WR) rate of 97.95% and a sentence recognition (SR) rate of 93.14%. The best result on the Quranic phrases is a 73.44% WR rate and a 14.38% SR rate, with 66.09% accuracy, for the “Elmirigli” reciter. Adapting the baseline system to the speaker’s intonation and then applying an appropriate GMM to each tied-list triphone outperforms the HMM-triphone acoustic model experiment by 16.93% in WR rate for the “Alsudaissi” reciter, thanks to resolving the vowel-duration (mudud) problem. The contribution of this work is the triphones-expanded-to-GMMs method, which was compared to Maximum Likelihood Linear Regression (MLLR) and beat it by almost 5% for the “Alsudaissi” reciter.
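In a GMM-HMM acoustic model of the kind described above, each tied triphone state scores an incoming acoustic feature vector with its own Gaussian mixture, and the decoder sums these log-likelihoods along the HMM state sequence. A minimal sketch of the per-state likelihood computation, assuming diagonal covariances (the usual HTK configuration) and purely illustrative toy values:

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) per-component mean vectors
    variances: (K, D) per-component diagonal variances
    """
    d = x.shape[0]
    # per-component Gaussian log-density: normalizer + exponent
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    comp = np.log(weights) + log_norm + log_exp
    # numerically stable log-sum-exp over the K components
    m = comp.max()
    return m + np.log(np.sum(np.exp(comp - m)))

# Toy example (hypothetical parameters): a 2-component mixture
# over a 3-dimensional feature vector, standing in for one tied
# triphone state's output distribution.
x = np.array([0.5, -0.2, 1.0])
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0, 1.0],
                  [1.0, -1.0, 0.0]])
variances = np.ones((2, 3))
score = gmm_log_likelihood(x, weights, means, variances)
```

Expanding a triphone to more mixture components increases `K`, letting the state model duration-related spectral variability (such as the elongated vowels of the mudud) that a single Gaussian cannot capture.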
