Abstract
The recognition of continuous speech is one of the main challenges in building automatic speech recognition (ASR) systems, especially for phonetically complex languages such as Arabic. ASR research appears to have reached an impasse: nearly all solutions follow the same general model, and previous research has focused on enhancing its performance by incorporating supplementary features. This paper is part of ongoing research efforts aimed at developing a high-performance Arabic speech recognition system for learning and teaching purposes. It presents a statistical analysis of certain distinctive features of the basic Arabic phonemes, which appears helpful in enhancing the performance of a baseline HMM-based ASR system. The statistics are collected from a particular Arabic speech database, which involves ten different male speakers and more than eight hours of speech covering all Arabic phonemes. Within the HMM modeling framework, the statistics help establish the appropriate number of HMM states for each phoneme, and they can also be used as an initial condition for the EM estimation procedure, which generally accelerates the estimation process and thus improves the performance of the system. The findings are presented, and possible applications in automatic speech recognition and speaker identification systems are also suggested.
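To illustrate how per-phoneme statistics could feed an HMM setup, the following is a minimal sketch and not the procedure used in the paper. It assumes the statistic in question is a mean phoneme duration in milliseconds, a 10 ms frame shift, a left-to-right topology, and a target of roughly three frames per state. The function names, default values, and the example durations are all hypothetical. The only fixed fact it relies on is that a state with self-loop probability a has an expected occupancy of 1 / (1 - a) frames.

```python
def choose_num_states(mean_duration_ms, frame_shift_ms=10.0,
                      frames_per_state=3.0, min_states=3, max_states=8):
    """Pick a state count for a left-to-right phoneme HMM (heuristic).

    Assumption: longer phonemes get more states, at about
    `frames_per_state` frames per state, clamped to a sensible range.
    """
    frames = mean_duration_ms / frame_shift_ms
    n = round(frames / frames_per_state)
    return max(min_states, min(max_states, n))


def initial_self_loop(mean_duration_ms, num_states, frame_shift_ms=10.0):
    """Initial self-loop probability to seed EM (Baum-Welch) training.

    A state with self-loop probability a is occupied for 1 / (1 - a)
    frames on average, so a = 1 - 1 / d for a target of d frames.
    """
    frames = mean_duration_ms / frame_shift_ms
    # Each state must account for at least one frame.
    d = max(frames / num_states, 1.0)
    return 1.0 - 1.0 / d


if __name__ == "__main__":
    # Hypothetical durations for illustration only, not values from the paper.
    example_durations_ms = {"short_phoneme": 60.0, "long_phoneme": 150.0}
    for name, dur in example_durations_ms.items():
        n = choose_num_states(dur)
        a = initial_self_loop(dur, n)
        print(f"{name}: states={n}, initial self-loop={a:.3f}")
```

Seeding the transition matrix this way only provides a starting point; EM re-estimates the probabilities from the training data.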
Highlights
The most common way for humans to communicate is through the sounds produced during speech
We present a full statistical analysis of Arabic phonemes that can be employed to enhance the performance of our baseline hidden Markov model (HMM)-based systems by reducing the word error rate (WER)
We have presented a collection of statistical data for the basic Arabic phonemes that is helpful in enhancing the performance of HMM-based automatic speech recognition systems
Summary
The most common way for humans to communicate is through the sounds produced during speech. The present paper is part of ongoing research efforts aiming to develop a high-performance Arabic speech recognition system for learning and teaching purposes. The first stages of these efforts were dedicated to developing a particular Arabic speech database comprising ten different speakers and more than eight hours of speech collected from recitations of the Holy Quran, in which all Arabic phonemes are included. Two baseline HMM-based recognizers were built to validate the speech segmentation at both the phoneme and allophone levels and to examine the recognition accuracy of both recognizers. The current stage investigates a statistical analysis of certain distinctive features of Arabic phonemes in order to incorporate them later into the speech recognition process, with the aim of improving the performance of our baseline HMM-based recognizers.