Abstract

The automatic recognition of foreign-accented Arabic speech is a challenging task because it involves a large number of nonnative accents. In addition, the nonnative speech data available for training are generally insufficient. Moreover, compared with other languages, Arabic has attracted relatively few research efforts. In this paper, we address the problem of nonnative speech in a speaker-independent, large-vocabulary speech recognition system for Modern Standard Arabic (MSA). We analyze some major differences at the phonetic level in order to determine which phonemes play a significant part in recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and native origin. The West Point Modern Standard Arabic database from the Linguistic Data Consortium (LDC) and the Hidden Markov Model Toolkit (HTK) are used throughout all experiments. Our study shows that the best overall phoneme recognition performance is obtained when nonnative speakers are involved in both the training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than male nonnative speakers, and that emphatic phonemes yield a significant decrease in performance when they are uttered by both male and female nonnative speakers.

Highlights

  • Pronunciation variability is by far the most critical issue for Arabic automatic speech recognition (AASR)

  • We present the results obtained by an HMM-based, speaker-independent, large-vocabulary speech recognition system for Modern Standard Arabic, with a focus on the problem of foreign accents

  • The results show that, at the phonetic level, female nonnative speakers perform better than male nonnative speakers


Summary

Introduction

Pronunciation variability is by far the most critical issue for Arabic automatic speech recognition (AASR). This is mainly due to the large number of nonnative accents and to the fact that the nonnative speech data available for training are generally insufficient. One earlier study concentrated on analyzing and modeling nonnative speech for automatic speech recognition. Among other tasks, it examined the problem of nonnative speech in a speaker-independent, large-vocabulary, spontaneous speech recognition system for American English with native training data. It showed that interpolated native and nonnative models reduce the word error rate on a nonnative test set by 8.1% relative to the study's baseline recognizer, which used models trained on pooled native and nonnative data.
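
The 8.1% figure is most naturally read as a relative reduction in word error rate (WER), that is, the drop in WER expressed as a fraction of the baseline WER. The following is a minimal sketch of that arithmetic under this reading. The function name and the sample WER values are hypothetical and chosen only for illustration; they are not results from the cited study.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent.

    Both arguments are word error rates in percent, e.g. 50.0 for 50%.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    # Absolute improvement divided by the baseline, scaled to percent.
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Hypothetical numbers: a baseline WER of 50.0% falling to 45.95%
# is an absolute drop of 4.05 points, which is 8.1% of the baseline.
reduction = relative_wer_reduction(50.0, 45.95)
print(f"{reduction:.1f}% relative reduction")
```

Under this reading, a relative reduction says nothing about the absolute WER by itself: the same 8.1% could correspond to a large or a small absolute gain, depending on the baseline.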

