Arabic Dialects System using Hidden Markov Models (HMMs)

Zakaria Suliman Zubi, Eman Jibril Idris

doi:10.37394/23205.2022.21.37

Abstract

The Arabic language has many different dialects, and the dialect being spoken must be identified before automatic speech recognition (ASR) can be applied. Across the Arab countries, Modern Standard Arabic (MSA) is widely written and is used in official speech, newspapers, public administration, and schools, but it is not used in daily conversation; instead, the local dialect is spoken in daily life and is rarely written. In this paper, we examine the difficult task of correctly identifying Arabic dialects and propose a system that identifies a set of four regional dialects and Modern Standard Arabic from speech, based on speech recognition using Hidden Markov Models (HMMs). HMMs have become a very popular way to build speech recognition systems. An HMM is defined by a set of hidden states and the probabilities of transition from one state to another. Because of the similarities and differences between the Arabic dialects, the speech data were taken from the ADI5 dataset, retrieved from the MGB-3 challenge source. The proposed Arabic dialect identification system, called "Building a System for Arabic Dialects Identification based on Speech Recognition using Hidden Markov Models (HMMs)", takes speech utterances as input and outputs the dialect being spoken. During the training phase, speech utterances from one or more dialects were analyzed to capture the important time and frequency properties of the audio signals. During the testing phase, previously unseen test utterances were presented to the system, which output the dialect whose model most closely matched the test utterance. The proposed system shows promising results in matching utterances to each dialect.
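The testing phase described in the abstract, in which an utterance is scored against one model per dialect and the best match is reported, can be illustrated with a small sketch. This is not the authors' implementation. It assumes discrete observation symbols (for example, vector-quantized acoustic frames) and uses the standard forward algorithm in log space. The names `forward_log_likelihood` and `identify_dialect`, the dialect labels, and all probabilities in the toy models are hypothetical.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (forward algorithm).

    obs   : non-empty list of observation symbol indices
    start : start[s]     = P(first state is s)
    trans : trans[p][s]  = P(next state is s | current state is p)
    emit  : emit[s][o]   = P(symbol o | state s)
    """
    n = len(start)
    # Initialisation: start probability times emission of the first symbol.
    alpha = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            _log_sum_exp([alpha[p] + _log(trans[p][s]) for p in range(n)])
            + _log(emit[s][o])
            for s in range(n)
        ]
    # Termination: sum over all final states.
    return _log_sum_exp(alpha)


def identify_dialect(obs, models):
    """Score obs under every dialect model and return (best_label, scores)."""
    scores = {
        label: forward_log_likelihood(obs, *params)
        for label, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models: 2 hidden states, 3 observation symbols each.
# In a real system these parameters would be estimated during training.
uniform_trans = [[0.5, 0.5], [0.5, 0.5]]
toy_models = {
    "dialect_A": ([0.5, 0.5], uniform_trans, [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]),
    "dialect_B": ([0.5, 0.5], uniform_trans, [[0.1, 0.1, 0.8], [0.1, 0.3, 0.6]]),
}

if __name__ == "__main__":
    label, scores = identify_dialect([0, 0, 1, 0], toy_models)
    print(label, scores)
```

Working in log space avoids numerical underflow on long utterances, where the product of many small probabilities would otherwise round to zero. A system operating on continuous acoustic features would replace the discrete emission table with a density model per state, which is outside the scope of this sketch.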

Full Text