Abstract

Grapheme-to-phoneme (G2P) conversion, a necessary component of text-to-speech (TTS) systems, aims to predict a sequence of phonemes from a sequence of graphemes. For most languages, this task reduces to concatenating segment pronunciations to form words, and concatenating word pronunciations to form an utterance. This approach, however, is not viable for some languages, such as Arabic, where transitions between sounds within a word and between words in an utterance change their pronunciation according to several considerations depending on the orthographic, phonetic, and phonological context. In this work, we propose an approach to Arabic G2P conversion based on a probabilistic method: the joint multi-gram model (JMM). With this approach, we do not need to state explicitly all the G2P correspondence anomalies that we detail in this paper; this knowledge is instead captured implicitly at the training stage. We discuss experiments and results of this method applied to a pronunciation dictionary of the most commonly used Arabic words, and to carefully chosen and annotated texts for continuous speech. While the current results do not surpass the baseline system, they point toward future improvements: they are quite satisfactory on the dictionary used for training and testing, with a phoneme error rate (PER) of just over 10%, and on the continuous-speech corpus, with a PER of just over 11%.
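
The paper itself does not include code, but the following minimal sketch may help illustrate the decoding idea behind a joint multi-gram (graphone) model: grapheme-phoneme pairs are treated as joint units, and the best-scoring segmentation of a word into such units yields its phoneme sequence. Everything here is an illustrative assumption, not the model trained in this work: the graphone inventory, the probabilities, the unigram scoring (a real JMM uses higher-order n-grams estimated by EM), and the romanized stand-ins used in place of Arabic script.

```python
# Illustrative sketch of joint multi-gram (graphone) G2P decoding.
# ASSUMPTIONS: hypothetical graphone inventory and probabilities,
# unigram scoring only, romanized stand-ins for Arabic graphemes.
import math

# Hypothetical inventory: (grapheme chunk, phoneme chunk) -> probability.
# A real JMM learns these units and n-gram context from aligned data.
GRAPHONES = {
    ("k", "k"): 0.9,
    ("t", "t"): 0.8,
    ("a", "a"): 0.6,
    ("a", "aa"): 0.3,   # long-vowel reading of the same grapheme
    ("b", "b"): 0.9,
    ("ta", "ta"): 0.4,  # multi-letter chunk, the "multi" in multi-gram
}

MAX_G = 2  # longest grapheme chunk in the inventory


def decode(word):
    """Unigram Viterbi search over graphone segmentations of `word`."""
    n = len(word)
    # best[i] = (log-probability, phonemes) of the best path covering word[:i]
    best = [(-math.inf, []) for _ in range(n + 1)]
    best[0] = (0.0, [])
    for i in range(n):
        score, phones = best[i]
        if score == -math.inf:
            continue  # position unreachable under this inventory
        for j in range(i + 1, min(i + MAX_G, n) + 1):
            chunk = word[i:j]
            for (g, p), prob in GRAPHONES.items():
                if g == chunk:
                    cand = score + math.log(prob)
                    if cand > best[j][0]:
                        best[j] = (cand, phones + [p])
    return best[n][1]  # empty list if no segmentation covers the word


print(decode("kataba"))  # -> ['k', 'a', 't', 'a', 'b', 'a']
```

Because the units are joint, context-dependent pronunciation changes of the kind the abstract describes can be absorbed into multi-letter graphones and their n-gram statistics rather than written as explicit rules.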
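For reference, the phoneme error rate quoted above is conventionally computed as the Levenshtein (edit) distance between the hypothesized and reference phoneme sequences, divided by the reference length. A minimal illustration follows; the phoneme sequences are invented for the example and are not drawn from the paper's corpora.

```python
# Conventional PER: edit distance over reference length.
def per(ref, hyp):
    """Phoneme error rate between reference and hypothesis sequences."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1]             # match/substitution
                          + (ref[i - 1] != hyp[j - 1]))
    return d[len(ref)][len(hyp)] / len(ref)


ref = ["k", "aa", "t", "i", "b"]  # invented reference, e.g. /kaatib/
hyp = ["k", "a", "t", "i", "b"]   # invented hypothesis, short vowel error
print(per(ref, hyp))  # 0.2 -> one substitution over five phonemes
```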
