Text To Speech Synthesis for Afaan Oromoo Language Using Deep Learning Approach

doi:10.7176/nmmc/101-02

Abstract

Text-to-speech (TTS) synthesis generates speech from input text. TTS is important for assisting people with impairments and for supporting teaching and learning. However, implementing TTS for the Afaan Oromoo language poses many challenges, such as text processing, time-to-phoneme mapping, and acoustic modeling. Afaan Oromoo therefore needs a TTS system to support the development of the language. Natural Language Processing techniques are applied to paired text and speech so that speech waveforms can be generated from a prepared text corpus. Linguistic features are extracted from the normalized text using the Festival toolkit. Festival is also used to label the texts and to generate utterances from Scheme file parameters, and the resulting phoneme labels are aligned with the speech corpus for training and testing. Forced alignment is performed with the HTK toolkit, which yields state-level alignments with timestamps that are used for acoustic feature extraction. This study focuses on a deep learning approach to TTS for Afaan Oromoo based on a BLSTM-RNN. Given an input feature sequence, the RNN is used to build a duration model and an acoustic model. The BLSTM-based RNN is implemented with the PyTorch library in a Jupyter notebook, where the duration model is created and speech samples are generated from the trained acoustic model. We prepared a corpus of 1000 sentences with matching transcriptions from an Afaan Oromoo speech corpus recorded by a single female speaker (speaker-dependent), using 700 sentences for training and 300 sentences for testing. Two evaluation techniques are used in this study. First, the Mean Opinion Score (MOS) is used to assess the intelligibility and naturalness of the synthesized speech. Second, Mel Cepstral Distortion (MCD), which is widely used for objective evaluation of TTS models, is computed. In the MOS evaluation, the synthesized speech scored 3.77 for intelligibility and 3.76 for naturalness. In the objective evaluation on the 16 kHz speech corpus, the average MCD is 3.89 for the BLSTM-based RNN and 3.71 for the Merlin-generated waveforms.

Keywords: Text To Speech Synthesis, Mel Cepstral Distortion (MCD), Mean Opinion Score (MOS), Bidirectional Long Short Term Memory Recurrent Neural Network (BLSTM-RNN)

DOI: 10.7176/NMMC/101-02

Publication date: April 30th, 2022
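The abstract does not state how MCD is calculated. As a minimal sketch, assuming the commonly used definition (per-frame distortion in dB, averaged over frames, with the 0th energy coefficient excluded), it could be computed as shown here. The function name `mel_cepstral_distortion` is hypothetical, and the two sequences are assumed to be already time-aligned (for example by dynamic time warping); this is not necessarily the procedure used in the study.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    Each argument is a list of frames; each frame is a list of
    mel-cepstral coefficients. Assumption: coefficient 0 carries the
    frame energy and is left out of the distance.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and time-aligned")

    # Constant factor of the usual MCD definition: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)

    # Average the per-frame distortion over all frames
    return total / len(ref_frames)
```

Lower values indicate that the synthesized cepstra are closer to the reference recording; identical sequences give 0.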

Journal: New Media and Mass Communication	Publication Date: Apr 1, 2022
License type: cc-by
