Abstract

Speech is one of the most natural means of communication between humans and has since been extended to human–computer interaction. Speech synthesis helps visually impaired people read electronic texts and is used in information retrieval and language education. This paper proposes the development of a text-to-speech (TTS) synthesizer for Afan Oromo (the Oromo language) using a unit selection speech synthesis approach. Text-to-speech synthesis has been studied for many years for technologically favored languages, but every language has its own unique features. A synthesizer developed for one language therefore cannot simply be reused for another, because the structure of one language is not necessarily representative of others, and each system is built on the phonetic rules of a particular language. This study also reviewed the existing text-to-speech synthesizers for Afan Oromo. Their prototypes show promising results, but their intelligibility and naturalness still need considerable improvement through novel approaches and a better-quality corpus. This research was therefore initiated to develop a prototype text-to-speech synthesizer that improves on this performance. An Afan Oromo corpus was collected from genuine sources, and speech datasets (both text and audio) were prepared in collaboration with Afan Oromo experts. The intelligibility and naturalness of the synthesizer were evaluated by suitable users with the Mean Opinion Score (MOS). The prototype obtained a naturalness score of 4.44 out of 5 (very good), which is encouraging and indicates better performance than the existing Afan Oromo TTS in terms of both intelligibility and naturalness. However, the intelligibility score still leaves room for further work.
The main challenge is that Afan Oromo has many dialects, so preparing a text corpus that is balanced across dialects is very difficult. Further enhancement of the work is expected to bring the system's intelligibility to a reasonable level.