Abstract

Problem statement: With the rapid advancement of information technology and communications, computer systems increasingly offer users the opportunity to interact with information through speech. Interest in speech synthesis and in building voices is growing. Worldwide, speech synthesizers have been developed for many popular languages, such as English, Spanish and French, and much research and development has been devoted to those languages. Arabic, on the other hand, has been given little attention compared to other languages of similar importance, and research on Arabic is still in its infancy. Based on these ideas, we introduced a system to transform Arabic text retrieved from a search engine into spoken words. Approach: We designed a text-to-speech system that uses a concatenative speech synthesis approach to synthesize Arabic text. The synthesizer was based on artificial neural networks, specifically the unsupervised learning paradigm. Speech units of different sizes were used to produce spoken utterances: words, diphones and triphones. We also built a dictionary of 500 common Arabic words. The smaller speech units (diphones and triphones) were chosen to achieve an unlimited vocabulary, while the word units were used for synthesizing a limited set of sentences. Results: The system showed very high accuracy in synthesizing Arabic text, and the output speech was highly intelligible. For the word and diphone unit experiments, we reached an accuracy of 99%, while for the triphone units we reached an accuracy of 86.5%. Conclusion: An Arabic text-to-speech synthesizer was built with the ability to produce an unlimited number of words with high-quality voice.
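To make the unit sizes mentioned in the abstract concrete, the following minimal sketch shows one common way to derive diphone and triphone labels from a phoneme sequence: a sliding window over the sequence, padded with silence at both ends. The function name `to_units`, the `_` silence symbol, and the example phoneme labels are illustrative assumptions; the abstract does not state how the authors defined or labeled their units.

```python
from typing import List

# Hypothetical boundary symbol marking silence at the utterance edges.
SILENCE = "_"


def to_units(phonemes: List[str], size: int) -> List[str]:
    """Slide a window of `size` phonemes over the silence-padded sequence.

    size=2 yields diphone labels, size=3 yields triphone labels.
    This is an assumed convention, not the paper's documented inventory.
    """
    if size < 2:
        raise ValueError("size must be 2 (diphones) or larger")
    padded = [SILENCE] + phonemes + [SILENCE]
    return ["-".join(padded[i:i + size]) for i in range(len(padded) - size + 1)]


# Assumed rough phoneme labels for an Arabic word meaning "book".
word = ["k", "i", "t", "aa", "b"]
diphones = to_units(word, 2)
triphones = to_units(word, 3)
```

Because every unit overlaps its neighbor, any phoneme sequence can be covered by a finite unit inventory, which is the sense in which small units give an unlimited vocabulary, whereas whole-word units cover only the words that were recorded.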

Highlights

  • A Text-To-Speech (TTS) synthesizer is a computer-based program that processes text and reads it aloud

  • The speech synthesizer consists of two main components, namely: the text processing component and the Digital Signal Processing (DSP) module

  • The text processing component has two major tasks. First, it converts raw text containing symbols such as numbers and abbreviations into the equivalent written-out words; this process is often called text normalization. Second, it converts the text into some other representation and outputs it to the DSP module or synthesizer, which transforms the symbolic information it receives into speech


Summary

Introduction

A Text-To-Speech (TTS) synthesizer is a computer-based program that processes text and reads it aloud. There is demand for the technology to deliver speech of good, acceptable quality. The speech synthesizer consists of two main components: the text processing component and the Digital Signal Processing (DSP) module. The text processing component has two major tasks. First, it converts raw text containing symbols such as numbers and abbreviations into the equivalent written-out words; this process is often called text normalization. Second, it converts the text into some other representation and outputs it to the DSP module or synthesizer, which transforms the symbolic information it receives into speech.
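The text normalization step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses English tokens for readability, and the abbreviation table, the `normalize` and `number_to_words` names, and the 0 to 999 number range are all assumptions made for the example. An Arabic system would need its own lexicon and number grammar.

```python
# Hypothetical abbreviation table; a real system would use a much larger,
# language-specific lexicon.
ABBREVIATIONS = {"Dr.": "doctor", "St.": "street", "km": "kilometers"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Replace known abbreviations and small digit strings with words."""
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isdecimal() and int(token) <= 999:
            words.append(number_to_words(int(token)))
        else:
            # Unknown tokens (and numbers above 999) pass through unchanged.
            words.append(token)
    return " ".join(words)


normalized = normalize("Dr. Ali walked 42 km")
```

The normalized word sequence is what the second task would then convert into a symbolic representation for the DSP module.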

