Abstract

This paper describes the implementation of TD-PSOLA tools to improve the quality of an Arabic text-to-speech (TTS) system. The system is based on diphone concatenation with a TD-PSOLA modifier synthesizer. The paper presents techniques to improve the precision of prosodic modifications in Arabic speech synthesis using the TD-PSOLA (Time Domain Pitch Synchronous Overlap-Add) method, which decomposes the signal into overlapping frames synchronized with the pitch period. The main objective is to preserve the consistency and accuracy of the pitch marks after prosodic modification of the speech signal, together with the adjustment and optimisation of a diphone database with integrated vowels.

Highlights

  • Producing a synthetic voice that imitates human speech from plain text is not a trivial task, since it generally requires extensive knowledge of the real world, the language, and the context the text comes from, as well as a deep understanding of the semantics of the text and the relations that underlie all of this information

  • Several speech synthesis systems have been developed, such as vocoders and LPC synthesizers [5][6], but most of them did not produce synthetic speech of high quality when compared with PSOLA-based systems [7] such as MBROLA synthesizers [8]

  • The test group consisted of sixteen persons, and the two previously mentioned tests were each repeated twice to see whether the results would improve through the learning effect, that is, whether listeners become accustomed to the synthesized speech and understand it better after every listening session

Summary

INTRODUCTION

Producing a synthetic voice that imitates human speech from plain text is not a trivial task, since it generally requires extensive knowledge of the real world, the language, and the context the text comes from, as well as a deep understanding of the semantics of the text and the relations that underlie all of this information. Many research and commercial speech synthesis systems have contributed to our understanding of these phenomena and have been successful, each in its own way, in applications such as human-machine interaction, hands-free and eyes-free access to information, and interactive voice response systems. The TD-PSOLA method (Time Domain Pitch Synchronous Overlap-Add) is reported to be the most efficient method for producing speech of satisfactory quality [9] and is one of the most popular concatenative synthesis techniques today. In this method, the short-time signals (ST signals) are overlapped and added with the desired spacing between them.
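
The overlap-add step described above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the function names (extract_st_signals, td_psola_pitch_shift), the Hann windowing over two pitch periods, and the nearest-mark selection rule are assumptions chosen for illustration, and the pitch marks are assumed to be given. The sketch changes pitch only (duration is kept) and applies no gain normalisation.

```python
import numpy as np


def extract_st_signals(signal, marks):
    """Cut one windowed short-time (ST) signal around each interior pitch mark.

    Each ST signal spans the previous and the next pitch period and is
    weighted by an asymmetric Hann window that peaks on the pitch mark.
    Returns a list of (segment, centre_offset) pairs.
    """
    st_signals = []
    for i in range(1, len(marks) - 1):
        left = int(marks[i] - marks[i - 1])    # previous period, in samples
        right = int(marks[i + 1] - marks[i])   # next period, in samples
        window = np.concatenate([
            np.hanning(2 * left + 1)[:left],    # rising half (peak excluded)
            np.hanning(2 * right + 1)[right:],  # peak plus falling half
        ])
        segment = signal[marks[i] - left: marks[i] + right + 1] * window
        st_signals.append((segment, left))
    return st_signals


def td_psola_pitch_shift(signal, marks, pitch_factor):
    """Overlap-add ST signals at a new spacing (assumed simplification).

    pitch_factor > 1 places the ST signals closer together (higher pitch);
    pitch_factor < 1 spreads them apart (lower pitch).
    """
    marks = np.asarray(marks)
    st_signals = extract_st_signals(signal, marks)
    interior = marks[1:-1]                      # marks that own an ST signal
    out = np.zeros(len(signal))
    t = float(interior[0])                      # current synthesis mark
    while t <= interior[-1]:
        # Reuse the ST signal whose analysis mark is closest in time.
        k = int(np.argmin(np.abs(interior - t)))
        segment, centre = st_signals[k]
        start = int(round(t)) - centre
        lo, hi = max(start, 0), min(start + len(segment), len(out))
        out[lo:hi] += segment[lo - start: hi - start]
        # Advance by the local analysis period scaled by the pitch factor.
        t += (marks[k + 2] - marks[k + 1]) / pitch_factor
    return out


# Toy input: a signal with an exact 80-sample period and ideal pitch marks.
n = np.arange(4000)
signal = np.sin(2 * np.pi * n / 80) + 0.5 * np.sin(2 * np.pi * 3 * n / 80)
marks = np.arange(0, len(signal), 80)
shifted = td_psola_pitch_shift(signal, marks, pitch_factor=1.25)
```

With a pitch factor of 1.25, the ST signals taken every 80 samples are re-spaced every 64 samples, so the steady-state part of the output repeats with the shorter period. In a real system the pitch marks come from the speech analysis stage, and their accuracy directly affects the quality of the result.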

Introduction for Arabic language
Database construction
Speech analysis
Speech marks
Reading marks
Synthesis marks
Synthesis speech
RESULTS AND EVALUATION
CONCLUSION
