Abstract

This paper designs and validates a phonetically balanced speech corpus for the Arabic language. Designing and developing a rich, phonetically balanced corpus in optimal context is one of the key issues in building high-quality text-to-speech synthesis systems. The corpus is rich in the sense that it must contain every possible phoneme in its left and right contexts, and balanced in the sense that it respects the phonetic distribution of the language. We propose a new methodology for designing and implementing such a corpus for speech synthesis purposes. The paper explains the whole creation process, from the design stage through corpus creation and recording to the segmentation of the speech corpus. The resulting speech corpus contains 202 sentences with 6174 phonemes. To validate the speech corpus, an Arabic speech synthesis system based on Hidden Markov Models has been developed.
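
The two corpus properties named above can be made concrete with a small sketch. It is illustrative only: the abstract does not specify the selection procedure, so the greedy coverage heuristic, the function names, and the toy phoneme sequences below are all assumptions rather than the authors' method. Richness is approximated here as coverage of adjacent phoneme pairs, and balance as the largest gap between the corpus phoneme frequencies and a reference distribution.

```python
from collections import Counter


def diphones(phonemes):
    """Adjacent phoneme pairs: each phoneme paired with its right neighbour."""
    return list(zip(phonemes, phonemes[1:]))


def greedy_select(sentences, target_pairs):
    """Hypothetical greedy heuristic: repeatedly pick the sentence that covers
    the most still-uncovered pairs. Returns (chosen indices, uncovered pairs)."""
    uncovered = set(target_pairs)
    remaining = list(range(len(sentences)))
    chosen = []
    while uncovered and remaining:
        best = max(remaining,
                   key=lambda i: len(uncovered & set(diphones(sentences[i]))))
        gain = uncovered & set(diphones(sentences[best]))
        if not gain:  # no remaining sentence adds coverage
            break
        chosen.append(best)
        uncovered -= gain
        remaining.remove(best)
    return chosen, uncovered


def phoneme_distribution(sentences):
    """Relative frequency of each phoneme over a list of phoneme sequences."""
    counts = Counter(p for s in sentences for p in s)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


def max_deviation(corpus_dist, reference_dist):
    """Largest absolute frequency gap between corpus and reference."""
    keys = set(corpus_dist) | set(reference_dist)
    return max(abs(corpus_dist.get(k, 0.0) - reference_dist.get(k, 0.0))
               for k in keys)


# Toy data (invented for illustration, not taken from the corpus).
candidates = [["b", "a", "b"], ["a", "b"], ["k", "a"]]
targets = {("b", "a"), ("a", "b"), ("k", "a")}
chosen, missing = greedy_select(candidates, targets)
selected = [candidates[i] for i in chosen]
reference = {"a": 0.5, "b": 0.3, "k": 0.2}  # assumed reference distribution
deviation = max_deviation(phoneme_distribution(selected), reference)
```

In practice the candidate sentences would be phonetized Arabic text and the reference distribution would come from a large corpus of the language; a real design would likely also weigh balance during selection rather than only measuring it afterwards.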
