Abstract

This article describes a method for synthesizing Arabic speech using segments (synthesis units) that are parts of syllables. The method has several advantages. It is inherently capable of dealing with most phonetic variations of the sounds of Modern Standard Arabic; it permits speech to be synthesized using simple concatenation rules and synthesis techniques such as direct waveform concatenation of the synthesis units; and it allows prosodic features to be realized on the segments because the segments are derived from Arabic syllables. However, most intersyllabic coarticulations cannot be treated directly by this method. This has prompted a phonetic study of Arabic speech following the present method; we have investigated contextual variations of Arabic sounds and related these variations to the synthesis units. This enabled us to define a small number of allophones of the synthesis units which, when included in the text-to-speech transcription process, improved the quality of the synthesized speech. The segmentation of the Arabic syllables into synthesis units, on both the phonetic and parametric levels, is similar to the standard demisyllabic segmentation used by other speech synthesis researchers. However, it differs in one important respect, which we have investigated. Synthesized speech is produced by two synthesis techniques: time-domain waveform concatenation and linear predictive coding (LPC). Time-domain waveform concatenation is used because it is simple to carry out and is attractive since the juncture points between contiguous synthesis units are stable. It involves direct concatenation of the digital data that represent the synthesis units. LPC synthesis is used because it eases the treatment of certain distortions that are introduced by the synthesis process. It is a parametric method that allows certain suprasegmental aspects of speech to be modified.
Using LPC also saves memory and allows us to compare speech produced by the two synthesis techniques. The types of distortion introduced by concatenating the synthesis units are studied. It is shown how the stability of the juncture points between contiguous units, together with the phonological properties of Arabic syllables and speech, makes it possible to avoid these distortions even with a simple synthesis technique such as direct waveform concatenation. It is also shown how such problems can be avoided by using a parametric synthesis technique such as LPC. To demonstrate the suitability of the present method for the synthesis of Arabic speech, the method has been implemented as a non-real-time text-to-speech system on a personal computer (PC), and the intelligibility of the synthesized speech has been established by conducting perception tests.
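
As a rough illustration of what direct concatenation of the digital data means, the following minimal Python sketch joins stored units end to end in the time domain. It is not the system described here: the unit names, the sample values, the 16-bit sample format, and the function name `concatenate_units` are all assumptions made for the example.

```python
from array import array


def concatenate_units(units):
    """Join synthesis units end to end in the time domain.

    Each unit is assumed to be an array of 16-bit signed PCM samples.
    No smoothing is applied at the junctures.
    """
    out = array("h")
    for unit in units:
        out.extend(unit)
    return out


# Hypothetical unit inventory; names and sample values are invented.
inventory = {
    "ka": array("h", [0, 120, 340, 90]),
    "at": array("h", [80, -60, -200, 0]),
}

# A transcription step (not shown) would yield the sequence of unit names.
speech = concatenate_units([inventory[name] for name in ("ka", "at")])
```

A plain join of this kind gives acceptable results only when the units meet at stable juncture points; otherwise the boundary would need the kind of treatment that a parametric technique such as LPC makes easier.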
