Abstract

This article presents the definition, construction and statistical evaluation of a prosodic database of Greek speech. The main motivation for developing the database was its use as a research tool for Text-to-Speech synthesis and for the study of prosody in general. Starting from the task of text selection, we arrived at a final set of sentences containing almost 95% of all Greek syllables extracted from a widely used Greek dictionary. A professional radio actress was then instructed to utter these sentences in reading style and at reading rate; the speech was recorded at 44 kHz/16 bits in the anechoic chamber of a professional studio. A trained phonetician transcribed the intonational phenomena on the corresponding speech signals using the ToBI annotation model adapted to Greek prosodic patterns. The speech data were segmented to the phoneme level with a phoneme recognizer based on the HTK platform. All files were aligned so that possible relations among text, intonational and durational labelling could be identified. The EMU speech database system was used for database management. Extensive measurements of numerous annotated events, presented in histograms and tables, provide detailed information on the database. Finally, we evaluate prediction models of prosodic phrase breaks and pitch accents derived from our database, and compare their performance with that of models derived under the same experimental conditions from a limited-domain corpus of Greek speech.
