Abstract
This paper first proposes a labeling scheme for tonal aspects of speech and then describes an automatic annotation system using this transcription. This fine-grained transcription provides labels indicating pitch level and pitch movement of individual syllables. Of the five pitch levels, three (low, mid, high) are defined on the basis of pitch changes in the local context and two (bottom, top) are defined relative to the boundaries of the speaker’s global pitch range. For pitch movements, both simple and compound, the transcription indicates direction (rise, fall, level) and size, using size categories (pitch intervals) adjusted relative to the speaker’s pitch range. The automatic tonal annotation system combines several processing steps: segmentation into syllable peaks, pause detection, pitch stylization, pitch range estimation, classification of the intra-syllabic pitch contour, and pitch level assignment. It uses a dedicated, rule-based procedure which, unlike commonly used supervised learning techniques, does not require a labeled corpus for training a model. The paper also includes a preliminary evaluation of the annotation system on a reference corpus of nearly 14 minutes of spontaneous speech in French and Dutch, in order to quantify the annotation errors. The results, expressed in terms of the standard measures of precision, recall, accuracy and F-measure, are encouraging. For pitch levels low, mid and high, an F-measure between 0.815 and 0.946 is obtained, and for pitch movements a value between 0.708 and 1. Provided that additional modules for the detection of prominence and prosodic boundaries are available, the resulting annotation may serve as input for a phonological annotation.
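The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that the reference and automatic annotations are already aligned syllable by syllable and that each syllable carries exactly one label; the function names and the label symbols "L", "M", "H" are hypothetical and chosen only for illustration, not taken from the paper.

```python
from collections import Counter


def per_label_scores(reference, predicted):
    """Return {label: (precision, recall, f_measure)} for two aligned label sequences."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must be aligned syllable by syllable")
    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, hyp in zip(reference, predicted):
        if ref == hyp:
            tp[ref] += 1   # correct label
        else:
            fp[hyp] += 1   # hyp was assigned although the reference says otherwise
            fn[ref] += 1   # ref was missed by the automatic annotation
    scores = {}
    for label in set(reference) | set(predicted):
        n_pred = tp[label] + fp[label]
        n_ref = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_ref if n_ref else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


def accuracy(reference, predicted):
    """Fraction of syllables whose automatic label equals the reference label."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must be aligned syllable by syllable")
    if not reference:
        return 0.0
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


# Toy data, invented for illustration only (not results from the paper).
reference = ["L", "M", "H", "M", "L", "H"]
predicted = ["L", "M", "M", "M", "L", "H"]
for label, (p, r, f) in sorted(per_label_scores(reference, predicted).items()):
    print(f"{label}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
print(f"accuracy={accuracy(reference, predicted):.3f}")
```

Reporting the scores per label, as the abstract does, shows which categories the system confuses, which a single overall accuracy figure would hide.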
Highlights
It is widely acknowledged that the understanding of prosody is essential for the development of speech technology applications, such as text-to-speech synthesis and human-machine dialog systems, and that progress in this area requires large speech corpora, containing a reliable and objective representation of prosody.
The first concerns the way in which prosody is represented, i.e. the annotation convention or labeling scheme, and how this representation may be used by human labelers.
He listened to short stretches of the speech signal as many times as needed to check the audibility, direction and size of the pitch movements provided by the automatic transcription, and to check whether the transcribed pitch levels corresponded to the perceived ones.
Summary
It is widely acknowledged that the understanding of prosody is essential for the development of speech technology applications, such as text-to-speech synthesis and human-machine dialog systems, and that progress in this area requires large speech corpora containing a reliable and objective representation of prosody. Intonation research in phonetics and linguistics, too, would benefit greatly from speech corpus annotations indicating pitch contours, stress and prosodic boundaries. Prosodically annotated corpora are scarce for languages other than English. The design and evaluation of an automatic annotation system constitutes the topic of this paper. It describes a system for the automatic transcription of tonal attributes in speech corpora, providing a transcription identifying pitch levels and pitch movements associated with syllables or sequences of syllables. This system will be referred to as the Polytonia system.
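To make the idea of labeling a pitch movement by direction and size more concrete, the following is a rough sketch of a rule-based classifier for a simple (non-compound) movement within one syllable. It is not the Polytonia procedure: the function names, the 1-semitone level threshold and the 0.4 fraction of the speaker's pitch range are assumptions made purely for illustration. Only the general principle, that size categories are set relative to the speaker's pitch range, comes from the text above.

```python
import math


def semitones(f_start_hz, f_end_hz):
    """Pitch interval from start to end frequency, in semitones (positive = rising)."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def classify_movement(f_start_hz, f_end_hz, range_st,
                      level_threshold_st=1.0, large_fraction=0.4):
    """Label a simple pitch movement as (direction, size).

    range_st is the speaker's pitch range in semitones. Both thresholds are
    hypothetical defaults, not values taken from the paper.
    """
    interval = semitones(f_start_hz, f_end_hz)
    if abs(interval) < level_threshold_st:
        return ("level", None)  # change too small to count as a movement
    direction = "rise" if interval > 0 else "fall"
    # The size category scales with the speaker's range rather than a fixed interval.
    size = "large" if abs(interval) >= large_fraction * range_st else "small"
    return (direction, size)


# Invented example values: a speaker with a 12-semitone pitch range.
print(classify_movement(200.0, 400.0, range_st=12.0))
print(classify_movement(200.0, 180.0, range_st=12.0))
print(classify_movement(200.0, 200.0, range_st=12.0))
```

Expressing intervals in semitones rather than hertz makes the same thresholds usable across speakers with different average pitch, and scaling the size boundary by the speaker's range is one simple way to realize range-relative size categories.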