Abstract

This study examined how much speech material is required to build a prosodic model for duration, fundamental frequency and intensity. For each of two speakers, fifty multiple linear regression models were built on the basis of seventy utterances per speaker (7,522 and 7,643 segments respectively). Models based on eight and twenty utterances showed good stability, satisfactory prediction for novel material, and closeness of fit comparable to that reported by other researchers for much larger corpora. Linear regressions were typically based on about ten independent predictors per prosodic parameter, which had previously been ranked according to how well they predicted the dependent parameter. This ranking procedure advantageously replaced the more commonly used regression trees. Variation in the closeness of fit of models based on sliding windows eight and twenty utterances long was traced to variations in bias, i.e., in the degree to which models systematically under- or overestimate target values. While the models in this study involved simple, non-optimized linear regressions without interactions, avenues are suggested for further improving the performance of this class of models. The results suggest that a series of well-adapted small-footprint models provides more accurate information about the individual use of prosody in specific speech situations than a single model based on abundant data.
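
The workflow outlined in the abstract (rank candidate predictors, fit a multiple linear regression without interactions on the best-ranked ones, then measure bias) can be illustrated with a minimal sketch. It is not the authors' implementation: the ranking criterion used here (absolute Pearson correlation with the dependent parameter), the function names, and the toy data are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def rank_predictors(rows, y):
    """Column indices sorted by |correlation| with y, strongest first.
    ASSUMPTION: the ranking criterion is hypothetical, not taken from the paper."""
    n_cols = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], y)) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: -scores[j])

def fit_ols(rows, y, cols):
    """Least-squares fit of y on the chosen columns plus an intercept.
    Solves the normal equations by Gauss-Jordan elimination with partial
    pivoting; raises ZeroDivisionError if the predictors are collinear.
    Returns [intercept, coefficient for cols[0], coefficient for cols[1], ...]."""
    design = [[1.0] + [r[j] for j in cols] for r in rows]
    k = len(design[0])
    # Augmented system (X'X | X'y)
    a = [[sum(d[i] * d[j] for d in design) for j in range(k)]
         + [sum(d[i] * t for d, t in zip(design, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def predict(rows, cols, beta):
    """Apply a fitted model to rows of predictor values."""
    return [beta[0] + sum(b * r[j] for b, j in zip(beta[1:], cols)) for r in rows]

def bias(pred, target):
    """Mean signed error: positive values mean systematic overestimation."""
    return sum(p - t for p, t in zip(pred, target)) / len(target)

# Toy data (invented): three binary-coded predictors, one of them irrelevant.
rows = [[a, b, c] for a in (1.0, -1.0) for b in (1.0, -1.0) for c in (1.0, -1.0)]
y = [5.0 + 2.0 * r[0] - 1.0 * r[2] for r in rows]

order = rank_predictors(rows, y)   # predictor indices, best first
top = order[:2]                    # keep the two best-ranked predictors
beta = fit_ols(rows, y, top)       # intercept followed by one weight per predictor
model_bias = bias(predict(rows, top, beta), y)
```

In a study like the one described, the rows would be segments, the columns linguistic predictors, and the target one prosodic parameter such as duration. Bias on novel utterances would then be obtained by calling `predict` on held-out rows rather than on the fitting data.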

Highlights

  • Prosodic structures are marked by considerable complexity and are produced with substantial variability

  • By applying a set of multiple linear regressions to the relationship between linguistic and prosodic parameters, the essence of a prosodic style was captured in some twenty utterances

  • The two corpus segments used here were (1) one and a half news bulletins spoken by Brian Perkins (BP), a BBC newscaster, and (2) a portion of "Innocence and Design", the 1985 Reith Lecture presented by economist David Henderson (DH)

Summary

Introduction

Prosodic structures are marked by considerable complexity and are produced with substantial variability. Stating a given linguistic or prosodic rule (e.g., final syllable lengthening) for universal language use or for a certain set of languages is considered much more desirable than stating it for just one speaker, for a small set of utterances, or for one dialect of a given language. This serves to identify the "solid" components of language structures. A process model or a prosodic model oriented towards a style- and speaker-specific characterisation of speech may have to be considerably more flexible and more detailed than a model for general language use. Such a model must specify which phonological rules should apply and with how much reliability, as well as the gradient values that govern the rules' application. Relationships between linguistic and prosodic parameters ("rules") are identified in various ways to create numeric and/or rule-governed models. Such models are used to analyse and/or synthesize new stretches of speech and to fine-tune the model. In the final part of the article, some suggestions are made concerning how this class of models can be further improved by various techniques, and an outlook is given on further work that can be performed within this framework.

Method
CUVOALD
Multiple Regression as Applied to Prosodic Modelling
Evaluation and Precision
Results
Modelling the Original Data
Modelling Novel Data
Outlook for Further Research