Abstract

This paper deals with the factors characterizing the production of autonomous vocalic filled pauses in large spontaneous speech corpora, namely language, gender, speaking style and language proficiency. Two types of corpora are analyzed: a corpus of broadcast news in French and American English, and a corpus of short conference talks in English given by native and non-native speakers. Several acoustic and prosodic parameters, namely timbre, pitch, duration and density, are evaluated and correlated with each factor. The results show that timbre is correlated with language and language proficiency, whereas duration is linked to both gender and speaking style, the latter also conditioning hesitation density in speech.

Index terms: speech disfluencies, autonomous filled pauses, L1/L2, emotional state.

1. Introduction

This paper focuses on autonomous vocalic filled pauses in spontaneous speech corpora. Among the phenomena described as “disfluencies”, filled pauses are one of the most frequently encountered across languages. Autonomous vocalic hesitations, a type of filled pause, are widespread and consist of the insertion, “at any moment” in the speech flow, of a lengthened vocalic segment, either alone or in combination with other segments (such as a nasal coda in English). Their aim is “to announce the initiation of what is expected to be a […] delay in speaking” [1]. Autonomous vocalic hesitations occur without lexical support and must therefore be distinguished from the vocalic lengthening of segments belonging to lexical items (generally function words). Filled pauses can, however, take other forms, for instance lengthened nasal consonants (“mm” in Mandarin Chinese) or demonstratives (“ano”, “eto” in Japanese) [2,3]. In the present study we consider vocalic hesitations in French (“euh”) and English (“uh” and “um” in American English; “er” in British English).
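To make the notion of hesitation density concrete, the sketch below counts standalone filled-pause tokens in a time-aligned transcript and expresses them per minute of speech. This is an illustration under stated assumptions, not the authors' measure: the label sets simply reuse the spellings named above, and the names Token, FILLED_PAUSE_LABELS, count_filled_pauses and hesitation_density, as well as the per-minute definition of density, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical label sets: the filled-pause spellings named in the text.
FILLED_PAUSE_LABELS = {
    "fr": {"euh"},
    "en": {"uh", "um", "er"},
}


@dataclass
class Token:
    word: str     # orthographic transcription of one token
    start: float  # start time in seconds
    end: float    # end time in seconds


def count_filled_pauses(tokens: list[Token], lang: str) -> int:
    """Count tokens transcribed as standalone (autonomous) filled pauses.

    A lengthened vowel inside a lexical item (e.g. a drawn-out "the") keeps
    its lexical spelling, so it does not match and is not counted.
    """
    labels = FILLED_PAUSE_LABELS[lang]
    return sum(1 for t in tokens if t.word.lower() in labels)


def hesitation_density(tokens: list[Token], lang: str) -> float:
    """Filled pauses per minute over the span from first start to last end.

    The per-minute definition is an assumption made for illustration only.
    """
    if not tokens:
        return 0.0
    span_s = tokens[-1].end - tokens[0].start
    if span_s <= 0:
        return 0.0
    return count_filled_pauses(tokens, lang) * 60 / span_s


tokens = [
    Token("we", 0.0, 0.3), Token("uh", 0.3, 0.9), Token("think", 0.9, 1.3),
    Token("that", 1.3, 1.5), Token("um", 1.5, 2.1), Token("it", 2.1, 2.4),
    Token("works", 2.4, 3.0),
]
print(count_filled_pauses(tokens, "en"))  # 2
print(hesitation_density(tokens, "en"))   # 40.0
```

A per-word normalization (e.g. filled pauses per 100 words) would be an equally plausible choice; which one the study uses is not stated in this section.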
Autonomous vocalic hesitations have previously been studied from intra- and inter-language perspectives, without particular consideration of the role of context in their acoustic and prosodic characteristics. In our former studies, we compared autonomous vocalic hesitations in 8 languages: American English, Middle Oriental Arabic, Mandarin Chinese, French, Italian, South-American Spanish, and European Portuguese. We focused on the support vowel of the hesitations in each language. The support vowel is defined as the main vocalic segment of a hesitation, i.e. the longest and most stable realization of each item. This vowel occurs in isolation (as the sole realization of the hesitation), in a diphthong, or followed by a nasal consonant, as in English. Among the parameters characterizing the support vowel, duration, pitch and timbre received particular attention. The analysis revealed that timbre is the most language-dependent parameter of vocalic hesitations. Pitch and duration both help to differentiate the hesitation vowel from vowels of similar timbre within a given language. They also appear to follow universal patterns: the main vowel of a hesitation is significantly longer than similar intra-lexical vowels and exhibits a flat, stable F0 contour [4]. Consequently, it was hypothesized that timbre is a language-dependent parameter, whereas pitch and duration could be considered language-independent features. In this study, we consider four factors that may play a role in the production of vocalic hesitations in spontaneous speech corpora: language, gender, speaking style, and language proficiency (mother tongue vs. second language).
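The definition of the support vowel and the "flat and stable F0 contour" property can be illustrated with a small sketch. It assumes each hesitation has already been segmented into phones with per-segment F0 samples (for instance from a pitch tracker). The names Segment, support_vowel, f0_range_semitones and is_flat, the use of F0 range as the stability measure (only as a tie-breaker), and the 2-semitone flatness threshold are hypothetical choices, not the authors' measurement procedure.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Segment:
    label: str     # phone label, e.g. "@" for schwa or "m" for a nasal coda
    is_vowel: bool
    start: float   # seconds
    end: float     # seconds
    f0: list[float] = field(default_factory=list)  # F0 samples in Hz, 0 = unvoiced

    @property
    def duration(self) -> float:
        return self.end - self.start


def f0_range_semitones(seg: Segment) -> float:
    """Excursion between the highest and lowest voiced F0 samples, in semitones."""
    voiced = [hz for hz in seg.f0 if hz > 0]
    if len(voiced) < 2:
        return 0.0
    return 12 * math.log2(max(voiced) / min(voiced))


def support_vowel(segments: list[Segment]) -> Segment:
    """Longest vocalic segment; ties broken by the smaller F0 range (more stable)."""
    vowels = [s for s in segments if s.is_vowel]
    if not vowels:
        raise ValueError("hesitation contains no vocalic segment")
    return max(vowels, key=lambda s: (s.duration, -f0_range_semitones(s)))


def is_flat(seg: Segment, max_range_st: float = 2.0) -> bool:
    """Hypothetical flatness test: F0 excursion no larger than max_range_st."""
    return f0_range_semitones(seg) <= max_range_st


# An English "um": a schwa-like support vowel followed by a nasal coda.
um = [
    Segment("@", True, 0.00, 0.35, [120, 121, 119, 120]),
    Segment("m", False, 0.35, 0.50, [118, 117]),
]
sv = support_vowel(um)
print(sv.label, round(sv.duration, 2), is_flat(sv))  # @ 0.35 True
```

Measuring the excursion in semitones rather than hertz keeps the flatness criterion comparable across speakers with different pitch ranges, which matters when comparing male and female speakers.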
