Correction to "Assessing Others' Knowledge Through Their Speech Disfluencies and Gestures".
- Research Article
26
- 10.1109/tasl.2008.2006728
- Jan 1, 2009
- IEEE Transactions on Audio, Speech, and Language Processing
The presence of disfluencies in spontaneous speech, while posing a challenge for robust automatic recognition, also offers a means of gaining additional insight into a speaker's communicative and cognitive state. This paper analyzes disfluencies in children's spontaneous speech in the context of spoken-dialog-based computer game play, and addresses the automatic detection of disfluency boundaries. Although several approaches have been proposed to detect disfluencies in speech, relatively little work has been done on using visual information to improve the performance and robustness of a disfluency detection system. This paper describes the use of visual information along with prosodic and language information to detect the presence of disfluencies in a child's computer-directed speech, and shows how these information sources can be integrated to increase the overall information available for disfluency detection. The experimental results on our children's multimodal dialog corpus indicate that a disfluency detection accuracy of over 80% can be obtained by using audio-visual information. Specifically, the results showed that adding visual information to prosodic and language features yields relative improvements in disfluency detection error rates of 3.6% and 6.3% for information fusion at the feature level and the decision level, respectively.
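The two fusion strategies and the "relative improvement" figure mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' system: the function names, the weighted-average combination rule, and every number in the usage lines are hypothetical, chosen only to show the general idea of feature-level fusion (one classifier over concatenated features) versus decision-level fusion (combining per-modality outputs).

```python
from typing import Sequence


def feature_level_fusion(prosodic: Sequence[float],
                         lexical: Sequence[float],
                         visual: Sequence[float]) -> list[float]:
    """Concatenate per-modality feature vectors into one vector.

    A single classifier (not shown) would then be trained on the result.
    """
    return list(prosodic) + list(lexical) + list(visual)


def decision_level_fusion(posteriors: Sequence[float],
                          weights: Sequence[float]) -> float:
    """Combine per-modality disfluency probabilities by weighted average.

    The weighted average is an assumed combination rule for illustration;
    the abstract does not say which rule the paper uses.
    """
    if len(posteriors) != len(weights):
        raise ValueError("need exactly one weight per modality")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(posteriors, weights)) / total


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction of the baseline error rate removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical inputs, not data from the paper.
fused_vector = feature_level_fusion([0.1, 0.4], [1.0], [0.3, 0.7])
fused_score = decision_level_fusion([0.2, 0.8], [1.0, 1.0])
reduction = relative_error_reduction(0.20, 0.19)
```

Note that a relative improvement is measured against the baseline error rate rather than in absolute percentage points, which is why a figure such as 3.6% can correspond to a much smaller absolute change in accuracy.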
- Research Article
28
- 10.1016/j.specom.2021.05.004
- May 17, 2021
- Speech Communication
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt
- Research Article
10
- 10.1016/0093-934x(81)90063-8
- Sep 1, 1981
- Brain and Language
Disfluent speech associated with brain damage
- Research Article
64
- 10.1016/j.jcomdis.2009.06.001
- Jun 21, 2009
- Journal of Communication Disorders
Disfluencies in non-stuttering adults across sample lengths and topics
- Research Article
157
- 10.1016/j.specom.2007.06.002
- Jun 23, 2007
- Speech Communication
Filled pauses as cues to the complexity of upcoming phrases for native and non-native listeners
- Conference Article
15
- 10.3115/1073483.1073511
- Jan 1, 2003
We investigate the optimal LM treatment of abundant filled pauses (FP) in spontaneous monologues of a professional dictation task. The questions addressed here are (1) how to deal with FP in the LM history and (2) to what extent the LM can distinguish between positions with high and low FP likelihood. Our results differ partly from observations reported on dialogues. Discarding FP from all LM histories clearly improves the performance. Local perplexities, entropies and word rankings at positions following FP suggest that most FP indicate hesitations rather than restarts. Proper prediction of FP makes it possible to distinguish FP from word positions by a doubled FP probability. Recognition experiments confirm the improvements found in our perplexity studies.
- Research Article
2
- 10.17507/tpls.1307.03
- Jul 1, 2023
- Theory and Practice in Language Studies
Filled pauses (FPs), uh and um, are an inherent characteristic of impromptu spoken English. However, despite the ubiquity of FP studies across languages and the impact of FPs on speech production and comprehension, they have not been thoroughly examined in the context of second language learners of English whose mother tongue is Arabic. Hence, this study analyzed FPs in the speech of female and male non-native speakers of English and in that of an American speaker, all of whom were guests on a popular English-language podcast. An analysis combining Praat (speech analysis software) with manual coding of FPs and fillers based on previous studies showed that both native and non-native speech was peppered with FPs. Although uh was more frequent than um, their frequencies among speakers and characteristic positions varied greatly. Whereas the majority were standalone FPs, the remaining FPs co-occurred with fillers (and, but, so, well, and you know) or were aspirated. The average length of the FPs was slightly longer for the native speaker. There were more FPs in the samples taken from early in the podcast episodes than around the middles and sometimes the endings. Regarding gender, male speakers uttered more FPs than the female speaker, whether they were native or non-native speakers.
- Research Article
24
- 10.1002/mdc3.12714
- Dec 30, 2018
- Movement Disorders Clinical Practice
To examine the effect of levodopa medication on speech dysfluency in Parkinson's disease. Fifty-one individuals with Parkinson's disease (IWPD) read aloud during off- and on- medication states. Total speech dysfluencies were calculated from transcriptions of recorded speech samples. Severity of speech dysfluency was not significantly related to the severity of motor symptoms, duration of disease, levodopa equivalent dosage, or age. When the IWPD were divided into two groups based on dysfluency severity, there was a significant group-by-medication state interaction. There was a significant correlation between the medication-related change in speech dysfluency and the off-medication severity of speech dysfluency measure (r = -0.46). The results of this study indicate that levodopa medication can have a significant effect on speech dysfluency. The beneficial levodopa effect appears to be related to the severity of the off-medication speech dysfluency. Results did not provide strong support for the excess dopamine theory of stuttering in IWPD. A dualistic model of the effects of dopamine on speech fluency in PD is proposed.
- Research Article
122
- 10.1017/s1470542707000049
- Jun 1, 2007
- Journal of Germanic Linguistics
This study reports on a number of highly significant differences found between English, German, and Dutch hesitation markers. English and German native speakers used significantly more vocalic-nasal hesitation markers than Dutch native speakers, who used predominantly vocalic hesitation markers. English hesitation markers occurred most frequently when preceded by silence and followed by a lexical item, or when surrounded by silence. German and Dutch hesitation markers occurred most frequently surrounded by lexical items. In Dutch, vocalic-nasal hesitation markers dominated only when surrounded by silence. Vocalic-nasal hesitation markers dominated in all positions in English and German, although in the former language this was more salient than in the latter. Nasal hesitation markers were used significantly more frequently in German than in English or Dutch. In addition to overall language trends, speaker-specific differences, especially within German and Dutch, were observed. These results raise questions in terms of the symptom versus signal hypotheses regarding the function of hesitation markers. I am indebted to Angelika Braun and Jens-Peter Koster for their supervision at the University of Trier. I am also thankful to Monika Schmid and Wim Peeters in the Netherlands and to Eva Gossner in England for their organizational help. Finally, I am very grateful to the participants, and to the two anonymous reviewers for their comments and suggestions. All inadequacies in this article remain my responsibility.
- Conference Article
27
- 10.1109/icslp.1996.607780
- Oct 3, 1996
The study aims to test quantitatively whether filled pauses (FPs) may highlight discourse structure. More specifically it is first investigated whether FPs are more typical in the vicinity of major discourse boundaries. Secondly, the FPs are analyzed acoustically, to check whether those occurring at major discourse boundaries are segmentally and prosodically different from those at shallower breaks. Analyses of twelve spontaneous monologues (Dutch) show that phrases following major discourse boundaries more often contain FPs. Additionally, FPs after stronger breaks tend to occur phrase-initially, whereas the majority of the FPs after weak boundaries are in phrase-internal position. Also, acoustic observations reveal that FPs at major discourse boundaries are both segmentally and prosodically distinct. They also differ with respect to the distribution of neighbouring silent pauses.
- Research Article
- 10.3390/languages11030034
- Feb 25, 2026
- Languages
This study examines the distribution and acoustic characteristics of filled pauses (FPs) in Urdu, a language underrepresented in disfluency research. Drawing on a spontaneous speech dataset from 18 female speakers, the analysis considers the types of FPs, their immediate segmental context, and their utterance position. The analysis also evaluates the effects of segmental context and utterance position on acoustic measures of FPs. Results show a dominant use of vocalic FPs. Moreover, FPs observe systematic contextual patterns and cluster in specific utterance positions. Acoustically, vowel-only and vowel–nasal FPs differ in duration and vowel height (F1). For vowel-only FPs, utterance position significantly conditions duration and prosodic properties (F0, intensity), whereas segmental context does not show any effects. Taken together, the findings demonstrate a language-specific organization of FPs in Urdu. This study offers a detailed phonetic account of Urdu FPs to date and highlights the importance of language-sensitive disfluency modeling in speech technology applications.
- Research Article
300
- 10.1016/s0378-2166(98)00014-9
- Oct 1, 1998
- Journal of Pragmatics
Filled pauses as markers of discourse structure
- Research Article
- 10.1016/j.iswa.2025.200614
- Mar 1, 2026
- Intelligent Systems with Applications
Enhancing token boundary detection in disfluent speech
- Research Article
2
- 10.1111/cogs.70093
- Aug 1, 2025
- Cognitive Science
How language interacts with metacognitive processes is an understudied area. Earlier research shows that people produce disfluencies (i.e., "uh"s or "um"s) in their speech when they are not sure of their answers, indicating metacognitive monitoring. Gestures have monitoring and predictive roles in language, also implicating metacognitive processes. Further, the rate of speech disfluencies and gestures changes as a function of the communicational setting. People produce fewer disfluencies and more gestures when they can see the listener than when the listener is not visible. In the current study, 50 participants (32 women, M_age = 21.16, SD = 1.46) were asked 40 general knowledge questions, either with a visible (n = 25) or nonvisible (n = 25) listener. They provided a feeling-of-knowing (FOK) judgment immediately after seeing each question and were asked to think aloud while pondering their answers. Then, they provided retrospective confidence judgments (RCJs). Results showed that gestures and speech disfluencies were not related either to accuracy or to the FOK judgments. However, both gestures and speech disfluencies predicted RCJs uniquely and interactively. Speech disfluencies negatively predicted RCJs. In contrast, hand gestures were positively related to RCJs. Importantly, the use of gestures was more strongly related to RCJs when disfluencies were also higher. No effect of communicational setting on the rate of gestures or speech disfluencies was found. These results highlight the importance of multimodal language cues in the elaboration of metacognitive judgments.
- Research Article
- 10.1111/1460-6984.70184
- Dec 28, 2025
- International Journal of Language & Communication Disorders
This study aimed to analyse the frequency and types of disfluencies in spontaneous speech and reading among adults with autism spectrum disorder (ASD) compared to neurotypical adults. The participants were 56 Dutch-speaking adults, 28 with ASD and 28 age- and gender-matched controls. Samples of spontaneous speech and text reading were orthographically transcribed, and the speech disfluencies were identified and classified using an expanded version of the Illinois Disfluency Classification System. The frequencies of stuttering-like disfluencies (SLDs), other disfluencies (ODs), word-final disfluencies (WFDs), and total disfluencies (TDs) were calculated. Adults with ASD exhibited significantly more SLDs and WFDs in spontaneous speech than the control group. While no statistically significant differences were observed between the two groups in reading, a trend towards increased WFDs was noted. Adults with ASD exhibit increased speech disfluencies, more specifically SLDs and WFDs, in spontaneous speech compared with neurotypical adults, but not during reading. This discrepancy may arise because spontaneous speech requires real-time language formulation and social communication skills, which can differ in ASD, whereas reading offers an external linguistic structure that reduces cognitive and social processing demands. Increased speech disfluencies may impact how speech is perceived in terms of intelligibility and/or social communication dynamics.
What is already known on this subject: Individuals with autism spectrum disorder (ASD) often exhibit challenges in pragmatic language use along with varying abilities in vocabulary, grammar and speech production. Concerning the latter, a limited body of research has identified specific characteristics of speech fluency in individuals with ASD, including a higher frequency of speech disfluencies and the occurrence of word-final disfluencies.
What this study adds to the existing knowledge: Research on speech disfluency in people with ASD remains limited overall. To date, only two studies have conducted an in-depth analysis of disfluency types using an elaborate classification system such as the Illinois Disfluency Classification System, both incorporating neurotypical control groups. One study focused on English-speaking school-aged children and the other on Finnish-speaking young adults. As these studies are confined to two linguistically distinct populations and based on similar speech sample types, the generalizability of their findings to other languages and speech samples remains uncertain. In the current study, we analysed the speech disfluency of Dutch-speaking adults with ASD, with a broader age range. In addition to the analysis of spontaneous speech, a standard reading text was also included to evaluate the impact of sample type. This study therefore extends the existing database and provides further insights into the types and frequency of speech disfluencies in adults with ASD.
What are the potential or actual clinical implications for this work? Increased speech disfluencies can affect the speech intelligibility and/or social interaction of adults with ASD. Moreover, integrating a more detailed analysis of disfluencies in individuals with ASD into a broader overall assessment might optimize the diagnostic and clinical decision-making process.