FluencyBank Timestamped: An Updated Data Set for Disfluency Detection and Automatic Intended Speech Recognition.

Amrit Romana, Minxue Niu, Matthew Perez, Emily Mower Provost

doi:10.1044/2024_jslhr-24-00070

Abstract

This work introduces updated transcripts, disfluency annotations, and word timings for FluencyBank, which we refer to as FluencyBank Timestamped. This data set will enable thorough analysis of how speech processing models (such as speech recognition and disfluency detection models) perform when evaluated with typical speech versus speech from people who stutter (PWS).

We update the FluencyBank data set, which includes audio recordings from adults who stutter, to explore the robustness of speech processing models. Our update (semi-automated with manual review) includes new transcripts with timestamps and disfluency labels corresponding to each token in the transcript. Our disfluency labels capture typical disfluencies (filled pauses, repetitions, revisions, and partial words), and we explore how speech model performance compares for Switchboard (typical speech) and FluencyBank Timestamped. We present benchmarks for three speech tasks: intended speech recognition, text-based disfluency detection, and audio-based disfluency detection. For the first task, we evaluate how well Whisper performs for intended speech recognition (i.e., transcribing speech without disfluencies). For the other two tasks, we evaluate how well a Bidirectional Encoder Representations from Transformers (BERT) text-based model and a Whisper audio-based model perform for disfluency detection. We select these models, BERT and Whisper, because they have shown high accuracy on a broad range of tasks in the language and audio domains, respectively.

For the transcription task, we calculate an intended speech word error rate (isWER) between the model's output and the speaker's intended speech (i.e., speech without disfluencies). We find that isWER is comparable between Switchboard and FluencyBank Timestamped, but that Whisper transcribes filled pauses and partial words at higher rates in the latter data set. Within FluencyBank Timestamped, isWER increases with stuttering severity. For the disfluency detection tasks, we find that the models detect filled pauses, revisions, and partial words relatively well in FluencyBank Timestamped, but performance drops substantially for repetitions because the models are unable to generalize to the different types of repetitions (e.g., multiple repetitions and sound repetitions) from PWS. We hope that FluencyBank Timestamped will allow researchers to explore closing performance gaps between typical speech and speech from PWS.

Our analysis shows that there are gaps in speech recognition and disfluency detection performance between typical speech and speech from PWS. We hope that FluencyBank Timestamped will contribute to more advancements in training robust speech processing models.
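The isWER described in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each reference token carries a label where `"FL"` means fluent and any other tag (for example `"FP"`, `"RP"`, `"RV"`, `"PW"`) marks a disfluency; these tag names, the token format, and the function names are illustrative assumptions, not the data set's actual schema or the authors' scoring code. The intended-speech reference is formed by dropping disfluent tokens, and a standard word-level edit distance is then divided by the reference length.

```python
from typing import List, Tuple

# Hypothetical token format: (word, label). "FL" = fluent; any other label
# (e.g. "FP", "RP", "RV", "PW") is treated as a disfluency and removed.
Token = Tuple[str, str]


def intended_words(tokens: List[Token]) -> List[str]:
    """Keep only fluent tokens, lowercased, as the intended-speech reference."""
    return [word.lower() for word, label in tokens if label == "FL"]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def is_wer(tokens: List[Token], hypothesis: str) -> float:
    """Edit distance to the intended speech, divided by its word count."""
    ref = intended_words(tokens)
    if not ref:
        raise ValueError("reference contains no fluent words")
    hyp = hypothesis.lower().split()
    return edit_distance(ref, hyp) / len(ref)


# Illustrative (made-up) utterance: "um I I want to g- go"
tokens = [("um", "FP"), ("I", "RP"), ("I", "FL"), ("want", "FL"),
          ("to", "FL"), ("g-", "PW"), ("go", "FL")]

print(is_wer(tokens, "I want to go"))     # model output omits disfluencies
print(is_wer(tokens, "um I want to go"))  # model transcribed the filled pause
```

Under this definition, a model that transcribes a filled pause or partial word is penalized with an insertion error, which is one way the higher transcription rate of such tokens reported above would show up in the score.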

Journal: Journal of Speech, Language, and Hearing Research (JSLHR)
Publication Date: Oct 8, 2024
License type: CC BY-NC-SA
