Filled pause refinement based on the pronunciation probability for lecture speech.

Yan-Hua Long, Hong Ye, Ian Mcloughlin

doi:10.1371/journal.pone.0123466

Abstract

Although automatic speech recognition has become quite proficient at recognizing and transcribing well-prepared, fluent speech, transcribing speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. As recent studies have noted, FPs are widely believed to increase the error rates of state-of-the-art speech transcription systems, primarily because most FPs are poorly annotated in, or missing from, training data transcriptions, and because FPs share acoustic characteristics with some common non-content words. To improve speech transcription, we propose a new automatic refinement approach that detects FPs in British English lecture speech transcription. The approach combines the pronunciation probability of each word in the dictionary with acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, for which only imperfect training transcriptions are available, and achieve successful results on both the development and evaluation datasets. We also investigate FP refinement with acoustic models trained on different speech genres. To further validate the effectiveness of the approach, we examine the speech transcription performance of systems built on training data transcriptions with and without FP refinement.
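To make the idea of combining scores more concrete, here is a toy, hypothetical sketch of a per-slot decision between a transcribed word, an FP, or nothing. It is not the authors' forced-alignment framework, which the abstract does not detail. It assumes that forced alignment has already produced an acoustic log-likelihood for each alternative, that "acoustic language model scores" can be approximated as an acoustic log-likelihood plus a weighted language-model log-probability, and that each alternative carries a dictionary pronunciation probability. The `Candidate` class, the `<FP>` token, the weights, and all numbers are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical alternative for a slot in the transcript."""
    token: str               # a word, "<FP>", or "" (insert nothing in a gap)
    acoustic_logprob: float  # assumed to be precomputed by forced alignment
    lm_logprob: float        # assumed language-model log-probability
    pron_prob: float         # pronunciation probability of this token/variant


def combined_score(c: Candidate, lm_weight: float = 10.0, pron_weight: float = 1.0) -> float:
    """Log-linear combination of acoustic, LM and pronunciation scores."""
    if c.pron_prob <= 0.0:
        raise ValueError("pronunciation probability must be positive")
    return c.acoustic_logprob + lm_weight * c.lm_logprob + pron_weight * math.log(c.pron_prob)


def refine(slots: list[list[Candidate]], lm_weight: float = 10.0, pron_weight: float = 1.0) -> list[str]:
    """Keep the best-scoring candidate in each slot; empty tokens are dropped."""
    refined = []
    for candidates in slots:
        best = max(candidates, key=lambda c: combined_score(c, lm_weight, pron_weight))
        if best.token:
            refined.append(best.token)
    return refined


# Toy example (invented numbers): the transcript reads "so a lecture",
# with one extra gap where an FP might have been omitted.
example = [
    [Candidate("so", -20.0, -1.0, 1.0)],
    # Transcribed "a" vs. an FP at the same position.
    [Candidate("a", -40.0, -2.0, 0.3), Candidate("<FP>", -30.0, -1.5, 0.9)],
    # Optional gap: insert nothing vs. insert an FP.
    [Candidate("", -2.0, -0.05, 1.0), Candidate("<FP>", -6.0, -0.8, 0.9)],
    [Candidate("lecture", -35.0, -3.0, 1.0)],
]
print(refine(example))  # ['so', '<FP>', 'lecture']
```

Raising `pron_weight` makes a low pronunciation probability harder to overcome, because its (negative) log is scaled up. The default weights are arbitrary and would, in any real system, need tuning on held-out data.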

Highlights

Speech disfluencies are common phenomena in spontaneous and lecture speech [1].
To examine the quality of transcriptions produced by the lightly supervised decoding system with acoustic models (AMs) trained on different speech genres, Table 1 presents the results for the bbc.dev dataset using SWB-FP.AM and BN-FP.AM, trained on the Switchboard-I corpus (SWB) and Broadcast News (BN) data, which are the same AMs used in the SWB-FP and BN-FP systems, respectively.
A detailed analysis of the deleted and inserted words showed that the additional deletions and insertions produced by BN-FP.AM arise primarily from confusion between FPs and other ordinary words.
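The highlights compare deletions and insertions produced by different AMs. As a generic illustration of where such counts come from, the sketch below computes word error rate from a minimum-edit-distance alignment, WER = (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions and N is the number of reference words. It is not the scoring tool used in the paper, which the source does not name. Whitespace tokenization and the example sentence are assumptions.

```python
def align_counts(ref: list[str], hyp: list[str]) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) from a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # i deletions
    for j in range(1, m + 1):
        cost[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace one optimal path and classify each edit.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += int(ref[i - 1] != hyp[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def wer(ref: list[str], hyp: list[str]) -> float:
    """Word error rate (S + D + I) / N, where N is the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    s, d, i = align_counts(ref, hyp)
    return (s + d + i) / len(ref)


# Invented example: an FP in the audio recognized as "a" (an insertion)
# and the word "is" dropped (a deletion).
ref = "so the idea is simple".split()
hyp = "so a the idea simple".split()
print(align_counts(ref, hyp), wer(ref, hyp))  # (0, 1, 1) 0.4
```

When several alignments have the same minimum cost, the split between S, D, and I can depend on the backtrace order, so per-category counts are less stable than the total.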

Summary

Introduction

Speech disfluencies (e.g., filled pauses, repetitions, and repairs) are common phenomena in spontaneous and lecture speech [1]. The most frequently occurring disfluencies are filled pauses (FPs), especially when the topic is unfamiliar or when speakers are uncertain or need to make decisions. FPs are an integral part of how humans speak, can provide valuable information about the speaker’s cognitive state, and can be critical for successful turn-taking [2]. FPs have been shown to be problematic for automatic speech transcription systems because they can be confused with, and recognized as, short function words, usually producing fragment-like structures that increase transcription error rates [3,4,5,6]. Deciding how to handle FPs is therefore essential to developing robust speech transcription.

Journal: PloS one	Publication Date: Apr 10, 2015
Citations: 4	License type: CC BY 4.0

Similar Papers

Acoustic and language models adaptation for Indonesian spontaneous speech recognition
Dessi Puji Lestari ... Angela Irfani
01 Aug 2015

Development of Acoustical Feature Based Classifier Using Decision Fusion Technique for Malay Language Disfluencies Classification
Raseeda Hamzah ... Rosniza Roslan
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 8
01 Oct 2017

Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription
Peng Shen ... Hisashi Kawai
Speech Communication | VOL. 82
24 May 2016

A Mandarin lecture speech transcription system for speech summarization
Ho Yin Chan ... Justin Jian Zhang
01 Jan 2007
