A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions

Máté Ákos Tündik, Balázs Tarján, György Szaszák

doi:10.1016/j.csl.2020.101076

Abstract

In Automatic Speech Recognition (ASR), inserting punctuation marks into the hypothesized word sequence has long been given low priority, as efforts were concentrated on minimizing word error rates. Punctuation, however, also has a high impact on the transcription quality perceived by users. Prosody, textual context and their combination have since been used successfully for automatic punctuation of ASR outputs, and recently proposed RNN-based solutions show encouraging performance. We believe that punctuation technology currently has two bottlenecks: on the one hand, complex punctuation models have high latency and are therefore not suitable for use cases with real-time requirements; on the other hand, punctuation efforts have not been validated against human perception and user impression. This paper aims to propose a lightweight yet powerful RNN punctuation model for on-line (real-time, low-latency) environments, and to assess user opinion, both in general and for target users living with hearing loss or impairment. The proposed on-line RNN punctuation model is evaluated against a Maximum Entropy (MaxEnt) baseline for Hungarian and for English, whereas subjective assessment tests are carried out on real broadcast data subtitled with ASR (closed captioning). As expected, the RNN outperforms the MaxEnt baseline system but not the off-line systems: limiting the future context to minimize latency results in only a slight performance drop, whereas ASR errors influence punctuation performance considerably. A genre analysis of punctuation performance is also carried out, showing that both recognition and punctuation of more spontaneous speech styles are challenging. Overall, the subjective tests confirmed that users perceive a significant quality improvement when punctuation is added, even in the presence of word errors and even though automatic punctuation may itself contain further errors. For users living with hearing loss or deafness, an even higher, clear preference for the punctuated captions could be confirmed.

Journal: Computer Speech & Language
Publication Date: Feb 12, 2020
Citations: 1
