Abstract
In Automatic Speech Recognition (ASR), inserting punctuation marks into the word-chain hypothesis has long been given low priority, as efforts were concentrated on minimizing word error rates. Punctuation, however, also has a high impact on the transcription quality perceived by users. Prosody, textual context, and their combination have since been used successfully for automatic punctuation of ASR output, and recently proposed RNN-based solutions show encouraging performance. We believe that punctuation technology currently faces two bottlenecks: on the one hand, complex punctuation models have high latency and are therefore unsuitable for use cases with real-time requirements; on the other hand, punctuation efforts have not been validated against human perception and user impression. This paper aims to propose a lightweight yet powerful RNN punctuation model for on-line (real-time, low-latency) environments, and to assess user opinion, both in general and for target users living with hearing loss or impairment. The proposed on-line RNN punctuation model is evaluated against a Maximum Entropy (MaxEnt) baseline for Hungarian and for English, whereas subjective assessment tests are carried out on real broadcast data subtitled with ASR (closed captioning). As can be expected, the RNN outperforms the MaxEnt baseline system but not the off-line systems: limiting the future context to minimize latency results in only a slight performance drop, whereas ASR errors influence punctuation performance considerably. A genre analysis of punctuation performance is also carried out, showing that both recognition and punctuation of more spontaneous speech styles are challenging. Overall, the subjective tests confirmed that users perceive a significant quality improvement when punctuation is added, even in the presence of word errors and even if the punctuation is automatic and hence may itself contain further errors. For users living with hearing loss or deafness, an even stronger, clear preference for punctuated captions was confirmed.