Improving Readability for Automatic Speech Recognition Transcription

Junwei Liao,Linjun Shou,Liyang Lu,Michael Zeng,Hong Qu,Sefik Eskimez,Ming Gong,Yu Shi

doi:10.1145/3557894

Abstract

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to grammatical errors, disfluency, and other noises common in spoken communication. These readable issues introduced by speakers and ASR systems will impair the performance of downstream tasks and the understanding of human readers. In this work, we present a task called ASR post-processing for readability (APR) and formulate it as a sequence-to-sequence text generation problem. The APR task aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of speakers. We further study the APR task from the benchmark dataset, evaluation metrics, and baseline models: First, to address the lack of task-specific data, we propose a method to construct a dataset for the APR task by using the data collected for grammatical error correction. Second, we utilize metrics adapted or borrowed from similar tasks to evaluate model performance on the APR task. Lastly, we use several typical or adapted pre-trained models as the baseline models for the APR task. Furthermore, we fine-tune the baseline models on the constructed dataset and compare their performance with a traditional pipeline method in terms of proposed evaluation metrics. Experimental results show that all the fine-tuned baseline models perform better than the traditional pipeline method, and our adapted RoBERTa model outperforms the pipeline method by 4.95 and 6.63 BLEU points on two test sets, respectively. The human evaluation and case study further reveal the ability of the proposed model to improve the readability of ASR transcripts.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Improving Readability for Automatic Speech Recognition Transcription

Abstract

Talk to us

Similar Papers

More From: ACM Transactions on Asian and Low-Resource Language Information Processing

Lead the way for us

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing	Publication Date: May 9, 2023
Citations: 11

Similar Papers

Using Auxiliary Sources of Knowledge for Automatic Speech Recognition

-

01 Jan 2004
01 Jan 2004

Non-Native Pronunciation Variation Modeling for Automatic Speech Recognition
Hong Kook ... Mina Kim
-
Hong Kook, et. al.Hong Kook ... Mina Kim
16 Aug 2010
16 Aug 2010

Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems
Kartik Audhkhasi ... Shrikanth S Narayanan
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 22
Kartik Audhkhasi, et. al.Kartik Audhkhasi ... Shrikanth S Narayanan
01 Mar 2014
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 22

Neural Speech-to-Text Language Models for Rescoring Hypotheses of DNN-HMM Hybrid Automatic Speech Recognition Systems
Tomohiro Tanaka ... Ryo Masumura
-
Tomohiro Tanaka, et. al.Tomohiro Tanaka ... Ryo Masumura
01 Nov 2018
01 Nov 2018

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Improving Readability for Automatic Speech Recognition Transcription

Abstract

Talk to us

Similar Papers

More From: ACM Transactions on Asian and Low-Resource Language Information Processing