Abstract

Automatic Speech Recognition (ASR) systems can now accurately recognize which words are spoken. However, because spontaneous speech contains disfluencies, grammatical errors, and other phenomena, verbatim ASR transcription suffers from poor readability, which is crucial both for human comprehension and for downstream tasks that need to understand the meaning and intent of what is spoken. In this work, we formulate ASR post-processing for readability (APR) as a sequence-to-sequence text generation problem that aims to transform incorrect and noisy ASR output into text that is readable for humans and downstream tasks. We leverage the Metadata Extraction (MDE) corpus to construct a task-specific dataset for our study. To address the scarcity of training data, we propose a novel data augmentation method that synthesizes large-scale training data from a grammatical error correction dataset. We propose a model based on a pre-trained language model to perform the APR task and train it with a two-stage training strategy to better exploit the augmented data. On the constructed test set, our approach outperforms the best baseline system by a large margin: 17.53 on BLEU and 13.26 on readability-aware WER (RA-WER). Human evaluation also shows that our model generates more human-readable transcripts than the baseline method.
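The readability-aware WER (RA-WER) reported above is defined in the paper itself; as background, the underlying metric is standard word error rate, i.e. word-level edit distance between a hypothesis and a reference, normalized by reference length. The sketch below computes plain WER (the function name `wer` and the example strings are illustrative, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# A readability edit (removing the filler "uh") counts as one error under
# plain WER; RA-WER, per the paper, is designed not to penalize such edits.
print(wer("i uh mean the cat sat", "i mean the cat sat"))
```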
