Abstract
Robustness is a long-standing challenge for automatic speech recognition (ASR): any deployed ASR system faces much noisier speech than clean training corpora, yet it is impractical to annotate every type of noisy environment. In this work, we propose a novel phonetic-semantic pre-training (PSP) framework that effectively improves the robustness of ASR in practical noisy environments by seamlessly integrating pre-training, self-supervised learning, and fine-tuning. PSP consists of three stages. First, we pre-train a phone-to-word transducer (PWT) to map generated phone sequences to the target text using only unpaired text data. Second, we continue training the PWT on more challenging data generated by an empirical phone-perturbation heuristic, in addition to a self-supervised signal obtained by recovering the corrupted phones. Third, we fine-tune the resulting PWT on real-world speech data. Experiments on two real-life datasets collected from industrial scenarios and on synthetic noisy datasets show that PSP improves the traditional ASR pipeline, with relative character error rate (CER) reductions of 28.63% and 26.38%, respectively, on the two real-life datasets; it also remains robust on the synthetic, highly noisy speech datasets.
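The phone-perturbation stage of PSP can be illustrated with a minimal sketch. The abstract does not specify the heuristic, so the operation mix (random substitution, deletion, insertion) and all rates below are illustrative assumptions, not the paper's actual method:

```python
import random

def perturb_phones(phones, sub_rate=0.1, del_rate=0.05, ins_rate=0.05,
                   inventory=None, rng=None):
    """Illustrative phone-perturbation heuristic (NOT the paper's exact method).

    Randomly substitutes, deletes, or inserts phone symbols to simulate the
    noisy phone sequences a real acoustic front-end might emit. The corrupted
    sequence can then serve as PWT input, with the clean sequence (or target
    text) providing the self-supervised recovery signal.
    """
    rng = rng or random.Random()
    # Pool of replacement phones; in practice this would be the full
    # phone inventory of the language, not just the phones in this utterance.
    inventory = inventory or sorted(set(phones))
    out = []
    for p in phones:
        r = rng.random()
        if r < del_rate:
            continue                               # delete this phone
        elif r < del_rate + sub_rate:
            out.append(rng.choice(inventory))      # substitute a random phone
        else:
            out.append(p)                          # keep the phone unchanged
        if rng.random() < ins_rate:
            out.append(rng.choice(inventory))      # insert a spurious phone
    return out
```

With all rates set to zero the function is the identity, which makes the clean-data pre-training stage a special case of the same pipeline.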