Abstract

In this paper, we propose policy gradient-based semi-supervised training for speech recognition acoustic models. In practice, self-training and teacher/student learning are among the most widely used semi-supervised training methods due to their scalability and effectiveness. These methods generate pseudo labels for unlabeled samples using a pre-trained model and select reliable samples using a confidence measure. However, this approach has some drawbacks: the generated pseudo labels can be biased depending on which pre-trained model is used, and the training process can be complicated because the confidence measure is usually applied in post-processing using external knowledge. To address these issues, we propose a policy gradient method-based approach. Policy gradient is a reinforcement learning algorithm that finds an optimal behavior strategy for an agent to obtain optimal rewards. The policy gradient-based approach provides a framework for exploring unlabeled data as well as exploiting labeled data, and it also provides a way to incorporate external knowledge within the same training cycle. The proposed approach was evaluated on an in-house non-native Korean recognition domain. The experimental results show that the method is effective for semi-supervised acoustic model training.

Highlights

  • Deep neural network (DNN)-based acoustic modeling has resulted in significant improvements in automatic speech recognition

  • Feed-forward deep neural network (FFDNN)-based acoustic models have achieved greater improvement than Gaussian mixture model (GMM)-based acoustic models on phone-call transcription benchmark domains [1], and deep convolutional neural network (CNN)-based acoustic models have outperformed FFDNN-based models on news broadcast and Switchboard task domains [2]

  • Self-training and teacher/student learning-based semi-supervised acoustic model training methods are among the most popular approaches, but they are not effective if the pre-trained model does not match the unlabeled data or no pre-trained model is available


Summary

Introduction

Deep neural network (DNN)-based acoustic modeling has resulted in significant improvements in automatic speech recognition. Self-training-based methods focus on generating machine transcriptions for unlabeled data using a pre-trained automatic speech recognition system and confidence measures. Teacher/student learning-based approaches use the output distribution of the pre-trained model as a target for the student model to alleviate the complexity of confidence measures for large-scale training [12]. Of these methods, self-training and teacher/student learning-based methods are widely used in practice due to their scalability and effectiveness [13,14,15]. To alleviate the complexity issue, teacher/student learning-based methods use the posterior of the teacher model as a target distribution, but this makes it complicated to incorporate external knowledge. To handle these issues, we propose policy gradient method-based semi-supervised acoustic model training.
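To make the idea concrete, the following is a minimal sketch of a REINFORCE-style policy gradient update on an unlabeled frame, using a toy linear classifier in place of a real acoustic model. All names here (the weight matrix, feature size, and the confidence-based reward) are illustrative assumptions, not the paper's actual model: the point is only to show how sampling a pseudo label from the model's posterior (exploration) and scaling the log-likelihood gradient by a scalar reward lets external knowledge enter the same training cycle.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy linear "acoustic model": logits = x @ W (sizes are illustrative).
n_feats, n_classes = 8, 4
W = rng.normal(scale=0.1, size=(n_feats, n_classes))

def reinforce_step(x, reward_fn, lr=0.1):
    """One policy-gradient update on an unlabeled feature vector x.

    The model's posterior over labels is treated as a policy;
    a pseudo label is sampled from it, and the gradient of the
    sampled label's log-probability is scaled by a scalar reward,
    which can encode external knowledge (e.g., a confidence score).
    """
    global W
    probs = softmax(x @ W)
    a = rng.choice(n_classes, p=probs)   # sample a pseudo label (exploration)
    r = reward_fn(a, probs)              # scalar reward for that choice
    # Gradient of log p(a|x) w.r.t. the logits is one_hot(a) - probs.
    g_logits = -probs
    g_logits[a] += 1.0
    W += lr * r * np.outer(x, g_logits)  # gradient ascent on expected reward
    return a, r

# Example reward: the policy's own confidence in the sampled label.
x = rng.normal(size=n_feats)
a, r = reinforce_step(x, reward_fn=lambda a, p: float(p[a]))
```

With a reward of 1 for every sampled label, this update reduces to cross-entropy training on the sampled pseudo label, which is the connection the paper's section on the relation between cross entropy loss and reward loss explores.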

Statistical Speech Recognition
BLSTM-Based Acoustic Model
Related Work
Cross Entropy Loss
Gradient of the Unlabeled Data
Considerations on Low-Resource Domain
Semi-Supervised Acoustic Model Training Using Policy Gradient
Policy Gradient
Relation between Gradients of Cross Entropy Loss and Reward Loss
Semi-Supervised Learning Using Policy Gradient
Fine-tuning
Non-Native Korean Database
Alignment for the Human Labeled Corpus
BLSTM Training
Experimental Results
Conclusions