Rapid Speaker Adaptation of Neural Network Based Filterbank Layer for Automatic Speech Recognition

Hiroshi Seki, Kazumasa Yamamoto, Tomoyosi Akiba, Seiichi Nakagawa

doi:10.1109/slt.2018.8639648

Abstract

Deep neural networks (DNNs) have achieved significant success in automatic speech recognition. Previously, we proposed a filterbank-incorporated DNN that takes power spectra as input features. This method provides the functions of vocal tract length normalization (VTLN) and feature-space maximum likelihood linear regression (fMLLR). The filterbank layer can be implemented with a small number of parameters and is optimized by backpropagation, which makes it well suited to adaptation when little data is available. In this paper, we apply speaker adaptation to the filterbank-incorporated DNN. With adaptation on 15 utterances, the adapted model achieved a 7.4% relative improvement in word error rate (WER) over the baseline DNN on the CSJ task, significant at the 0.005 level. Adapting the filterbank layer also outperformed the other adaptation methods compared: singular value decomposition (SVD) based adaptation and learning hidden unit contributions (LHUC).
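
To make the "small number of parameters" point concrete, here is a minimal sketch of a parameterized filterbank applied to one power-spectrum frame. The abstract does not say how the filters are parameterized; the Gaussian shape with one center, bandwidth, and gain per filter is an assumption made for illustration, as are the function name `gaussian_filterbank`, the bin and filter counts, and the evenly spaced initial centers. It shows only the forward pass and the parameter count, not the authors' training or adaptation procedure.

```python
import math


def gaussian_filterbank(power_spectrum, centers, bandwidths, gains):
    """Return log filterbank outputs for one power-spectrum frame.

    Assumed (hypothetical) parameterization: filter m weights frequency
    bin k by gains[m] * exp(-0.5 * ((k - centers[m]) / bandwidths[m]) ** 2).
    """
    outputs = []
    for c, b, g in zip(centers, bandwidths, gains):
        energy = sum(
            g * math.exp(-0.5 * ((k - c) / b) ** 2) * p
            for k, p in enumerate(power_spectrum)
        )
        # Small floor keeps the log finite when a filter sees no energy.
        outputs.append(math.log(energy + 1e-10))
    return outputs


# Illustrative sizes only; not taken from the paper.
n_bins, n_filters = 257, 40

# Evenly spaced centers in bin units (an assumption for this sketch).
centers = [(m + 1) * n_bins / (n_filters + 1) for m in range(n_filters)]
bandwidths = [4.0] * n_filters
gains = [1.0] * n_filters

frame = [1.0] * n_bins  # flat toy spectrum
features = gaussian_filterbank(frame, centers, bandwidths, gains)

# Three scalars per filter versus a full bins-by-filters weight matrix.
n_filter_params = 3 * n_filters      # 120
n_dense_params = n_bins * n_filters  # 10280

# Scaling every center by one factor warps the frequency axis, which is
# loosely analogous to what VTLN does with a per-speaker warp factor.
alpha = 1.1  # hypothetical warp factor
warped_centers = [alpha * c for c in centers]
warped_features = gaussian_filterbank(frame, warped_centers, bandwidths, gains)
```

Under these assumptions, adapting to a speaker would mean updating only the 120 filter parameters by backpropagation while the rest of the network stays fixed, which is far fewer values to estimate from 15 utterances than a dense 257-by-40 input layer.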
