Abstract
In our previous work, we introduced an attention-based speaker adaptation method, which has proven to be an efficient online speaker adaptation approach for real-time speech recognition. In this paper, we present a more complete framework for this method, named memory-aware networks, which consists of a main network, a memory module, an attention module, and a connection module. A gating mechanism and a multiple-connections strategy connect the memory to the main network so that the memory is fully exploited. An auxiliary speaker classification task is introduced to improve the accuracy of the attention module. The fixed-size ordinally forgetting encoding (FOFE) method is used together with average pooling to capture both short-term and long-term information. Furthermore, instead of using only traditional speaker embeddings such as i-vectors or d-vectors as the memory, we design a new form of memory called residual vectors, which represent speaker-specific pronunciation habits. Experiments on both the Switchboard and AISHELL-2 tasks show that our method performs online speaker adaptation well with no additional adaptation data and with only a relative 3% increase in decoding computational complexity. Under the cross-entropy criterion, our method achieves relative word error rate reductions of 9.4% and 8.3% over the speaker-independent model on the Switchboard and AISHELL-2 tasks, respectively, and approximately 7.0% over the traditional d-vector-based speaker adaptation method.
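The abstract names three general mechanisms: FOFE pooling of acoustic frames, attention over a memory of speaker vectors, and a gate connecting the attention read-out to the main network. As a rough illustration of how such mechanisms typically fit together, here is a minimal NumPy sketch; all function names, dimensions, and the scalar-gate form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fofe(frames, alpha=0.7):
    """Fixed-size ordinally forgetting encoding: z_t = alpha * z_{t-1} + x_t.

    frames: (num_frames, dim) array of acoustic feature vectors.
    Returns a single fixed-size vector regardless of utterance length.
    """
    z = np.zeros(frames.shape[1])
    for x in frames:
        z = alpha * z + x
    return z

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def read_memory(query, memory):
    """Dot-product attention over a memory of per-speaker vectors.

    memory: (num_speakers, dim) array; query: (dim,) vector.
    Returns the attention-weighted read vector and the weights.
    """
    weights = softmax(memory @ query)   # (num_speakers,), sums to 1
    return weights @ memory, weights    # weighted sum of memory rows

def gated_connect(hidden, mem_read, w_gate, b_gate):
    """Scalar sigmoid gate controlling how much memory information flows
    into a main-network layer (a simplified, hypothetical connection)."""
    g = 1.0 / (1.0 + np.exp(-(w_gate @ np.concatenate([hidden, mem_read]) + b_gate)))
    return hidden + g * mem_read
```

In practice the gate and query would be learned jointly with the acoustic model; this sketch only shows the data flow from frames to a gated memory read.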
IEEE/ACM Transactions on Audio, Speech, and Language Processing