Abstract
In our previous work, we introduced an attention-based speaker adaptation method, which has proven to be an efficient online speaker adaptation approach for real-time speech recognition. In this paper, we present a more complete framework for this method, named memory-aware networks, which consists of a main network, a memory module, an attention module, and a connection module. A gating mechanism and a multiple-connections strategy connect the memory to the main network so that the memory is fully exploited. An auxiliary speaker classification task is introduced to improve the accuracy of the attention module. The fixed-size ordinally forgetting encoding (FOFE) method is used together with average pooling to capture both short-term and long-term information. Furthermore, instead of using only traditional speaker embeddings such as i-vectors or d-vectors as the memory, we design a new form of memory called residual vectors, which represent speaker-specific pronunciation habits. Experiments on both the Switchboard and AISHELL-2 tasks show that our method performs online speaker adaptation well with no additional adaptation data and with only a relative 3% increase in decoding computational complexity. Under the cross-entropy criterion, our method achieves relative word error rate reductions of 9.4% and 8.3% over the speaker-independent model on the Switchboard and AISHELL-2 tasks, respectively, and approximately 7.0% over the traditional d-vector-based speaker adaptation method.
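The abstract names three general mechanisms: FOFE pooling of acoustic frames, attention over a memory of speaker vectors, and a gate connecting the attention read-out to the main network. As a rough illustration of how such mechanisms typically fit together, here is a minimal NumPy sketch; all function names, dimensions, and the scalar-gate form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fofe(frames, alpha=0.7):
    """Fixed-size ordinally forgetting encoding: z_t = alpha * z_{t-1} + x_t.

    frames: (num_frames, dim) array of acoustic feature vectors.
    Returns a single fixed-size vector regardless of utterance length.
    """
    z = np.zeros(frames.shape[1])
    for x in frames:
        z = alpha * z + x
    return z

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def read_memory(query, memory):
    """Dot-product attention over a memory of per-speaker vectors.

    memory: (num_speakers, dim) array; query: (dim,) vector.
    Returns the attention-weighted read vector and the weights.
    """
    weights = softmax(memory @ query)   # (num_speakers,), sums to 1
    return weights @ memory, weights    # weighted sum of memory rows

def gated_connect(hidden, mem_read, w_gate, b_gate):
    """Scalar sigmoid gate controlling how much memory information flows
    into a main-network layer (a simplified, hypothetical connection)."""
    g = 1.0 / (1.0 + np.exp(-(w_gate @ np.concatenate([hidden, mem_read]) + b_gate)))
    return hidden + g * mem_read
```

In practice the gate and query would be learned jointly with the acoustic model; this sketch only shows the data flow from frames to a gated memory read.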
IEEE/ACM Transactions on Audio, Speech, and Language Processing