Adaptation of RNN Transducer with Text-To-Speech Technology for Keyword Spotting

Eva Sharma, Ed Lin, Jian Wu, Lei He, Rui Zhao, Wenning Wei, Yao Tian, Yifan Gong, Guoli Ye

doi:10.1109/icassp40776.2020.9053191

Abstract

With the advent of the recurrent neural network transducer (RNN-T) model, the performance of keyword spotting (KWS) systems has greatly improved. However, KWS systems employed for wake-word detection still rely on keyword-specific training data to achieve reasonable performance on each keyword. To improve KWS performance for low-resource keywords without collecting additional natural speech data, we explore Text-To-Speech (TTS) technology to synthetically generate training data for such keywords. Using an RNN-T based KWS model, already well trained on a large keyword-independent natural speech dataset, as a seed model, we run adaptation experiments with the generated keyword-specific TTS data. Besides observing a considerable improvement in overall performance for the low-resource keywords, we find that the improvement from TTS-generated training data, as with natural speech data, depends on speaker diversity, the amount of data per speaker, and data simulation. We obtain additional improvement by selectively adapting specific parts of the RNN-T model, and gain key insights into the different architectural constructs of the RNN-T model.

Similar Papers

Keyword Spotting using Vowel Onset Point, Vector Quantization and Hidden Markov Modeling Based techniques
B V Sandeep Reddy ... S R Mahadeva Prasanna
01 Nov 2008

Developing STT and KWS systems using limited language resources
Viet-Bac Le ... Jean-Luc Gauvain
14 Sep 2014

An End-to-End Far-Field Keyword Spotting System with Neural Beamforming
Xuan Ji ... Ming Liu
13 Dec 2021

Different confidence measures for word verification in speech recognition
M.C Benítez ... A De La Torre
Speech Communication | VOL. 32
14 Aug 2000
