Deep Spoken Keyword Spotting: An Overview

Ivan Lopez-Espejo, Zheng-Hua Tan, John H L Hansen, Jesper Jensen

doi:10.1109/access.2021.3139508

Abstract

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.

Highlights

Interacting with machines via voice is not science fiction anymore
While query-by-example (QbE) keyword spotting (KWS) based on recurrent neural network (RNN) feature extraction, which is different from the approach outlined in Section II and requires a careful treatment of its specificities, is out of the scope of this paper, we have considered it pertinent to allude to it for the following twofold reason
We review and provide some criticism of the most common metrics considered in the field of KWS

Summary

Introduction

Speech technologies have become ubiquitous in today's society. A distinctive feature of voice assistants is that, in order to be used, they first have to be activated by means of a spoken wake-up word or keyword, thereby avoiding running far more computationally expensive automatic speech recognition (ASR) when it is not required [2]. Voice assistants deploy a technology called spoken keyword spotting, or simply keyword spotting, which can be understood as a subproblem of ASR [3]. Keyword spotting (KWS) can be defined as the task of identifying keywords in audio streams comprising speech. Apart from activating voice assistants, KWS has plenty of applications such as speech data mining, audio indexing, phone call routing, etc. [4]
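
The wake-up gating described above can be illustrated with a minimal sketch. It assumes a small KWS model already produces one keyword posterior per audio frame, and it stands in for the "posterior handling" stage with a plain moving average and a fixed threshold. The function name `first_trigger`, the window length, the threshold and the sample posteriors are all hypothetical, chosen only for illustration; the paper does not prescribe this particular scheme.

```python
from collections import deque
from typing import Iterable, Optional


def first_trigger(posteriors: Iterable[float], window: int = 3,
                  threshold: float = 0.8) -> Optional[int]:
    """Return the index of the first frame whose smoothed keyword
    posterior reaches the threshold, or None if no frame does."""
    recent = deque(maxlen=window)  # keeps only the last `window` posteriors
    for i, p in enumerate(posteriors):
        recent.append(p)
        # Moving average; the first frames average over fewer values.
        smoothed = sum(recent) / len(recent)
        if smoothed >= threshold:
            return i
    return None


# Hypothetical per-frame keyword posteriors (assumed model output).
frames = [0.1, 0.2, 0.9, 0.95, 0.9, 0.3]
trigger = first_trigger(frames)
if trigger is not None:
    # Only at this point would the far more expensive ASR be started.
    print(f"keyword detected at frame {trigger}")
```

Smoothing before thresholding is one simple way to keep a single noisy frame from waking the device; a real system would tune the window and threshold against its false-alarm and miss rates.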

Journal: IEEE Access: Practical Innovations, Open Solutions
Publication Date: Jan 1, 2022
Citations: 31
License type: CC BY 4.0
