Keyword Spotting System Research Articles

We consider feature learning for a computationally efficient method of keyword spotting that can be applied in severely under-resourced settings. The objective is to support humanitarian relief programmes by the United Nations (UN) in parts of Africa in which almost no language resources are available. To allow a keyword spotting system to be rapidly developed in such a language, we rely on a small and easily compiled set of isolated keywords. Using the isolated keywords as templates, we apply dynamic time warping (DTW) to a much larger corpus of in-domain but untranscribed speech. The resulting DTW alignment scores are used to train a convolutional neural network (CNN) which is orders of magnitude more computationally efficient than DTW and therefore suitable for real-time application. We optimise this ASR-free neural network keyword spotting procedure by identifying acoustic features that provide robust performance in this almost zero-resource setting. First, we consider the benefits of incorporating information from well-resourced but unrelated languages by using a multilingual bottleneck feature (BNF) extractor. Next, we consider using features extracted from an autoencoder (AE) trained on in-domain but untranscribed data. Finally, we consider features obtained from a correspondence autoencoder (CAE) which is initialised with the AE and subsequently fine-tuned on the small set of in-domain labelled data. Experiments in South African English and Luganda, a low-resource language, demonstrate that, on their own, both the BNF and CAE features can achieve a 5% relative performance improvement over baseline MFCCs. However, by using BNFs as input to the CAE, even better performance is achieved, resulting in a relative improvement of more than 27% over MFCCs in ROC area under the curve (AUC) and more than twice as many top-10 retrievals.
We also show that, using these features, the CNN-DTW keyword spotter performs almost as well as the DTW keyword spotter while comfortably outperforming a baseline CNN trained only on the keyword templates. We conclude that a CNN-DTW keyword spotter using BNF-derived CAE features represents a computationally efficient approach with very competitive performance that is suited to rapid deployment in a severely under-resourced scenario.
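
The template-matching step described above can be sketched as subsequence DTW: the whole isolated-keyword template must be aligned, but the match may start and end anywhere in the longer untranscribed utterance. This is a minimal sketch under stated assumptions, not the authors' implementation: frames are plain lists of feature values (MFCC, BNF or CAE vectors would all fit), the frame distance is assumed to be cosine distance, and the final cost is simply normalised by template length. The names `subsequence_dtw_cost` and `keyword_cost` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector (e.g. MFCC, BNF or CAE frame)


def cosine_distance(a: Frame, b: Frame) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # assumption: treat all-zero frames as uninformative
    return 1.0 - dot / (na * nb)


def subsequence_dtw_cost(template: Sequence[Frame], utterance: Sequence[Frame]) -> float:
    """Best cost of aligning the whole template to any span of the utterance.

    Lower is a better match. Free start and free end apply only along the
    utterance axis; every template frame must be used.
    """
    m, n = len(template), len(utterance)
    if m == 0 or n == 0:
        raise ValueError("template and utterance must both be non-empty")
    inf = float("inf")
    # D[i][j]: accumulated cost with template frame i-1 aligned to utterance frame j-1
    D: List[List[float]] = [[inf] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        D[0][j] = 0.0  # free start: the keyword may begin at any utterance frame
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = cosine_distance(template[i - 1], utterance[j - 1])
            # standard step pattern: vertical, horizontal, diagonal
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # free end: best final column, normalised by template length (a simple choice)
    return min(D[m][1:]) / m


def keyword_cost(templates: Sequence[Sequence[Frame]], utterance: Sequence[Frame]) -> float:
    """Cost of the best-matching template when several recordings of a keyword exist."""
    return min(subsequence_dtw_cost(t, utterance) for t in templates)
```

For example, with `template = [[1.0, 0.0], [0.0, 1.0]]`, an utterance that contains those two frames in order somewhere inside it scores a lower cost than one containing them in reverse order. Each template-utterance comparison needs O(mn) frame distances for m template frames and n utterance frames, which is the run-time expense the CNN is trained to avoid; how the CNN targets are derived from such costs is not specified in this abstract.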

The keyword spotting (KWS) system is one of the most important interfaces between humans and machines, since it is usually the entry point to automatic speech recognition and natural language processing. For KWS hardware, however, it remains difficult to make a single chip both low-power and high-performing across multiple scenarios, such as meeting rooms, traffic or parks, because these scenarios span a wide range of signal-to-noise ratios (SNRs). This calls for a balanced design between KWS system accuracy and hardware cost under various noise types and levels. To address this trade-off, this paper proposes a complete KWS processor, comprising a Mel-frequency cepstral coefficient (MFCC) feature extractor and a quantized convolutional neural network (QCNN) accelerator, for low-power KWS over a wide SNR range. First, an approach to quantizing CNNs into QCNNs with high accuracy is proposed, taking the hardware-software trade-off into account. Balancing KWS system accuracy against hardware cost, a 4-bit/8-bit dual-working-mode strategy is proposed to keep hardware cost low and accuracy high across different scenarios. Specifically, the CNNs and QCNNs are trained, tuned and validated on a dataset of 10 keywords chosen from the Google Command Speech Dataset (GCSD). Second, a serial-FFT-based MFCC extractor is implemented with low power and a small footprint. Finally, using a novel hybrid reuse strategy for input data and network weights, a reconfigurable QCNN accelerator based on approximate computing is designed.
Implemented and verified in TSMC 22 nm ULL technology with an area of 1.42 mm², the QCNN accelerator achieves 5.26 μW/9.08 μW power consumption in 4-bit/8-bit working mode, with accuracies of 88% and 93% respectively, which the authors report as superior to state-of-the-art processors.
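
The abstract does not detail its CNN-to-QCNN quantization scheme. As a generic illustration only (not the authors' method), symmetric per-tensor uniform quantization maps each weight to a signed integer using a single scale factor; running the same weights at 4 and 8 bits shows the accuracy/cost trade-off that a 4-bit/8-bit dual-mode design exploits. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(weights: Sequence[float], bits: int) -> Tuple[List[int], float]:
    """Map real weights to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1].

    One scale per tensor; Python's round() is used (round half to even).
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric code")
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # largest |w| maps to qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate real weights from integer codes."""
    return [v * scale for v in q]


def max_abs_error(weights: Sequence[float], bits: int) -> float:
    """Worst-case reconstruction error after a quantize/dequantize round trip."""
    q, scale = quantize_symmetric(weights, bits)
    return max((abs(w - r) for w, r in zip(weights, dequantize(q, scale))), default=0.0)
```

Because the largest weight maps exactly onto the top code, no value is clipped and the per-weight rounding error stays within half the scale. The 4-bit scale is 127/7 (about 18) times the 8-bit one, so the 4-bit mode trades a much coarser grid for narrower storage and arithmetic.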

Articles published on Keyword Spotting System

Deep Spoken Keyword Spotting: An Overview

Generalisation Gap of Keyword Spotters in a Cross-Speaker Low-Resource Scenario

Neural keyword confidence estimation for open‐vocabulary keyword spotting

Hardware Acceleration for Embedded Keyword Spotting: Tutorial and Survey

Fast speech adversarial example generation for keyword spotting system with conditional GAN

Feature learning for efficient ASR-free keyword spotting in low-resource languages

Efficient Keyword Spotting System Using Deformable Convolutional Network

NS-FDN: Near-Sensor Processing Architecture of Feature-Configurable Distributed Network for Beyond-Real-Time Always-on Keyword Spotting

A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting

Wav2KWS: Transfer Learning From Speech Representations for Keyword Spotting

A Model for Evaluating the Performance of a Multiple Keywords Spotting System for the Transcription of Historical Handwritten Documents

A Pitch and Noise Robust Keyword Spotting System Using SMAC Features with Prosody Modification

One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document

An Evaluation of Some Factors Affecting Accuracy of the Vietnamese Keyword Spotting System

Isolated Keyword Spotting in Multilingual Environment using ANN and MFCC

A multimodel keyword spotting system based on lip movement and speech features

Using keyword spotting systems as tools for the transcription of historical handwritten documents: Models and procedures for performance evaluation

QCNN Inspired Reconfigurable Keyword Spotting Processor With Hybrid Data-Weight Reuse Methods

Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices
