Keyword Spotting Research Articles

This paper proposes a new approach to keyword spotting that is based on large margin and kernel methods rather than on HMMs. Unlike previous approaches, the proposed method employs a discriminative learning procedure in which the learning phase aims at achieving a high area under the ROC curve, as this quantity is the most common measure for evaluating keyword spotters. The keyword spotter we devise is based on mapping the input acoustic representation of the speech utterance, along with the target keyword, into a vector space. Building on techniques used for large margin and kernel methods for predicting whole sequences, our keyword spotter distills to a classifier in this vector space, which separates speech utterances in which the keyword is uttered from speech utterances in which it is not. We describe a simple iterative algorithm for training the keyword spotter and discuss its formal properties, showing theoretically that it attains a high area under the ROC curve. Experiments on read speech with the TIMIT corpus show that the resulting discriminative system outperforms the conventional context-independent HMM-based system. Further experiments using the TIMIT-trained model, tested on both read speech (HTIMIT, WSJ) and spontaneous speech (OGI Stories), show that, without further training or adaptation to the new corpus, our discriminative system still outperforms the conventional context-independent HMM-based system.
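The area under the ROC curve mentioned in this abstract can be read as a ranking quantity: it equals the fraction of pairs, one utterance containing the keyword and one not, in which the utterance containing the keyword receives the higher score. The sketch below illustrates only that evaluation measure; it is not the training algorithm of the paper. The function name pairwise_auc and the score values are assumptions made for illustration.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) score pairs ranked correctly.

    Ties count as half a correct pair. This is the standard pairwise
    (Wilcoxon-Mann-Whitney) form of the area under the ROC curve.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each group")
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical keyword-spotter confidence scores, not taken from the paper.
with_keyword = [0.9, 0.7, 0.4]     # utterances in which the keyword is uttered
without_keyword = [0.5, 0.2]       # utterances in which it is not
print(pairwise_auc(with_keyword, without_keyword))
```

Under this reading, a spotter that scores every keyword utterance above every non-keyword utterance reaches the maximum value of 1, which is why a learning objective defined over such pairs targets the measure directly.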


The ever-increasing volume of audio data available online through the World Wide Web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search techniques are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term prior to accessing the speech and used to find matches. Lattice search (referred to as spoken term detection) uses a pre-indexing of speech data in terms of word or sub-word units, which can then quickly be searched for arbitrary terms without referring to the original audio. In both cases, the search term can be modelled in terms of sub-word units, typically phonemes. For in-vocabulary words (i.e. words that appear in the pronunciation dictionary), the letter-to-sound conversion systems are accepted to work well. However, for out-of-vocabulary (OOV) search terms, letter-to-sound conversion must be used to generate a pronunciation for the search term. This is usually a hard decision (i.e. not probabilistic and with no possibility of backtracking), and errors introduced at this step are difficult to recover from. We therefore propose the direct use of graphemes (i.e. letter-based sub-word units) for acoustic modelling. This is expected to work particularly well in languages such as Spanish, where despite the letter-to-sound mapping being very regular, the correspondence is not one-to-one, and there will be benefits from avoiding hard decisions at early stages of processing. In this article, we compare three approaches for Spanish keyword spotting or spoken term detection, and within each of these we compare acoustic modelling based on phone and grapheme units. Experiments were performed using the Spanish geographical-domain Albayzin corpus. Results for the two approaches proposed for spoken term detection show that trigrapheme units for acoustic modelling match or exceed the performance of phone-based acoustic models. In the method proposed for keyword spotting, the results achieved with each acoustic model are very similar.


Articles published on Keyword Spotting

Text-independent pronunciation quality automatic assessment system for English retelling test

Beam Pruning Based on Quantile for Keyword Spotting

A Hybrid Model for Automatic Emotion Recognition in Suicide Notes

Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting

Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario

A Novel Word Spotting Method Based on Recurrent Neural Networks

Performance evaluation for an HMM-based keyword spotter and a large-margin based one in noisy environments

A survey of keyword spotting techniques for printed document images

SACA: Software Assisted Call Analysis – An interactive tool supporting content exploration, online guidance and quality improvement of counseling dialogues

Bidirectional LSTM Networks for Context-Sensitive Keyword Detection in a Cognitive Virtual Agent Framework

Phoneme lattice construction and its application to speech recognition and keyword spotting

Keyword Spotting in Multilingual Environments

Estimation of Phone Mismatch Penalty Matrices for Two-Stage Keyword Spotting

On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues

Point Process Models for Spotting Keywords in Continuous Speech

Discriminative keyword spotting

Point process models of distinctive feature landmarks for speech recognition

Modelling of the cochlea response as a versatile tool for acoustic signal processing

A comparison of grapheme and phoneme-based units for Spanish spoken term detection

Telephone number retrieval system and method
