A multimodel keyword spotting system based on lip movement and speech features

Anand Handa, Rashi Agarwal, Narendra Kohli

doi:10.1007/s11042-020-08837-2

Abstract

Recognizing a spoken keyword and localizing it within an utterance is a fundamental task in speech recognition, known as keyword spotting. In automatic keyword spotting systems, lip-reading (LR) methods play a larger role when audio data is absent or corrupted. Existing work has focused on recognizing a limited number of words or phrases and requires a cropped face or lip region. In contrast, the proposed model does not require cropping of the video frames and is recognition-free. It uses Convolutional Neural Networks and Long Short-Term Memory networks to improve overall performance, and it creates a 128-dimensional subspace to represent the feature vectors of speech signals and the corresponding lip movements (focused viseme sequences). The model can thus treat lip reading in video sequences as an unconstrained natural speech signal. It is evaluated on several standard datasets: LRW (Oxford-BBC), MIRACL-VC1, OuluVS, GRID, and CUAVE. The experiments also include a comparative analysis of the proposed model against current state-of-the-art methods for the lip-reading and keyword spotting tasks. The proposed model obtains excellent results on all datasets under consideration.
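The idea of spotting a keyword in a shared 128-dimensional subspace without recognizing the words themselves can be illustrated with a minimal sketch. Everything below except the dimensionality is an assumption made for illustration: the random vectors stand in for the outputs of the CNN and LSTM encoders, and the cosine-similarity scoring, the `spot_keyword` helper, and the threshold value are hypothetical choices, not the authors' published method.

```python
import math
import random

EMBED_DIM = 128  # dimensionality of the shared subspace stated in the abstract


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def spot_keyword(query, windows, threshold=0.8):
    """Return (best_index, best_score, detected) for a query embedding.

    `windows` holds one 128-d embedding per sliding video window.
    The threshold is an illustrative value, not one taken from the paper.
    """
    scores = [cosine(query, w) for w in windows]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best], scores[best] >= threshold


# Stand-in data: random vectors replace the real encoder outputs.
rng = random.Random(0)
windows = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(10)]

# Hypothetical query: a slightly perturbed copy of window 6, mimicking a
# keyword embedding that lands near the matching lip-movement embedding.
query = [x + 0.1 * rng.gauss(0, 1) for x in windows[6]]

index, score, detected = spot_keyword(query, windows)
```

In this toy setup the best-scoring window gives the keyword's location, and the threshold decides whether the keyword is present at all, which is the sense in which such a scheme is recognition-free.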

Journal: Multimedia Tools and Applications
Publication Date: Apr 20, 2020
Citations: 7

Similar Papers

Different confidence measures for word verification in speech recognition
M.C Benítez ... A De La Torre
Speech Communication | VOL. 32
14 Aug 2000

Keyword Spotting using Vowel Onset Point, Vector Quantization and Hidden Markov Modeling Based techniques
B V Sandeep Reddy ... S R Mahadeva Prasanna
01 Nov 2008

An End-to-End Far-Field Keyword Spotting System with Neural Beamforming
Xuan Ji ... Ming Liu
13 Dec 2021

Developing STT and KWS systems using limited language resources
Viet-Bac Le ... Jean-Luc Gauvain
14 Sep 2014
