Keyword Spotting using Time-Domain Features in a Temporal Convolutional Network

Emad A Ibrahim, Hamed Fatemi, Jose Pineda De Gyvez, Jos Huisken

doi:10.1109/dsd.2019.00053

Abstract

With the increasing demand for voice recognition services, more attention is being paid to simpler algorithms that are capable of running locally on a hardware device. This paper demonstrates simpler speech features derived in the time domain for Keyword Spotting (KWS). The features are constrained-lag autocorrelations computed on overlapping speech frames to form a 2D map. We refer to this as Multi-Frame Shifted Time Similarity (MFSTS). MFSTS performance is compared against the widely known Mel-Frequency Cepstral Coefficients (MFCC), which are computed in the frequency domain. A Temporal Convolutional Network (TCN) is designed to classify keywords using both MFCC and MFSTS. This is done by employing an open-source dataset from Google Brain containing ~106,000 files of one-second recorded words such as 'Backward', 'Forward', and 'Stop'. Initial findings show that MFSTS can be used for KWS tasks without visiting the frequency domain. Our experimental results show that classifications of the whole dataset (25 classes) based on MFCC and on MFSTS are in very good agreement. We compare the performance of the TCN-based classifier with related work in the literature. The classification is performed using a small memory footprint (~90 KB) and low compute power (~5 MOPs) per inference. The achieved classification accuracies are 93.4% using MFCC and 91.2% using MFSTS. Furthermore, a case study is provided for a single-keyword spotting task. The case study demonstrates how MFSTS can be used as a simple preprocessing scheme with small classifiers while achieving accuracy as high as 98%. The compute simplicity of MFSTS makes it attractive for low-power KWS applications, paving the way for resource-aware solutions.
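The abstract describes the features only as constrained-lag autocorrelations computed on overlapping frames and stacked into a 2D map. The sketch below illustrates that general idea in plain Python; it is not the authors' MFSTS definition. The function names, the energy normalization, the 400-sample frame length, the 200-sample hop, and the 13-lag limit are all assumptions chosen for illustration.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames (hop < frame_len gives overlap)."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def lag_similarity(frame, max_lag):
    """Energy-normalized autocorrelation of one frame for lags 1..max_lag.

    The normalization is an assumption; the abstract does not specify one.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        # A silent frame has no defined similarity; return zeros.
        return [0.0] * max_lag
    return [
        sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / energy
        for lag in range(1, max_lag + 1)
    ]


def time_similarity_map(samples, frame_len=400, hop=200, max_lag=13):
    """Stack per-frame lag similarities into a 2D map (frames x lags).

    Default sizes are hypothetical: 25 ms frames with 50% overlap at 16 kHz.
    """
    return [lag_similarity(frame, max_lag)
            for frame in frame_signal(samples, frame_len, hop)]


# Usage: one second of a 440 Hz tone sampled at 16 kHz (synthetic input).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
feature_map = time_similarity_map(tone)
print(len(feature_map), len(feature_map[0]))
```

Each row needs only multiply-accumulate operations on raw samples, with no FFT or filter bank. That is the kind of compute simplicity the abstract attributes to time-domain features, although the actual operation count depends on the frame and lag settings the paper uses.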
