Analyzing Artificial Neural Networks and Dynamic Time Warping for spoken keyword recognition under transient noise conditions

Paulo Lopez-Meyer, Omesh Tickoo, Hector Cordourier-Maruri, Arturo Quinto-Martinez

doi:10.1109/icsenst.2015.7438406

Abstract

Spoken keyword recognition has been studied for several decades, and it has gained significant attention in recent years due to the rapid increase in front-end technology applications for mobile and wearable computing. This work presents the trade-off in performance between Artificial Neural Network (ANN) and Dynamic Time Warping (DTW) methodologies implemented for this task under three different transient noise conditions (inside a car, in a pub, and outdoors), with no external noise-reduction pre-processing. For this purpose, two types of recognition models were implemented: speaker dependent (SD) and speaker independent (SI). Experimental results show comparably high keyword recognition precision in the SD models for both ANN and DTW using baseline data, i.e., without transient noise; for the SI models, however, a significant drop in precision was observed with DTW. Additional precision analyses show how the different types of transient noise affect each recognition methodology. In terms of storage, both methodologies require comparable memory for the SD models, but the SI model increases the memory needed by the DTW methodology. Lastly, time performance analysis showed faster recognition with the ANN methodology.
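
For readers unfamiliar with DTW-based keyword matching, the following is a minimal sketch of the general technique, not the implementation evaluated in this work. It assumes each utterance is already represented as a sequence of fixed-length feature vectors; the feature type, the Euclidean local distance, the unconstrained warping path, and the names `dtw_distance`, `recognize`, and `templates` are all illustrative assumptions. In this kind of template matcher, every stored reference utterance is compared against the query, so storage and recognition time grow with the number of templates kept.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: Euclidean local distance and the classic three-way
    step pattern (insertion, deletion, match), with no path constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(query, templates):
    """Return the keyword whose closest stored template has the lowest DTW cost.

    `templates` is a hypothetical dict mapping a keyword label to a list
    of reference feature sequences for that keyword.
    """
    return min(
        templates,
        key=lambda label: min(dtw_distance(query, t) for t in templates[label]),
    )


# Toy one-dimensional "features": the query is a time-stretched copy of "yes".
templates = {
    "yes": [[(0.0,), (1.0,), (2.0,)]],
    "no": [[(5.0,), (5.0,), (5.0,)]],
}
query = [(0.0,), (0.0,), (1.0,), (2.0,)]
print(recognize(query, templates))
```

The toy query repeats its first frame, which the warping absorbs at no cost, so it aligns exactly with the "yes" template despite the length difference.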
