Abstract

This paper proposes a speech command recognition model for novel engineering applications with limited resources. The model is built on a Convolutional Recurrent Neural Network (CRNN). Using a CRNN instead of a Convolutional Neural Network (CNN) reduces the number of model parameters and the memory requirement, in line with the resource constraints. Furthermore, we insert transmute and curtailment layers between the layers of the CRNN, which further reduces the model parameters and the number of floating-point operations to half of the CRNN requirement. The proposed model is tested on Google’s speech command dataset. The results show that the proposed CRNN model requires one third of the parameters of the CNN model. The number of parameters of the CRNN model is further reduced by 45%, and the number of floating-point operations by 2% to 12%, across different recognition tasks. The recognition accuracy of the proposed model is 96% on Google’s speech command dataset and 89% on laboratory recordings.

Highlights

  • A natural language speech interaction system benefits the user because operating it involves no learning curve

  • The Convolutional Recurrent Neural Network (CRNN) is the combination of Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) [29]

  • Training is run until convergence, although different CRNN configurations require different numbers of epochs
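To make the parameter comparison concrete, the following is a minimal sketch of how trainable parameters can be counted for a small convolution, recurrent layer, and classifier stack. The layer sizes, the choice of a GRU as the recurrent unit, the 12-class output, and the helper names (`conv2d_params`, `gru_params`, `dense_params`) are all assumptions made for illustration; they are not taken from the paper and do not describe its architecture.

```python
def conv2d_params(in_channels, out_channels, kernel_h, kernel_w, bias=True):
    """Trainable parameters of a standard 2-D convolution layer."""
    weights = in_channels * out_channels * kernel_h * kernel_w
    return weights + (out_channels if bias else 0)


def gru_params(input_size, hidden_size):
    """Trainable parameters of a single-direction GRU layer.

    Assumption: three gates, each with an input matrix, a recurrent
    matrix, and one bias vector. Some frameworks keep two bias vectors
    per gate, which adds 3 * hidden_size parameters to this count.
    """
    per_gate = input_size * hidden_size + hidden_size * hidden_size + hidden_size
    return 3 * per_gate


def dense_params(in_features, out_features):
    """Trainable parameters of a fully connected layer with bias."""
    return in_features * out_features + out_features


# Hypothetical layer sizes, chosen only to illustrate the arithmetic.
conv_part = conv2d_params(in_channels=1, out_channels=32, kernel_h=3, kernel_w=3)
rnn_part = gru_params(input_size=32, hidden_size=64)
head_part = dense_params(in_features=64, out_features=12)

crnn_total = conv_part + rnn_part + head_part
print(conv_part, rnn_part, head_part, crnn_total)
```

With these hypothetical sizes the recurrent layer dominates the total, so any layer that shrinks the feature dimension fed into the recurrent part would reduce the count noticeably. This is offered only as intuition for why intermediate layers can cut parameters, not as a description of the paper's transmute and curtailment layers.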


Summary

Introduction

A natural language speech interaction system benefits the user because operating it involves no learning curve. User-friendly natural language speech interaction systems are available on the market [26, 36]. For novel engineering applications where memory and computational resources are limited, however, a broadband-based speech interaction system is costly. It compromises privacy and battery life [26], and it depends heavily on external factors such as network quality [16], network speed [1], latency [27], and network traffic [36].

