Abstract
Driven by the vision of the Internet of Things, some research efforts have already focused on designing efficient speech recognition networks for edge computing. Other approaches (such as tpool2) do not make full use of the spatial and temporal information in the acoustic features of speech. In this paper, we propose a compact speech recognition network with spatio-temporal features for edge computing, named EdgeRNN. EdgeRNN uses a 1-Dimensional Convolutional Neural Network (1-D CNN) to process the overall spatial information of each frequency domain of the acoustic features, and a Recurrent Neural Network (RNN) to process the temporal information of each frequency domain. In addition, we propose a simplified attention mechanism to enhance the portions of the network's input that contribute most to the final recognition. The overall performance of EdgeRNN has been verified on speech emotion and keyword recognition. Speech emotion recognition uses the IEMOCAP dataset, reaching an unweighted average recall (UAR) of 63.98%. Speech keyword recognition uses Google's Speech Commands Dataset V1, reaching a weighted average recall (WAR) of 96.82%. Compared with the experimental results of related efficient networks on a Raspberry Pi 3B+, EdgeRNN improves accuracy on both speech emotion and keyword recognition.
Highlights
According to the IHS Markit perspective [1], the number of Internet of Things (IoT) devices is expected to reach 125 billion by 2030. These IoT devices have attracted much attention in industry and academia because they can be widely used in many applications [2]. Because of their constrained resources [3], such micro-instruments are commonly called edge computing devices.
To solve the performance and accuracy problems of speech recognition on edge computing devices, we propose a compact Recurrent Neural Network (RNN) named EdgeRNN.
EdgeRNN consists of a 1-Dimensional Convolutional Neural Network (1-D CNN), an RNN and an attention mechanism, which is a very common network structure for speech recognition.
Summary
According to the IHS Markit perspective [1], the number of Internet of Things (IoT) devices is expected to reach 125 billion by 2030. A combination of 1-D CNN and RNN is required to design a speech recognition network model for edge computing devices. To solve the performance and accuracy problems of speech recognition on edge computing devices, we propose a compact RNN named EdgeRNN. EdgeRNN consists of a 1-D CNN, an RNN and an attention mechanism, which is a very common network structure for speech recognition. 1) It is the first such network applied to speech recognition tasks on edge computing devices, mainly because of the low computation and parameter counts of the 1-D CNN, RNN and attention mechanism. 2) The EdgeRNN model, running on the Raspberry Pi 3B+, can recognize and process speech roughly twice as fast as the time taken to collect it. This performance meets the practical requirements of speech recognition for edge computing.
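To make the 1-D CNN + RNN + attention pipeline concrete, the following is a minimal NumPy sketch of that structure. All dimensions, weight shapes, and function names here are illustrative assumptions, not the paper's actual EdgeRNN configuration: a 1-D convolution over the frame sequence captures per-frame frequency (spatial) patterns, a simple RNN models the temporal dynamics, and a simplified attention step pools the hidden states into one context vector for classification.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Valid 1-D convolution along time. x: (T, F_in), w: (K, F_in, F_out)."""
    T = x.shape[0]
    K, _, F_out = w.shape
    out = np.stack([np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
                    for t in range(T - K + 1)])
    return np.maximum(out, 0.0)  # ReLU

def simple_rnn(x, wx, wh, b):
    """Elman-style RNN. x: (T, C) -> hidden states (T, H)."""
    h = np.zeros(wh.shape[0])
    states = []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ wx + h @ wh + b)
        states.append(h)
    return np.stack(states)

def attention_pool(h, v):
    """Simplified attention: one scalar score per frame, softmax over time."""
    scores = h @ v
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ h  # weighted sum of hidden states -> (H,)

# Toy dimensions (hypothetical): 40 frames of 26 filterbank features,
# 32 conv channels, 64 hidden units, 4 output classes.
T, F, C, H, classes = 40, 26, 32, 64, 4
x = rng.standard_normal((T, F))
cw, cb = rng.standard_normal((5, F, C)) * 0.1, np.zeros(C)
wx, wh, rb = rng.standard_normal((C, H)) * 0.1, rng.standard_normal((H, H)) * 0.1, np.zeros(H)
v = rng.standard_normal(H)
wo = rng.standard_normal((H, classes)) * 0.1

feat = conv1d(x, cw, cb)               # spatial (frequency) patterns per frame
states = simple_rnn(feat, wx, wh, rb)  # temporal information across frames
context = attention_pool(states, v)    # emphasise the most informative frames
logits = context @ wo                  # class scores, shape (classes,)
```

In a deployed model the weights would of course be learned rather than random; the sketch only shows how the three stages compose, which is why such a stack stays cheap enough in parameters and computation for devices like the Raspberry Pi 3B+.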