Abstract

Voice control is an important function in many mobile devices and smart homes, and it is especially valuable in giving people with disabilities a convenient way to interact with a device. Despite many studies of this problem worldwide, there has been no formal study for the Vietnamese language. In addition, many studies did not offer a solution that can be extended easily in the future. In this study, a dataset of Vietnamese speech commands is labeled and organized to be shared with the language research community in general and with Vietnamese language research in particular. This paper also provides software for speech collection and processing. Finally, the study designs and evaluates Recurrent Neural Networks and applies them to the collected data. The average recognition accuracy on the set of 15 commands for controlling smart home devices is 98.19%.

Highlights

  • Interaction with and control of household devices is a fast-growing trend, evident in the exponentially growing number of smart homes

  • The results show that a throat microphone is robust in noisy environments, achieving a 95.4% hit rate in a speech recognition system with multiple Neural Networks (NNs) using the one-against-all approach, while a simple NN could only reach 91.88%

  • In “End-to-End Speech Command Recognition with Capsule Network” [8], Jaesung Bae and Dae-Shik Kim observe that Convolutional Neural Networks (CNNs) are capable of capturing local features effectively


Summary

INTRODUCTION

Interaction with and control of household devices is a fast-growing trend, evident in the exponentially growing number of smart homes. In “Binary Neural Networks for Classification of Voice Commands from Throat Microphone” [2], the authors use binary classifiers and Neural Networks (NNs), together with a perceptual linear prediction method for feature extraction, to increase the classification rate of voice commands captured with a throat microphone, and they compare this method with a single NN. They create a dataset of 150 people (men and women). In “End-to-End Speech Command Recognition with Capsule Network” [8], Jaesung Bae and Dae-Shik Kim observe that Convolutional Neural Networks (CNNs) are capable of capturing local features effectively. CNNs can be used for tasks that have relatively short-term dependencies, such as keyword spotting or phoneme-level sequence recognition.
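The binary-classifier scheme summarized for [2] can be illustrated with a generic one-against-all decision rule: one binary scorer per command, with the most confident scorer deciding the label. The sketch below is only an illustration under stated assumptions and is not the method of [2] or of this paper. The function names, the toy linear scorers standing in for trained networks, the two-dimensional feature vectors, and the labels `open` and `close` are all hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical type: a binary scorer maps a feature vector to a confidence in (0, 1).
BinaryScorer = Callable[[Sequence[float]], float]


def one_against_all_predict(
    scorers: Dict[str, BinaryScorer], features: Sequence[float]
) -> str:
    """Return the label whose binary scorer is most confident."""
    if not scorers:
        raise ValueError("at least one binary scorer is required")
    scores = {label: scorer(features) for label, scorer in scorers.items()}
    return max(scores, key=scores.get)


def make_linear_scorer(weights: Sequence[float], bias: float = 0.0) -> BinaryScorer:
    """Toy stand-in for a trained binary network: a logistic unit."""
    def scorer(features: Sequence[float]) -> float:
        z = sum(w * x for w, x in zip(weights, features)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return scorer


# Assumed toy setup: two commands, two-dimensional features.
toy_scorers = {
    "open": make_linear_scorer([1.0, -1.0]),
    "close": make_linear_scorer([-1.0, 1.0]),
}
prediction = one_against_all_predict(toy_scorers, [2.0, 0.5])
```

In a real system each scorer would be a network trained to separate one command from all the others, and the feature vector would come from a front end such as perceptual linear prediction.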

14 Đóng cổng (Close the gate)
Data Collection Software
Data Organization
Data Processing
Proposed Architecture
Implementation of the Neural Network
Experiments and Model Analyzing
APPLY TO RECOGNITION OF VIETNAMESE SPEECH COMMANDS
CONCLUSION AND PERSPECTIVES