Abstract

Containerization has mainly been used in pure software solutions, but it is gradually finding its way into industrial systems. This paper introduces an edge container with artificial intelligence for speech recognition, which performs voice control of an actuator as part of the Human Machine Interface (HMI). The work proposes a procedure for creating voice-controlled applications with modern hardware and software resources. The resulting architecture integrates well-known digital technologies such as containerization, cloud and edge computing, and a commercial voice-processing tool. This methodology and architecture enable speech recognition and voice control to run on an edge device in the local network rather than in the cloud, as in the majority of recent solutions. The Linux containers are designed to run without any additional configuration or setup by the end user. Simple adaptation of voice commands via a configuration file may be considered an additional contribution of the work. The architecture was verified by experiments with running containers on different devices: a PC, a Tinker Board 2, and a Raspberry Pi 3 and 4. The proposed solution and the practical experiment show how a voice-controlled system can be created, easily managed, and distributed to many devices around the world within seconds, simply by downloading and running two types of ready-made containers without any complex installation. The result of this work is a proven, stable (network-independent) solution with data protection and low latency.
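The voice-command adaptation via a configuration file mentioned above could take a form like the following sketch. The file name, keys, phrases, and actuator parameters here are assumptions for illustration, not the paper's actual format:

```yaml
# commands.yaml -- hypothetical voice-command configuration (structure assumed,
# not taken from the paper). Each recognized phrase maps to one actuator action.
commands:
  "turn left":
    actuator: motor
    action: rotate
    value: -90        # degrees; sign convention is an assumption
  "turn right":
    actuator: motor
    action: rotate
    value: 90
  "stop":
    actuator: motor
    action: halt
```

Under such a scheme, an end user could change the accepted phrases or the actuator behavior by editing this one file, without rebuilding or reconfiguring the containers.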

Highlights

  • Alexa [1], Siri [2], Cortana [3] and Google Assistant [4] have shown the human–machine interface of the future

  • Many recent implementations of voice control focus on the functionality that their proposed architectures can offer, but not on specific topics such as processing data near its source, data privacy, and easy deployment and management of the solution

  • The benefits and main goals of the proposed architecture in comparison to other implementations can be formulated as follows: 1. Stability and low latency: as described in Section 4, the response time is shorter when the processing is performed by the Internet of Things (IoT) edge device rather than in the Azure cloud

Introduction

Alexa [1], Siri [2], Cortana [3] and Google Assistant [4] have shown the human–machine interface of the future. Cars, factories, smart devices and cities generate large volumes of data that consume network capacity, and in the near future this load on the network could cause problems. Many recent implementations of voice control focus on the functionality that their proposed architectures can offer, but not on specific topics such as processing data near its source, data privacy, and easy deployment and management of the solution. In [11], the authors use a conversational agent in the cloud, so every voice recording must be sent to a remote third-party provider; the deployment question is not addressed there. The present study deals with topics such as speech-to-text libraries, edge computing, containerization and Docker.
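The combination described above, speech-to-text on the edge plus user-editable command mapping, can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`CONFIG_JSON`, `load_commands`, `dispatch`), not the paper's implementation; the speech-to-text step itself is omitted and only the transcript-to-actuator dispatch is shown:

```python
import json

# Hypothetical voice-command configuration, analogous in spirit to the paper's
# user-editable config file (phrases, keys, and values are assumptions).
CONFIG_JSON = """
{
  "commands": {
    "turn left":  {"actuator": "motor", "action": "rotate", "value": -90},
    "turn right": {"actuator": "motor", "action": "rotate", "value": 90},
    "stop":       {"actuator": "motor", "action": "halt",   "value": 0}
  }
}
"""

def load_commands(config_text: str) -> dict:
    """Parse the configuration into a phrase -> actuator-command lookup table."""
    return json.loads(config_text)["commands"]

def dispatch(transcript: str, commands: dict):
    """Match a recognized transcript against the configured phrases.

    Returns the actuator command for the first configured phrase found inside
    the transcript, or None when nothing matches. All matching runs locally,
    so no voice data has to leave the edge device.
    """
    text = transcript.lower().strip()
    for phrase, command in commands.items():
        if phrase in text:
            return command
    return None

if __name__ == "__main__":
    commands = load_commands(CONFIG_JSON)
    print(dispatch("please turn left now", commands))
```

Because the mapping lives in plain data rather than in code, changing or translating the accepted phrases only requires editing the configuration, which fits the paper's goal of ready-made containers that need no end-user setup.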
