A multi-task network for speaker and command recognition in industrial environments

Stefano Bini, Gennaro Percannella, Alessia Saggese, Mario Vento

doi:10.1016/j.patrec.2023.10.022

Abstract

In industrial environments, it is crucial to establish a strong collaboration between humans and robots to enhance productivity. However, the nature of the work demands that workers have the authority to provide specific instructions to the robots. The scientific community has extensively investigated these dual requirements, aiming to develop advanced systems capable of recognizing voice commands and implementing speaker authentication. Nevertheless, in the industrial context, these tasks should be executed simultaneously on low-cost and low-power embedded devices that can be mounted on board the robotic platform. To overcome this challenge, we propose a multi-task network for Speech-Command Recognition and Speaker Identification. Additionally, we employ the GradNorm adaptive algorithm to address the issue of task imbalance. To evaluate the proposed system, we introduce a new dataset, MIVIA-ISC, consisting of 20,857 samples uttered by 562 speakers for 31 distinct commands. Our approach significantly reduces the network size by 47% and its execution time by 48% compared to the commonly used methodology, which employs one network for each task. Furthermore, our approach demonstrates a significant improvement in the accuracy of the Speaker Identification task, achieving an 11% increase compared to the corresponding single-task network. Importantly, this enhancement is achieved without compromising the accuracy of the Speech-Command Recognition task, which experiences only a minimal 3% decrease in performance.
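The abstract names GradNorm as the mechanism used to balance the two tasks but does not describe how it is applied. As a rough illustration only, the sketch below shows one GradNorm-style update of the per-task loss weights in plain Python. It assumes the caller already has, for each task, the norm of the unweighted loss gradient with respect to the last shared layer. The function name `gradnorm_step`, its parameters, the default values of `alpha` and `lr`, and the example numbers are all hypothetical and are not taken from the paper. The target norms are treated as constants when differentiating, which is the usual simplification.

```python
def gradnorm_step(weights, grad_norms, losses, initial_losses, alpha=1.5, lr=0.025):
    """One simplified GradNorm-style update of the task loss weights.

    weights         current loss weight w_i for each task
    grad_norms      norm of the gradient of the unweighted loss L_i w.r.t.
                    the last shared layer (assumed to be computed elsewhere)
    losses          current loss value L_i(t) for each task
    initial_losses  loss value L_i(0) recorded at the start of training
    alpha, lr       hypothetical hyperparameters, not values from the paper
    """
    n = len(weights)

    # Weighted gradient norm of each task: G_i = w_i * ||grad L_i||.
    weighted = [w * g for w, g in zip(weights, grad_norms)]
    mean_norm = sum(weighted) / n

    # Relative inverse training rate: tasks whose loss has dropped less
    # than average get a ratio above 1.
    ratios = [cur / init for cur, init in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / n
    relative = [r / mean_ratio for r in ratios]

    # Target gradient norm per task, treated as a constant.
    targets = [mean_norm * (r ** alpha) for r in relative]

    # Gradient of sum_i |G_i - target_i| w.r.t. w_i is sign(G_i - target_i) * ||grad L_i||.
    updated = []
    for w, g, big_g, target in zip(weights, grad_norms, weighted, targets):
        sign = (big_g > target) - (big_g < target)
        updated.append(max(w - lr * sign * g, 1e-6))  # keep weights positive

    # Renormalize so the weights sum to the number of tasks.
    total = sum(updated)
    return [n * w / total for w in updated]


if __name__ == "__main__":
    # Two tasks, e.g. command recognition and speaker identification
    # (all numbers are made up for illustration).
    print(gradnorm_step([1.0, 1.0], [2.0, 0.5], [1.0, 1.0], [1.0, 1.0]))
```

In this sketch a task whose weighted gradient norm exceeds its target has its weight reduced and a task below its target has its weight increased, so neither task dominates the shared layers.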

Journal: Pattern Recognition Letters
Publication Date: Oct 28, 2023
Citations: 3

Similar Papers

Preparation of templates in speech command recognition by single- and double-channel scheme in background noise
V R Krasheninnikov ... A V Khvostov
Pattern Recognition and Image Analysis | VOL. 18
01 Dec 2008

Phoneme-by-Phoneme Speech Recognition as a Classification of Series on a Set of Sequences of Elements of Complex Objects Using an Improved Trie-Tree
Galina Dorokhina
Информатика и автоматизация | VOL. 23
07 Nov 2024

Robust speech command recognition in challenging industrial environments
Stefano Bini ... Mario Vento
Computer Communications | VOL. 228
02 Sep 2024

Recognition of speech commands using a modified neural fuzzy network and an improved GA
K.F Leung ... H.K Lam
The 12th IEEE International Conference on Fuzzy Systems, 2003. FUZZ '03. | VOL. 3
25 May 2003
