Abstract

In industrial environments, it is crucial to establish a strong collaboration between humans and robots to enhance productivity. However, the nature of the work demands that workers have the authority to provide specific instructions to the robots. The scientific community has extensively investigated these dual requirements, aiming to develop advanced systems capable of recognizing voice commands and implementing speaker authentication. Nevertheless, in the industrial context, these tasks should be executed simultaneously on low-cost and low-power embedded devices that can be mounted on board the robotic platform. To overcome this challenge, we propose a multi-task network for Speech-Command Recognition and Speaker Identification. Additionally, we employ the GradNorm adaptive algorithm to address the issue of task imbalance. To evaluate the proposed system, we introduce a new dataset, MIVIA-ISC, consisting of 20,857 samples uttered by 562 speakers for 31 distinct commands. Our approach significantly reduces the network size by 47% and its execution time by 48% compared to the commonly used methodology, which employs one network for each task. Furthermore, our approach demonstrates a significant improvement in the accuracy of the Speaker Identification task, achieving an 11% increase compared to the corresponding single-task network. Importantly, this enhancement is achieved without compromising the accuracy of the Speech-Command Recognition task, which experiences only a minimal 3% decrease in performance.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.