Abstract

AbstractThis paper presents a study on the use of multi-task neural networks (MTNs) for voice-based soft biometrics recognition, e.g., gender, age, and emotion, in social robots. MTNs enable efficient analysis of audio signals for various tasks on low-power embedded devices, thus eliminating the need for cloud-based solutions that introduce network latency. However, the strict dataset requirements for training limit the potential of MTNs, which are commonly used to optimize a single reference problem. In this paper, we propose three MTN architectures with varying accuracy-complexity trade-offs for voice-based soft biometrics recognition. In addition, we adopt a learnable voice representation, that allows to adapt the specific cognitive robotics application to the environmental conditions. We evaluate the performance of these models on standard large-scale benchmarks, and our results show that the proposed architectures outperform baseline models for most individual tasks. Furthermore, one of our proposed models achieves state-of-the-art performance on three out of four of the considered benchmarks. The experimental results demonstrate that the proposed MTNs have the potential for being part of effective and efficient voice-based soft biometrics recognition in social robots.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call