Abstract
This paper enhances human-robot interaction (HRI) by using a speech emotion recognition algorithm, based on an improved ShuffleNet V2 network, as the means of communicating with humans. Previous studies of speech emotion recognition typically analyze the speech signal in either the time domain or the frequency domain alone, which loses detailed information. To address this limitation, our algorithm first converts the speech data into an acoustic spectrogram, which preserves rich original information across both the time and frequency domains. This representation also improves feature extraction, because deep convolutional networks are well suited to capturing local changes in images. Furthermore, we integrate the attention mechanism from ECA-Net into the ShuffleNet V2 model. This strengthens the extraction of significant features, amplifying those strongly correlated with the emotional state conveyed in speech while suppressing irrelevant ones. We also replace the ReLU activation function with Hardswish so that the network can be deepened, enriching the feature information without compromising model accuracy. Finally, the training stage uses a cosine annealing learning rate with warm restarts to optimize the network further. Simulation experiments are conducted on the eNTERFACE'05 dataset to evaluate the accuracy of the proposed algorithm. The experimental results show that the proposed method achieves a higher recognition rate than existing algorithms, which improves the effectiveness of HRI.
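Three of the components named above have compact standard definitions, and a small sketch may make them concrete. The code below is a minimal pure-Python illustration, not the paper's implementation: the function names, the kernel values passed to `eca_channel_weights`, and the schedule hyperparameters (`lr_max`, `lr_min`, `t_0`, `t_mult`) are all assumptions chosen for illustration, since the abstract does not state them.

```python
import math


def hardswish(x: float) -> float:
    """Hardswish activation: x * ReLU6(x + 3) / 6."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def eca_channel_weights(pooled, kernel):
    """ECA-style channel attention weights.

    pooled: per-channel global-average-pooled values (length C).
    kernel: 1D convolution weights of odd length k. These are learned in
            practice; here they are hypothetical values supplied by the caller.
    Returns one sigmoid weight in (0, 1) per channel.
    """
    k = len(kernel)
    pad = (k - 1) // 2
    # Zero-pad so that the output has one value per input channel.
    padded = [0.0] * pad + list(pooled) + [0.0] * pad
    weights = []
    for c in range(len(pooled)):
        z = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights


def warm_restart_lr(epoch, lr_max=0.01, lr_min=0.0, t_0=10, t_mult=2):
    """Cosine annealing with warm restarts (assumed hyperparameters).

    The learning rate decays from lr_max to lr_min along a half cosine over
    a cycle of t_i epochs, then jumps back to lr_max. Each new cycle is
    t_mult times longer than the previous one.
    """
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t_cur / t_i))
```

In a network, each channel of a feature map would be multiplied by its weight from `eca_channel_weights`, so channels with larger weights contribute more to later layers. With the assumed defaults, `warm_restart_lr` starts a new cycle at epochs 0, 10, and 30.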