Abstract

In human-human interactions, speakers communicate their emotional state to listeners using a combination of linguistic and paralinguistic cues. Although comprehension of paralinguistic cues plays a substantial role in improving interaction, few works have explored this in the context of a social robot. This work presents a speech emotion recognition system based on paralinguistic cues, evaluated through emotion cards and emotional storytelling. An SVM classification model was trained to recognize eight emotions (Neutral, Calm, Happy, Sad, Angry, Fearful, Disgust, Surprise) in robot-directed speech based exclusively on vocalizations. Three simulated speech emotion datasets (RAVDESS, TESS, and RRLabSED) contributed 5972 training samples. Feature extraction and feature selection were carried out using the openSMILE toolkit and the reliefF algorithm, respectively. To validate the system on real-time data, speech emotion samples from 15 participants were gathered through two tasks (Task 1 and Task 2). Experimental results report balanced accuracy scores of 66.43% and 69.63% for speech emotion prediction on the two tasks, respectively. Finally, an HRI scenario was realized in which the robot adapted its behavior according to the recognized human speech emotion. Ratings on the Robot's Perceived Empathy scale support that the robot was perceived as capable of empathy.
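The pipeline described above (acoustic features, feature selection, SVM classification, balanced-accuracy evaluation) can be illustrated with a minimal sketch. This is not the authors' code: random arrays stand in for openSMILE functionals and emotion labels, the feature dimensions are placeholders, and a mutual-information selector is used as a stand-in for the reliefF step, which scikit-learn does not provide.

```python
# Illustrative sketch of an SVM speech-emotion pipeline (assumptions noted below).
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprise"]

# Placeholder data: in the described system these would be openSMILE
# functionals extracted from the three datasets (5972 samples total);
# the 988-dimensional feature vectors here are arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=(5972, 988))
y = rng.integers(0, len(EMOTIONS), size=5972)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = Pipeline([
    ("scale", StandardScaler()),
    # Stand-in for reliefF feature selection; k=200 is an arbitrary choice.
    ("select", SelectKBest(mutual_info_classif, k=200)),
    ("svm", SVC(kernel="rbf", C=1.0)),
])
clf.fit(X_train, y_train)

# Balanced accuracy, the metric reported in the abstract.
print("balanced accuracy:",
      balanced_accuracy_score(y_test, clf.predict(X_test)))
```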
