Abstract

While voice communication of emotion has been researched for decades, the accuracy of automatic voice emotion recognition (AVER) remains to be improved. In particular, intergenerational communication has been under-researched, as indicated by the lack of an emotion corpus of child–parent conversations. In this paper, we present our work applying Support Vector Machines (SVMs), established machine learning models, to analyze 20 pairs of child–parent dialogues on everyday life scenarios. Among the many issues facing the emerging work on AVER, we explored two critical ones: the methodological issue of optimizing performance against computational cost, and the conceptual issue of the emotionally neutral state. We used the minimalistic and extended acoustic feature sets extracted with OpenSMILE, and a small and a large set of annotated utterances, to build models, and analyzed the prevalence of the class neutral. Results indicated that the larger the combined sets, the better the training outcomes. Nevertheless, the classification models yielded only modest average recall when applied to the child–parent data, indicating low generalizability. Implications for improving AVER and its potential uses are drawn.
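As a rough illustration of the kind of pipeline described above, the sketch below trains an SVM on acoustic functionals and reports unweighted average recall. It is a minimal sketch under stated assumptions, not the paper's actual setup: the file name features.csv, the label column emotion, and the SVM hyperparameters are hypothetical placeholders, and the features are assumed to have already been exported from OpenSMILE.

```python
# Minimal sketch: SVM emotion classification on OpenSMILE-style acoustic
# functionals, evaluated with unweighted average recall (UAR).
# Assumptions (not from the paper): features were exported to "features.csv"
# with one row per utterance, a label column named "emotion", and the
# remaining columns holding numeric feature values.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import recall_score

data = pd.read_csv("features.csv")            # hypothetical feature table
X = data.drop(columns=["emotion"]).values     # acoustic functionals
y = data["emotion"].values                    # annotated emotion labels

# Hold out a stratified test split so every emotion class is represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Linear-kernel SVM on standardized features; class_weight="balanced"
# counters the prevalence of the "neutral" class noted in the abstract.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="linear", C=1.0, class_weight="balanced"),
)
model.fit(X_train, y_train)

# Unweighted average recall = macro-averaged recall over all classes.
uar = recall_score(y_test, model.predict(X_test), average="macro")
print(f"Unweighted average recall: {uar:.3f}")
```

Macro-averaged recall is used here because it weights each emotion class equally, which matters when a class such as neutral dominates the annotations.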
