Abstract

Prior approaches to multimodal sentiment and emotion recognition (SER) rely on input data representations and neural networks based on classical Euclidean geometry. Recently, however, hyperbolic geometry has proved to be a powerful tool for data mapping, since it can capture the hierarchical structure of the relations among elements in the data. In this paper we propose the use of hyperbolic learning for SER and show that including hyperbolic structures in the neural network, which map the input into hyperbolic space, can improve the quality of the predictions. We evaluate the benefits of hyperbolic features by extending existing methods in two ways. On the one hand, we modify state-of-the-art models by adding hyperbolic output layers. On the other, we build hybrid neural network architectures that combine hyperbolic and Euclidean layers according to different schemes. The proposed hyperbolic models were tested on several classification tasks on benchmark multimodal SER datasets. The experiments gave strong evidence that, in both simple and complex networks, introducing a hyperbolic structure improves model accuracy. Specifically, the combined use of hyperbolic and Euclidean layers showed superior performance in almost all the classification tasks.
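
As a minimal sketch of what mapping Euclidean features into hyperbolic space can look like, the snippet below assumes the Poincaré ball model with curvature -c. The abstract does not say which hyperbolic model the paper uses, so this choice, the function names `expmap0` and `poincare_distance`, and the parameter `c` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for the zero vector


def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball (curvature -c).

    Assumed model, not taken from the paper. Maps a Euclidean feature
    vector v (last axis = feature dimension) to a point strictly inside
    the ball of radius 1/sqrt(c).
    """
    v = np.asarray(v, dtype=float)
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), EPS)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points of the Poincare ball."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    # arccosh argument is >= 1 for points inside the ball
    return np.arccosh(1.0 + 2.0 * c * sq_diff / denom) / np.sqrt(c)


# Hypothetical Euclidean feature vector, e.g. the output of a Euclidean layer
features = np.array([0.3, 0.4])
point = expmap0(features)
origin = np.zeros_like(point)
dist_to_origin = poincare_distance(origin, point)
```

Under this convention the hyperbolic distance from the origin to `expmap0(v)` is twice the Euclidean norm of `v`, so distances grow rapidly toward the boundary of the ball. This is the property usually credited with letting hyperbolic embeddings represent tree-like, hierarchical relations compactly.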
