Abstract

Emotion recognition, a branch of affective computing, attracts considerable research attention because of its broad applications. Whereas earlier approaches aim to recognize a person's emotional state from facial expression, speech, or gesture, some researchers have recognized the potential of contextual information from the scene. Hence, in addition to the main subject, the general background is also used as a complementary cue for emotion prediction. However, most existing works remain limited in how deeply they explore the scene-level context. In this paper, to exploit the context more fully, we propose an emotional state prediction method based on visual relationship detection between the main target and the adjacent objects in the background. Specifically, we use both the spatial and semantic features of objects in the scene to calculate, through a modified attention mechanism, the influence of every context-related element on the main subject and the nature of that influence (positive, negative, or neutral). The model then combines those features with the scene context and the body features of the target person to predict their emotional state. In our experiments, the method achieves state-of-the-art performance on the CAER-S dataset and competitive results on the EMOTIC benchmark.
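
To make the idea of signed, object-level attention concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the form of the modified attention mechanism, so everything here is an assumption: the function name `signed_context_attention`, the use of a scaled dot product as the semantic affinity, the distance between box centres as the spatial term, and `tanh` as the source of the positive, negative, or neutral polarity are all hypothetical choices made for illustration.

```python
import math


def box_center(box):
    """Return the centre (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def signed_context_attention(target_feat, target_box, object_feats,
                             object_boxes, spatial_weight=1.0):
    """Hypothetical signed attention of context objects on a target person.

    Assumed scheme (not taken from the paper):
      * semantic affinity = scaled dot product of target and object features
      * spatial term      = distance between box centres (closer matters more)
      * magnitude         = softmax over |affinity| - spatial_weight * distance
      * polarity          = tanh(affinity), in (-1, 1); about 0 means neutral
    Returns (weights, context), where context is the signed weighted sum
    of the object features.
    """
    d = len(target_feat)
    if not object_feats:
        # No surrounding objects: empty weights, zero context vector.
        return [], [0.0] * d

    tcx, tcy = box_center(target_box)
    affinities, logits = [], []
    for feat, box in zip(object_feats, object_boxes):
        affinity = sum(t * o for t, o in zip(target_feat, feat)) / math.sqrt(d)
        ocx, ocy = box_center(box)
        distance = math.hypot(ocx - tcx, ocy - tcy)
        affinities.append(affinity)
        logits.append(abs(affinity) - spatial_weight * distance)

    magnitudes = softmax(logits)
    weights = [m * math.tanh(a) for m, a in zip(magnitudes, affinities)]
    context = [sum(w * feat[k] for w, feat in zip(weights, object_feats))
               for k in range(d)]
    return weights, context


# Toy example: three objects around a target in normalized image coordinates.
target_feat = [1.0, 0.0]
target_box = (0.4, 0.4, 0.6, 0.6)
object_feats = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
object_boxes = [(0.5, 0.4, 0.7, 0.6), (0.2, 0.4, 0.4, 0.6), (0.0, 0.0, 0.1, 0.1)]
weights, context = signed_context_attention(
    target_feat, target_box, object_feats, object_boxes)
```

In this toy example the first object receives a positive weight, the second a negative weight, and the third, whose feature is orthogonal to the target's, a weight of zero. In a full model, the resulting context vector would then be combined with the scene-context and body features before classification; that fusion step is omitted here.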

Highlights

  • Emotion is an integral part of human life and has a considerable effect on human knowledge, thinking, and decision-making

  • Although facial expressions are commonly the key to recognizing emotional states, the judgment of human emotion can be affected by many other factors

  • Proposed method: we explain our framework for emotion recognition based on visual relationship detection in context

Summary

Introduction

Emotion is an integral part of human life and has a considerable effect on human knowledge, thinking, and decision-making. Most previous works in this area adopt modalities extracted directly from humans, such as facial expressions, text, speech, and even physiological signals. Although several studies have explored the background for emotion recognition in recent years, how to utilize the context effectively remains an open question. The majority of papers did not concentrate on what was happening within the context. Instead, they used the entire picture, relying heavily on the model's capability to extract global information from the scene automatically and feeding the resulting features into the networks. Our goal is to effectively utilize the surrounding objects that help the model better interpret the perceived emotion of the target.
