Facial expression is the primary cue for judging a person's emotional and psychological state, and predicting changes in expression can help assess mental health, so that serious psychological or psychiatric disorders are less likely to go unnoticed in their early stages. In computer vision, most research has focused on facial expression analysis, in some cases also taking body posture into account. However, the performance of these methods is limited under unconstrained natural conditions, where more information is needed for human emotion analysis. In this paper, we design an Adaptive Multi-End Fusion Attention Mechanism, built on a deep learning framework, that extracts human body information from a person's expression, posture, and surrounding environment. We add it to an object detection model to obtain information from different regions of the body and face and features at different scales, and we use fusion networks for feature fusion and classification. Experiments with different test methods confirm that this fusion approach is feasible for expression recognition and prediction. The model achieves an average accuracy of 34.51% on the Emotic contextual expression recognition dataset.
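To make the idea of fusing several information sources concrete, the following is a minimal sketch of attention-weighted fusion over three feature streams (face, body, context). It is not the mechanism proposed in the paper, whose details are not given here: the function names `softmax` and `fuse_streams`, the per-stream gate vectors, and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(streams, gate_weights):
    """Attention-weighted sum of equally sized feature vectors.

    streams: dict mapping a stream name to its feature vector.
    gate_weights: dict mapping a stream name to a gate vector
        (hypothetical stand-in for learned attention parameters).
    Returns (fused_vector, attention_by_stream).
    """
    names = sorted(streams)
    # One scalar relevance score per stream: dot(gate, features).
    scores = [
        sum(g * x for g, x in zip(gate_weights[n], streams[n]))
        for n in names
    ]
    attention = softmax(scores)
    dim = len(streams[names[0]])
    # Weighted sum across streams, computed per feature dimension.
    fused = [
        sum(a * streams[n][i] for a, n in zip(attention, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, attention))


# Toy features (assumed values, not taken from any dataset).
streams = {
    "face": [1.0, 0.0],
    "body": [0.0, 1.0],
    "context": [0.5, 0.5],
}
# Zero gates give every stream the same score, hence uniform attention.
gates = {name: [0.0, 0.0] for name in streams}
fused, attention = fuse_streams(streams, gates)
```

In a trained model the gate parameters would be learned, so the attention weights would shift toward whichever stream is most informative for a given image; the fused vector would then feed a classifier over the emotion categories.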