The ability to perceive human facial emotions is an essential feature of various multi-modal applications, especially in the intelligent human-computer interaction (HCI) area. In recent decades, considerable effort has been devoted to automatic facial emotion recognition (FER). However, most existing FER methods focus only on either basic emotions such as the seven/eight categories (e.g., happiness, anger, and surprise) or abstract dimensions (valence, arousal, etc.), while neglecting the rich nature of emotional states. In real-world scenarios, a far larger vocabulary is used to describe humans' inner feelings and their reflections in facial expressions. In this work, we propose to address the issue of semantic richness in the FER problem, with an emphasis on the granularity of emotion concepts. In particular, we take inspiration from earlier psycho-linguistic research, which conducted a prototypicality rating study and selected 135 emotion names from hundreds of English emotion terms. Based on these 135 emotion categories, we investigate the corresponding facial expressions by collecting a large-scale 135-class FER image dataset and propose an accompanying facial emotion recognition framework. To demonstrate the feasibility of advancing FER research to this fine-grained level, we conduct extensive evaluations of the dataset's credibility and of the accompanying baseline classification model. The qualitative and quantitative results show that the problem is meaningful and our solution is effective.
To the best of our knowledge, this is the first work aimed at exploiting such a large semantic space for emotion representation in the FER problem.