Behavior analysis is the process of identifying, modeling, and understanding the nuances and patterns of emotional expression exhibited by individuals. Accurately detecting and predicting facial emotions is challenging, especially in contexts such as remote interviews, which have become increasingly prevalent. Notably, many participants struggle to convey their thoughts to interviewers with a happy expression and good posture, which may unfairly diminish their chances of employment despite their qualifications. Artificial intelligence techniques such as image classification offer promising solutions to this challenge. By leveraging AI models, behavior analysis can be applied to perceive and interpret facial reactions, paving the way to anticipate participants' future behavior from learned patterns. Although prior work exists on facial emotion recognition (FER) using image classification, little research has focused on platforms such as remote interviews and online courses. In this paper, we focus on classes including happiness, sadness, anger, surprise, eye contact, neutrality, smiling, confusion, and stooped posture. We curated our own dataset, comprising a diverse range of sample interviews captured from participants' video recordings together with additional images documenting facial expressions and speech during interviews, and we integrated existing datasets such as FER-2013 and the Celebrity Emotions dataset. We investigate a variety of AI and deep learning methodologies, including VGG19, ResNet50V2, ResNet152V2, Inception-ResNetV2, Xception, EfficientNet-B0, and YOLOv8, to analyze facial patterns and predict emotions. Our results demonstrate an accuracy of 73% using the YOLOv8 model. However, we found that the categories happy and smiling, as well as surprised and confused, are not disjoint, leading to potential misclassification. Furthermore, we treated stooped posture as a non-essential class, since the interviews are conducted via webcam, which does not allow posture to be observed. After removing these overlapping categories, accuracy increased to 76.88% with the YOLOv8 model.
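For context, the fine-tuning and class-merging step summarized above could be reproduced with an off-the-shelf image classification pipeline. The sketch below uses the Ultralytics YOLOv8 classification API; the dataset directory, retained class names, and hyperparameters are illustrative assumptions, not the authors' released configuration.

```python
# Minimal sketch (not the authors' code): fine-tuning a YOLOv8 classification
# model on an interview-emotion dataset after merging overlapping classes.
from ultralytics import YOLO

# Assumed dataset layout (standard Ultralytics classification format):
#   interview_emotions/
#     train/<class_name>/*.jpg
#     val/<class_name>/*.jpg
# where <class_name> covers the retained labels, e.g. happy, sad, angry,
# surprised, eye_contact, neutral (smile/confused folded into happy/surprised).

model = YOLO("yolov8n-cls.pt")      # pretrained classification backbone
model.train(
    data="interview_emotions",      # hypothetical dataset directory
    epochs=50,                      # illustrative training budget
    imgsz=224,                      # face crops resized to 224x224
)

# Inference on a single frame extracted from an interview recording
results = model("frame_0001.jpg")
top = results[0].probs.top1
print(results[0].names[top], float(results[0].probs.top1conf))
```

A usage note: merging ambiguous categories (e.g., smile into happy) simply amounts to consolidating their image folders before training, which is consistent with the accuracy gain reported when the overlapping classes are removed.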