Mental illness has become a prevalent global health concern that affects individuals across demographics. Timely detection and accurate diagnosis of mental disorders are crucial for effective treatment and support, as late diagnosis can result in suicidal or other harmful behaviors and, ultimately, death. To this end, the present study introduces a novel pipeline for the analysis of facial expressions, leveraging both the AffectNet and 2013 Facial Emotion Recognition (FER) datasets. This research goes beyond traditional diagnostic methods by contributing a system capable of generating a comprehensive mental disorder dataset and concurrently predicting mental disorders based on facial emotional cues. In particular, we introduce a hybrid architecture for mental disorder detection that leverages the state-of-the-art object detection algorithm YOLOv8 to detect and classify visual cues associated with specific mental disorders. To achieve accurate predictions, an integrated learning architecture based on the fusion of Convolutional Neural Network (CNN) and Vision Transformer (ViT) models is developed to form an ensemble classifier that predicts the presence of mental illness (e.g., depression, anxiety, and other mental disorders). The proposed ensemble technique improves the overall accuracy to about 81%. To ensure transparency and interpretability, we integrate techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM) and saliency maps to highlight the regions of the input image that contribute significantly to the model's predictions. These visualizations give healthcare professionals a clear understanding of the features influencing the system's decisions, thereby enhancing trust and supporting a more informed diagnostic process.