In recent years, facial emotion recognition has improved significantly and attracted growing attention. The technology uses advanced algorithms to analyze facial expressions, enabling computers to detect and interpret human emotions. Its applications span a wide range of fields, from improving customer service through sentiment analysis to enhancing mental health support by monitoring emotional states. However, facial emotion recognition faces several challenges, including variability in individual expressions, cultural differences in how emotions are displayed, and privacy concerns related to data collection and usage. Lighting conditions, occlusions, and the need for diverse datasets also affect accuracy. To address these issues, an enhanced multi-verse optimizer (EMVO) technique is proposed to improve the efficiency of emotion recognition. EMVO is designed to improve convergence speed, the exploration-exploitation balance, solution quality, and applicability to different types of optimization problems. Data were drawn from two datasets: YouTube and the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset. Classification is then performed with a convolutional neural network (CNN) to improve emotion recognition performance. Compared with the existing methods, namely the shuffled frog leaping algorithm with incremental wrapper-based subset selection (SFLA-IWSS), the hierarchical deep neural network (H-DNN), and unique preference learning (UPL), the proposed method achieved higher accuracy: 98.65% on the YouTube dataset and 98.76% on the SAVEE dataset.
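
For orientation, the baseline multi-verse optimizer on which EMVO builds can be sketched as follows. This is a minimal illustration, not the enhanced variant proposed here, whose modifications are not detailed in this abstract. The sketch makes several assumptions: the objective is a toy sphere function rather than anything tied to emotion recognition, rank-based weights stand in for the normalized inflation rates of the original formulation, and all names (`mvo`, `sphere`, `wep_min`, `wep_max`, `p`) are hypothetical. The wormhole existence probability (WEP) and travelling distance rate (TDR) are the two schedules that govern the exploration-exploitation balance mentioned above.

```python
import random


def sphere(x):
    """Toy objective (an assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def mvo(objective, dim, lb, ub, n_universes=30, max_iter=200, seed=0,
        wep_min=0.2, wep_max=1.0, p=6.0):
    """Simplified baseline multi-verse optimizer for minimization."""
    rng = random.Random(seed)
    universes = [[rng.uniform(lb, ub) for _ in range(dim)]
                 for _ in range(n_universes)]
    best, best_cost = None, float("inf")

    for it in range(1, max_iter + 1):
        # WEP rises over time (more moves toward the best universe);
        # TDR falls over time (smaller steps around the best universe).
        wep = wep_min + it * (wep_max - wep_min) / max_iter
        tdr = 1.0 - (it ** (1.0 / p)) / (max_iter ** (1.0 / p))

        # Sort universes from best (lowest cost) to worst.
        costs = [objective(u) for u in universes]
        order = sorted(range(n_universes), key=lambda i: costs[i])
        sorted_u = [universes[i][:] for i in order]
        if costs[order[0]] < best_cost:
            best, best_cost = sorted_u[0][:], costs[order[0]]

        # Assumption: rank-based roulette weights, so better universes
        # are more likely to act as white holes (donors).
        weights = [n_universes - r for r in range(n_universes)]

        new_universes = [sorted_u[0][:]]  # elitism: keep the current best
        for rank in range(1, n_universes):
            u = sorted_u[rank][:]
            # Worse-ranked universes receive objects with higher probability.
            receive_prob = rank / (n_universes - 1)
            for j in range(dim):
                if rng.random() < receive_prob:
                    donor = rng.choices(range(n_universes), weights=weights)[0]
                    u[j] = sorted_u[donor][j]  # white/black hole exchange
                if rng.random() < wep:
                    # Wormhole: jump to the neighbourhood of the best universe.
                    step = tdr * ((ub - lb) * rng.random() + lb)
                    u[j] = best[j] + step if rng.random() < 0.5 else best[j] - step
                u[j] = min(ub, max(lb, u[j]))  # keep within bounds
            new_universes.append(u)
        universes = new_universes

    # Final evaluation so the last generation is also considered.
    for u in universes:
        c = objective(u)
        if c < best_cost:
            best, best_cost = u[:], c
    return best, best_cost
```

Calling `mvo(sphere, dim=5, lb=-5.0, ub=5.0)` returns the best position found and its cost. In an emotion recognition pipeline the objective would be replaced by a task-specific criterion; how EMVO is coupled to the CNN is not specified in this abstract, so no such coupling is shown.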