Facial expression recognition is a crucial task in numerous applications, including human-computer interaction, mental health monitoring, and human behavior analysis. Previous studies have primarily focused on individual models or techniques for improving emotion classification accuracy, and a comparative analysis of how different neural network architectures perform on this task is lacking. The main objective of this study is to compare the performance of a Convolutional Neural Network (CNN) and a Residual Network (ResNet) on the Fer2013 dataset. The author analyzes their behavior during the initial training phase and identifies the architectural advantages and challenges associated with each model. To ensure a fair comparison, both models are trained on Fer2013 under the same experimental setup, using a standard protocol for data preprocessing and augmentation. Results show that the CNN achieves an accuracy of around 0.5 in the initial stage of training, roughly twice the ResNet's accuracy of 0.25. A possible explanation is that the CNN's simpler structure makes it more sensitive to the basic features of the data. However, as training progresses, the ResNet may outperform the CNN, because its more complex structure can capture more complex patterns.