Emotion classification is the process of identifying human emotions, and automating it has become a popular research field. To date, most work has focused on the automatic recognition of facial cues (e.g., expressions) from several modalities, such as image, video, audio, and text. Deep learning architectures such as Convolutional Neural Networks (CNNs) have demonstrated promising results for emotion recognition. This research aims to build a CNN model for facial expression recognition (FER) while improving accuracy and performance. Two models with hyperparameter tuning are proposed and compared against existing architectures on two datasets: Facial Expression Recognition 2013 (FER2013) and Extended Cohn-Kanade (CK+), both of which are commonly used in FER. In addition, the proposed models are compared with previous models under the same settings and datasets. The results show that the proposed models achieve higher accuracy on the CK+ dataset, while some models trained on FER2013 achieve lower accuracy than previous research because of overfitting; the models trained on CK+ show no overfitting. The research focuses mainly on CNN models due to limited resources and time.
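As context for the CNN-based approach described above, the sketch below illustrates the core operation a CNN layer performs: sliding a small learned filter over an image to produce a feature map. This is a minimal NumPy illustration, not the paper's actual model; the 48×48 input matches the grayscale image size used in FER2013, and the 3×3 edge-detection kernel is a hypothetical example of the kind of filter a trained network might learn.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: the building block of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel is a weighted sum of a local patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 48x48 grayscale input (the FER2013 image size) and a 3x3 edge filter
img = np.random.rand(48, 48)
edge = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=float)
fmap = conv2d(img, edge)
print(fmap.shape)  # (46, 46): a valid convolution shrinks each side by kernel-1
```

In a full FER model, many such filters are stacked in successive layers, interleaved with pooling and followed by dense layers that map the final feature maps to the emotion classes.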