Abstract
This study investigates the impact of data augmentation and the Synthetic Minority Over-sampling Technique (SMOTE) on the performance of classification models trained on a small, imbalanced dataset of student academic performance. The experimental design compares four scenarios: 1) classification models without data augmentation or SMOTE, 2) models with data augmentation only, 3) models with SMOTE only, and 4) models with both data augmentation and SMOTE. Model performance in each scenario was measured with standard evaluation metrics: accuracy, precision, recall, and F1-score. To validate the results, three classification algorithms, namely Random Forest, XGBoost, and AdaBoost, were implemented and evaluated under each scenario. The findings highlight the significant impact of data augmentation and SMOTE on classification performance, particularly for small and imbalanced datasets. The results show that applying both techniques simultaneously yields the largest improvement in the evaluation metrics compared to applying either technique alone. The originality of this study lies in its comprehensive comparison of the effectiveness of data augmentation and SMOTE, and in its use of a student academic performance dataset, a real-world case in the context of artificial intelligence. These findings offer researchers and practitioners valuable insight into choosing appropriate techniques for handling small, class-imbalanced datasets, and the study is expected to make an important contribution to the development of more effective classification methodology across domains.
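To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea described above: synthesizing new minority-class samples by interpolating between a minority point and one of its nearest minority neighbors, then training one of the evaluated classifiers (Random Forest) on the balanced data. The toy dataset, class sizes, and parameter values are illustrative assumptions, not the paper's actual data or configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen minority point and one of its k nearest minority
    neighbors (the core idea of SMOTE)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # column 0 is the point itself
    out = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # random neighbor, skipping self
        out[t] = X_min[i] + rng.random() * (X_min[j] - X_min[i])
    return out

# Hypothetical imbalanced dataset: 90 majority vs. 10 minority samples.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(90, 4))
X_min = rng.normal(2.0, 1.0, size=(10, 4))
X = np.vstack([X_maj, X_min])
y = np.array([0] * 90 + [1] * 10)

# Oversample the minority class until both classes have 90 samples.
X_syn = smote(X_min, n_new=80)
X_bal = np.vstack([X, X_syn])
y_bal = np.concatenate([y, np.ones(80, dtype=int)])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
```

In practice the paper's pipeline would use a library implementation such as `imblearn.over_sampling.SMOTE`, and apply the resampling only to the training split so that evaluation metrics are computed on untouched data.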