Abstract

Educational data mining is a method for extracting and discovering new knowledge from education data. Because education data is often complex and imbalanced, it requires a preprocessing step or appropriate learning algorithms to support accurate analysis and interpretation. Many studies emphasize classification and clustering methods for gaining insight into education data. However, few previous works have focused specifically on preprocessing education data, particularly on the problem of imbalanced datasets. The objective of this research is therefore to improve classification accuracy on educational web usage data. Our study applies the synthetic minority over-sampling technique (SMOTE) to preprocess a raw web usage dataset. The minority class consists of students who failed the examination, and the majority class consists of students who passed. In our experiments, four over-sampling methods are applied to balance the number of samples in the minority class: SMOTE and three of its variants, Borderline-SMOTE1, Borderline-SMOTE2, and SVM-SMOTE. The methods are evaluated by comparing the results of three well-known classifiers: naive Bayes, decision tree, and k-nearest neighbors. The experiments use real-world education datasets. The results show that synthetic minority over-sampling methods improve detection of the minority class and improve classification performance in terms of precision, recall, and F1 score.
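
As an illustration of the core idea behind SMOTE, the following is a minimal sketch in plain Python, not the implementation used in this study. It creates each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority neighbors. The function name `smote`, the toy features (page views, minutes online), and the class sizes are hypothetical and chosen only for the example; the Borderline and SVM variants, which restrict which minority samples are used as seeds, are not shown.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself),
        # ranked by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical toy data: (page_views, minutes_online) for students who failed
failed = [(3.0, 10.0), (4.0, 12.0), (2.0, 8.0), (5.0, 15.0)]
passed_count = 10  # assumed size of the majority class

new_points = smote(failed, n_new=passed_count - len(failed), k=3)
balanced_failed = failed + new_points  # minority class now matches the majority size
```

Because every synthetic point lies on a line segment between two real minority samples, the over-sampled class stays within the region already occupied by the minority data rather than duplicating existing records.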