Educational data mining (EDM) is the application of data mining in the educational field. EDM is used to classify, analyze, and predict students’ academic performance and dropout rates, as well as instructors’ performance, in order to improve the teaching–learning process. This review article presents a detailed analysis of 142 research articles published from 2010 to 2020 and downloaded from research databases such as IEEE, Springer, ACM, and Elsevier. It also covers recent developments in EDM from 2021 and 2022. The use of classification techniques in EDM, alone and in combination with other data mining techniques such as clustering algorithms, association rule algorithms, regression techniques, and ensemble techniques, is presented thoroughly. The comparative study covers Classification Techniques; Classification and Clustering Techniques; Classification and Association Rule Mining; Classification, Clustering, and Association Rule Mining; Classification, Regression, and Clustering; and Classification and Ensemble. The analysis is illustrated in terms of the year-wise number of research articles employing classification techniques in EDM; classification with other data mining techniques used in EDM; classifiers as per the Weka tool; classification techniques; clustering techniques; association rule techniques; selection of the best classification technique; classification performance metrics; software used in EDM; sampling period; size of dataset; and data mining tools. From the review of the 142 research articles, it is noted that classification techniques are the most commonly used techniques for analyzing students’ performance in EDM. Classification techniques are also applied together with clustering techniques to predict students’ performance.
It is found that Naïve Bayes, Random Forest, Support Vector Machine, and J48 are the most frequently considered classification techniques, while in studies combining classification with clustering, the K-means clustering algorithm is used alongside the classification algorithms. Naïve Bayes, Random Forest, and Support Vector Machine are noted to be the best classification algorithms when various classification algorithms are compared on various performance parameters. Among these performance parameters, accuracy, precision, recall, F-measure, and the k-fold value are found to be used by most of the research articles. The programming languages used to build EDM models for analyzing students’ datasets from educational settings are Java, R, and Python, while the data mining tools used to evaluate the performance of classification, clustering, or association rule algorithms are Weka and RapidMiner. Classification algorithms under the Weka classifier categories Tree, Bayes, Function, and PMML are applied in most of the research articles. In addition to the comparative analysis and the analysis based on various factors, research gaps are identified and reported in this article. Future directions for researchers working in EDM on building models from datasets obtained from educational settings to predict students’ performance are discussed, so that work in EDM can be carried out to improve the teaching–learning process.