Educational Data Mining to Support Programming Learning Using Problem-Solving Data

Md. Mostafizer Rahman, Yutaka Watanobe, Taku Matsumoto, Rage Uday Kiran, Keita Nakamura

doi:10.1109/access.2022.3157288


Open Access

https://doi.org/10.1109/access.2022.3157288


Abstract

Computer programming has attracted a lot of attention in the development of information and communication technologies in the real world. Meeting the growing demand for highly skilled programmers in the ICT industry is one of the major challenges. In this context, online judge (OJ) systems enhance programming learning and practice opportunities in addition to classroom-based learning. Consequently, OJ systems have created large archives of problem-solving data (solution codes, logs, and scores) that can be valuable raw material for programming education research. In this paper, we propose an educational data mining framework to support programming learning using unsupervised algorithms. The framework includes the following sequence of steps: (i) problem-solving data (logs and scores) are collected from the OJ and preprocessed; (ii) the MK-means clustering algorithm is used to cluster the data in Euclidean space; (iii) statistical features are extracted from each cluster; (iv) the frequent pattern (FP)-growth algorithm is applied to each cluster to mine data patterns and association rules; (v) a set of suggestions is provided on the basis of the extracted features, data patterns, and rules. Different parameters are adjusted to achieve the best results for the clustering and association rule mining algorithms. For the experiment, approximately 70,000 real-world problem-solving records from 537 students of a programming course (Algorithm and Data Structures) were used. In addition, synthetic data were leveraged in experiments to demonstrate the performance of the MK-means algorithm. The experimental results show that the proposed framework effectively extracts useful features, patterns, and rules from problem-solving data. Moreover, these extracted features, patterns, and rules highlight the weaknesses and the scope of possible improvements in programming learning.
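
Step (iv) of the framework mines data patterns and association rules within each cluster. The sketch below illustrates what such rules look like in terms of support and confidence. It is a minimal illustration under stated assumptions, not the paper's method: it enumerates itemsets by brute force instead of using FP-growth, and the transactions, item labels (`verdict=...`, `score=...`), and thresholds are hypothetical toy values, not data from the study.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Brute-force enumeration, NOT FP-growth: suitable for a toy example only.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            # No frequent itemset of this size, so none of a larger size either.
            break
    return result


def association_rules(itemsets, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence)."""
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so the lookup is safe.
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Hypothetical transactions: one set of discretized attributes per submission.
transactions = [
    frozenset({"verdict=accepted", "score=high"}),
    frozenset({"verdict=accepted", "score=high"}),
    frozenset({"verdict=wrong_answer", "score=low"}),
    frozenset({"verdict=wrong_answer", "score=low"}),
    frozenset({"verdict=accepted", "score=low"}),
]

itemsets = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(itemsets, min_confidence=0.9)

for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

Brute-force enumeration grows exponentially with the number of distinct items, which is one reason a tree-based method such as FP-growth is preferred on data of the size described above.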

Highlights

Today’s information and communication technology (ICT) industry demands highly skilled programmers for further development.
The literature review presents recent research related to educational data mining (EDM), rule-based recommender systems (RSs), clustering techniques, and data pattern and association rule mining (ARM) techniques.
To address optimal initial center selection and outlier handling, we proposed in a previous study [46] a clustering algorithm based on K-means, called the modified K-means (MK-means) clustering algorithm.
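
For orientation, the standard K-means procedure that MK-means modifies can be sketched as follows. This is plain Lloyd's K-means in Euclidean space, not the MK-means algorithm of [46]: the initial centers are passed in explicitly and outliers are not treated specially, because the center-selection and outlier-handling procedures are not described in this text. The sample points and starting centers are hypothetical.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's K-means in Euclidean space; returns (labels, centers)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels, centers)
```

The result of this baseline depends on the starting centers and on any extreme points in the data, which are the two issues the highlight above names as the motivation for MK-means.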

Summary

Introduction

Today’s information and communication technology (ICT) industry demands highly skilled programmers for further development. The conventional computer programming learning environment is insufficient to prepare highly skilled programmers because of the limited number of exercise classes, limited practice opportunities, and lack of individual tutoring. Most educational institutions, such as schools, colleges, and universities, are struggling to build more educational facilities to increase academic activity (e.g., additional exercise classes, practice, and individual tutoring) because of logistical and organizational constraints [1].

A. EDUCATIONAL DATA MINING

In the last few years, e-learning platforms have become more popular for a variety of reasons, including teacher shortages, unbalanced student-teacher ratios, logistical and infrastructure constraints, the high cost of technical and professional courses, dissemination of education to a large number of people, time savings, and easy access to many courses [30]. A deep neural network model was trained on the basis of SCFH to classify students into three main groups: “Risky”, “Intermediate”, and “Advanced”.

Objectives

Results

Discussion

Conclusion