Predicting Exam Success Using Machine Learning: An Analysis of Learning Behaviors
The availability of learning activity data from online education platforms has created new opportunities to examine how student behaviors relate to academic outcomes. Within this context, educational data mining has been widely applied to analyze learning patterns and support performance prediction. This paper explores whether students' learning behaviors can be used to predict exam success in a Python learning environment. Exploratory data analysis is used to compare behavioral characteristics between students who passed and those who failed the exam. The prediction task is formulated as a binary classification problem with the variable "passed exam" as the target. A support vector machine (SVM) classifier is applied to distinguish between pass and fail outcomes, and feature importance analysis is conducted to better understand the contribution of different learning behaviors. The results suggest that engagement-related variables, particularly study time and practice activities, are closely associated with exam success, while demographic features contribute relatively little to prediction performance. These findings are consistent with existing educational data mining research and demonstrate the value of machine learning methods for analyzing learning behavior data.
- Research Article
- 10.3791/69515
- Mar 17, 2026
- Journal of Visualized Experiments : JoVE
The rapid growth of online education has made online classrooms an important component of the education field. In-depth analysis of students' learning behavior in online teaching can help teachers optimize teaching strategies and provide personalized learning support for students. To support such analysis, this study collects data from online teaching platforms and preprocesses it. It then constructs a multi-perspective fuzzy reasoning model covering three dimensions (curriculum, individual, and class) to assess students' learning performance at different levels. This model handles uncertain information in learning behavior data through fuzzy sets and fuzzy rules, achieving a multidimensional evaluation of learning performance. An improved XGBoost algorithm is designed to classify the emotions expressed in students' comments; its hyperparameters are tuned with an improved grey wolf optimization algorithm. The algorithm enhances the accuracy of emotion classification and further explores the emotional tendencies and attitude feedback behind students' learning behavior. The results showed that, from a curriculum perspective, the completion rate of course tasks 3 weeks before the exam was generally above 45%, which was much higher than the completion rate three weeks after the task was released (both lower than 18%). These results indicated that students were more inclined to complete tasks just before the deadline and showed clear procrastination. The maximum accuracy of the improved classification algorithm was 98.78%, which was 8.57%, 7.55%, 6.38%, and 6.01% higher than the comparison models, and its average time consumption was 58 ms. The recall rates on negative, positive, and neutral emotions were 98.35%, 97.69%, and 98.02%, respectively. The research model can effectively analyze students' online learning behavior and enable early identification of at-risk students, facilitating personalized teaching and precise intervention in online education.
- Research Article
- 10.53469/jerp.2024.06(12).04
- Dec 30, 2024
- Journal of Educational Research and Policies
Objective: To study and analyze the learning behavior and influencing factors for cultural courses among students in sports technical secondary school. Methods: Students in the Kashi sport technical secondary school were the research subjects. A questionnaire was distributed to survey and analyze the learning behavior and influencing factors for cultural courses. Results: 230 students were enrolled in the study. The students generally came from families with multiple children, with parents generally having lower education and cultural levels. Most students recognized the role of cultural courses, the teaching of cultural courses in schools, and the teaching of cultural course teachers. Students' learning attitudes and behaviors in cultural courses were significantly positively correlated with their academic performance in cultural courses (P<0.05), while students' interest in cultural courses was significantly positively correlated with their learning attitudes and behaviors (P<0.05). Additionally, the cultural course teachers' work attitude and care for students were significantly positively correlated with students' interest in cultural courses (P<0.05). Conclusion: More attention and care should be given to the daily life and cultural course learning of sport technical secondary school students. Multiple measures can increase students' interest in cultural courses, improve learning behavior, and enhance the teaching effectiveness of cultural courses.
- Conference Article
- 10.1145/3341042.3341065
- Jun 28, 2019
Networked learning on online education platforms effectively integrates education and information technology and has a great impact on students' learning styles and modes. Students' learning characteristics can be represented as operable and understandable data indicators derived from their behavior data. By recording and analyzing students' learning behavior data, we can understand the learning status of different students and improve their learning outcomes. This paper first takes observation indicators of typical significance as the original data set, then uses correlation analysis to extract six indicators that objectively reflect learning characteristics. Cluster analysis of these indicators, carried out with reference to students' performance, shows that students' online learning exhibits group characteristics. The paper then examines the learning behavior characteristics of the different groups and offers corresponding suggestions from the perspective of teaching management.
- Book Chapter
- 10.1007/978-3-030-48190-2_7
- Jan 1, 2020
An important advantage of e-learning environments is that learners’ behaviors can be observed numerically. Learners’ use of e-learning environments generates learner data (log data). The navigation patterns obtained from these data through educational data mining play a very important role in learning and teaching design. Studies have shown that learners’ learning behaviors in online learning environments may vary according to the characteristics of learners. Studies on how navigation patterns differ according to the psycho-educational characteristics of learners provide very strong inputs to the design of learning environments suited to those characteristics, known as adaptive learning environments. According to these inputs, learning environment designs can be developed according to the individual characteristics of the learners. Online learners’ readiness (OLR) for e-learning is an important psycho-educational construct. The aim of this study is to investigate learners’ navigation in the e-learning environment according to their level of readiness for e-learning. Self-directed learning, learner control, and motivation were used in this study as the sub-dimensions of online readiness. Sequential analysis was used to reveal patterns of human behavior and communication. For this purpose, lag sequential analysis was applied to learners’ system interactions. According to the results of the analysis, the sequential navigation patterns of the learners differ according to the OLR structure. The findings of this research are expected to provide important information and suggestions to online learning environment designers.
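As a rough illustration of the first step of a lag sequential analysis, the sketch below counts lag-1 transitions between consecutive logged actions and converts them to conditional probabilities. The action names and the toy log are hypothetical and are not taken from the study; a full analysis would also test each transition for significance (for example with adjusted residuals), which is omitted here.

```python
from collections import Counter, defaultdict

def lag1_transitions(events):
    """Count how often each action is immediately followed by each action."""
    return Counter(zip(events, events[1:]))

def transition_probabilities(counts):
    """Convert lag-1 counts into P(next action | current action)."""
    totals = defaultdict(int)
    for (current, _following), n in counts.items():
        totals[current] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}

# Hypothetical navigation log for one learner (illustrative only).
log = ["video", "quiz", "video", "quiz", "forum", "video", "quiz"]

counts = lag1_transitions(log)
probs = transition_probabilities(counts)

for (current, following), p in sorted(probs.items()):
    print(f"{current} -> {following}: {p:.2f}")
```

Comparing such transition tables between learners with high and low readiness is one simple way to see whether their navigation sequences differ.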
- Book Chapter
- 10.1007/978-981-13-2206-8_54
- Jan 1, 2018
With the rise of large-scale online open courses, the era of MOOC has come and it has created opportunities and challenges for higher education at home and abroad. Many colleges and universities use the combination of MOOC and classroom teaching. Through online and offline synchronization courses, the mixed teaching based on MOOC is formed. A series of learning behavior data has been generated when students participate in the online MOOC system. The resulting data facilitates learning analysis to improve teaching quality and improve learning behavior. This paper collects the learning behavior data from MOOC, uses the Pearson coefficient to select the learning behavior characteristics related to the learning effect, and establishes a learning performance classification model based on the Support Vector Machine (SVM), and predicts the learning performance according to the learning behavior data. The accuracy of the performance forecast was 95.26%.
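The feature-selection step described above can be sketched as follows: compute the Pearson coefficient between each behavior feature and the learning outcome, and keep the features whose absolute correlation exceeds a threshold. The feature names, toy values, and the 0.5 threshold are assumptions made for illustration; the abstract does not state them, and the subsequent SVM training is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (norm_x * norm_y)

def select_features(features, target, threshold=0.5):
    """Keep features whose |r| with the target reaches the threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]

# Hypothetical behavior features for five students (illustrative only).
features = {
    "video_minutes": [10, 20, 30, 40, 50],
    "quiz_attempts": [1, 2, 2, 4, 5],
    "login_hour": [9, 21, 9, 21, 9],
}
final_score = [50, 60, 65, 80, 90]

selected = select_features(features, final_score)
print(selected)
```

Only the selected columns would then be passed to the classifier, which keeps weakly related behaviors from adding noise to the model.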
- Research Article
- 10.1155/2021/9977977
- Jun 14, 2021
- Scientific Programming
In recent years, online and offline teaching activities have been combined by the Small Private Online Course (SPOC) teaching activities, which can achieve a better teaching result. Therefore, colleges around the world have widely carried out SPOC-based blending teaching. Particularly in this year’s epidemic, the online education platform has accumulated lots of education data. In this paper, we collected the student behavior log data during the blending teaching process of the “College Information Technology Fundamentals” course of three colleges to conduct student learning behavior analysis and learning outcome prediction. Firstly, data collection and preprocessing are carried out; cluster analysis is performed by using k-means algorithms. Four typical learning behavior patterns have been obtained from previous research, and these patterns were analyzed in terms of teaching videos, quizzes, and platform visits. Secondly, a multiclass classification framework, which combines a feature selection method based on genetic algorithm (GA) with the error correcting output code (ECOC) method, is designed for training the classification model to achieve the prediction of grade levels of students. The experimental results show that the multiclass classification method proposed in this paper can effectively predict the grade of performance, with an average accuracy rate of over 75%. The research results help to implement personalized teaching for students with different grades and learning patterns.
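To make the error correcting output code (ECOC) idea concrete, the sketch below shows only the decoding step: each grade level is assigned a binary codeword, one binary classifier predicts each bit, and the predicted bit string is mapped to the class with the nearest codeword in Hamming distance. The grade labels and the 7-bit codebook are hypothetical; the paper's actual code matrix, base classifiers, and GA-based feature selection are not reproduced.

```python
def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(bits, codebook):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))

# Hypothetical codebook: any two codewords differ in 4 positions,
# so a single wrong bit can still be decoded correctly.
codebook = {
    "excellent": (1, 1, 1, 1, 1, 1, 1),
    "good":      (0, 0, 0, 0, 1, 1, 1),
    "pass":      (0, 0, 1, 1, 0, 0, 1),
    "fail":      (0, 1, 0, 1, 0, 1, 0),
}

# Outputs of the seven binary classifiers for one student; the first bit
# is wrong relative to the "pass" codeword.
predicted_bits = (1, 0, 1, 1, 0, 0, 1)
grade = ecoc_decode(predicted_bits, codebook)
print(grade)
```

The redundancy in the codewords is what lets the ensemble tolerate an individual binary classifier making a mistake.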
- Conference Article
- 10.2991/wartia-16.2016.372
- Jan 1, 2016
With the development of Internet information technology, all aspects of society are becoming digital and information-oriented. Student management systems are widely used in colleges and universities. These systems contain a great deal of valuable data that has not been fully mined. This article focuses on a university student attendance system, for which an algorithm for analyzing students' learning behavior is designed. In the experiment, the number of students is 423. Students are divided into four categories. The numbers of students in the four categories were 7, 267, 112, and 24, respectively. The average scores of all the students in each category were 107, 113, 117, and 123, respectively. This algorithm can effectively analyze students' learning behavior and provide effective support for student management.
- Conference Article
- 10.1109/icicip47338.2019.9012177
- Dec 1, 2019
In this work, we study learning behavior analysis for automatic evaluation of classroom teaching. We define five classroom learning behaviors (listen, fatigue, hand-up, sideways, and read-write) and construct a classroom learning behavior dataset named ActRec-Classroom, which includes five categories with 5,126 images in total. With the aid of convolutional neural networks (CNNs), we propose a classroom learning behavior analysis system framework. First, Faster R-CNN is used to detect human bodies. Then OpenPose is used to extract key points of the human skeleton, face, and fingers. Finally, a CNN-based classifier is designed for action recognition. Extensive experiments validate the proposed system. The validation accuracy reaches 92.86% on average, which meets the needs of learning behavior analysis in a real classroom teaching environment.
- Book Chapter
- 10.1007/978-3-319-58753-0_24
- Jan 1, 2017
The goal of this study is to clarify how students’ learning styles affect their learning experience and behaviors when visual content is presented at high speed. In our experiment, participants (10 visual learners and 9 verbal learners), categorized by Felder’s Index of Learning Styles, learned information science by watching video content composed of 6 slides. The participants watched the content on YouTube and used its variable-speed playback functionality (0.5×, 1.0×, 1.25×, 1.5×, and 2.0×); we recorded participants’ behaviors with video cameras and measured how long they spent using the functionality. We applied ANOVA to the participants’ scores on the comprehension test, mean responses to the questionnaire, and the mean percentage of functionality-usage time. The comprehension test results indicated no significant differences between visual learners and verbal learners. The questionnaire survey showed that verbal learners felt significantly less difficulty on slide 2. The functionality-usage time indicated that verbal learners spent significantly longer watching the video content at 2.0× speed. These findings suggest that verbal learners may tend to use high-speed playback longer than visual learners when they feel less difficulty with educational slides.
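For readers unfamiliar with the test, a one-way ANOVA compares the variance between group means with the variance within groups. The sketch below computes the F statistic for two groups; the scores are invented for illustration and are not the study's data, and the p-value lookup against the F distribution is left out.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical comprehension-test scores (illustrative only).
visual_learners = [60.0, 70.0, 80.0]
verbal_learners = [70.0, 80.0, 90.0]

f_stat, df_b, df_w = one_way_anova_f(visual_learners, verbal_learners)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A small F value like the one here means the gap between group means is modest relative to the spread inside each group, which is the pattern behind a "no significant difference" result.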
- Research Article
- 10.12691/automation-2-1-1
- Jan 23, 2014
- Journal of Automation and Control
Fault diagnosis, centered on pattern recognition techniques employing online measurements of process data, has been studied during the past decades. Among those techniques, artificial neural network classifiers have received enormous attention due to some of their remarkable features. Recently, a new machine learning method based on statistical learning theory, the Support Vector Machine (SVM) classifier, has been introduced in the pattern recognition field. Support vector machine classifiers were originally used to solve binary classification problems. Subsequently, methods were proposed to apply support vector machine classifiers to multiclass problems. The two most widely used of these methods are known as one versus one and one versus all. This paper deals with the application of the above-mentioned classifiers to fault diagnosis of a chemical process containing a continuous stirred tank reactor and a heat exchanger. The results show a superior classification performance of the support vector machine over the selected artificial neural network. In addition, the support vector machine classifier is very sensitive to the proper selection of the training parameters. It is shown that using a genetic algorithm for optimal selection of these parameters is feasible and can help improve the support vector machine classifier's performance.
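The difference between the two multiclass schemes is easiest to see by listing the binary subproblems each one creates. The sketch below does that and adds a majority vote for the one-versus-one case; the fault class names are hypothetical, and no actual SVM is trained here.

```python
from collections import Counter
from itertools import combinations

def one_vs_all_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]

def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

def one_vs_one_vote(pairwise_winners):
    """Pick the class that wins the most pairwise contests."""
    return Counter(pairwise_winners).most_common(1)[0][0]

# Hypothetical process states (illustrative only).
classes = ["normal", "reactor_fault", "exchanger_fault", "sensor_fault"]

ova = one_vs_all_tasks(classes)   # k binary classifiers
ovo = one_vs_one_tasks(classes)   # k * (k - 1) / 2 binary classifiers

# Assumed outputs of the six pairwise classifiers, in the order of `ovo`.
winners = ["reactor_fault", "normal", "normal",
           "reactor_fault", "reactor_fault", "sensor_fault"]
decision = one_vs_one_vote(winners)
print(len(ova), len(ovo), decision)
```

One versus all needs fewer classifiers but each sees an imbalanced split, while one versus one trains more classifiers on smaller, more balanced subsets.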
- Research Article
- 10.3991/ijim.v18i22.52447
- Nov 22, 2024
- International Journal of Interactive Mobile Technologies (iJIM)
In the era of big data, vocational education is confronted with the challenge of effectively utilizing students’ learning behavior data. With the advancement of information technology, the accumulation of students’ learning trajectories and behavior data presents new opportunities for the optimization of education and teaching. Currently, many studies focus on the analysis of short-term learning behaviors, while comprehensive consideration of both long- and short-term behaviors remains insufficient, limiting the precision of course design and resource recommendations. Therefore, the exploration of an optimization strategy that integrates students’ long- and short-term learning behaviors is urgently needed to enhance the effectiveness of vocational education. This study aims to propose a course design and optimization strategy based on educational technology, with a focus on integrating students’ long- and short-term learning behaviors, thereby presenting corresponding resource recommendation methods and course design plans. The study will provide more personalized and precise teaching solutions for vocational education, promoting the enhancement of educational quality.
- Research Article
- 10.2478/amns-2024-2228
- Jan 1, 2024
- Applied Mathematics and Nonlinear Sciences
As digital education continues to progress, more and more scholars are focusing on the analysis and optimization of big data in the field of education. However, the analysis and optimization of students’ learning behaviors using big data has received less attention. Therefore, this paper uses the improved K-means algorithm to cluster the four aspects of learning, diet, exercise, and consumption behaviors of music majors in College Z. We use the Apriori algorithm to conduct a correlation analysis between the clustered students’ consumption, life, learning, and grades. This analysis summarizes the characteristics of the students’ various behaviors and habits, enabling school administrators to provide effective and reasonable advice to the students. We used the improved K-means algorithm to identify five clustering results related to students’ behaviors. The correlation analysis revealed that 10.98% of the students were regular and hardworking, and there was a 97.78% probability that these students would get “excellent” grades. The majority of students who live a more regular life, spend more time on the Internet and have a low to medium level of consumption have a probability of getting “good” and “medium” grades, which indicates that the results of the big data survey are basically consistent with their actual situation. Obviously, the use of big data can improve the analysis of the correlation between students’ behaviors and grades.
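The two percentages reported for the "regular and hardworking" group correspond to the standard support and confidence measures of an association rule. The sketch below computes both for a single rule on a handful of invented student records; it is not a full Apriori implementation (no candidate generation or pruning), and the item names are assumptions.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)

def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the records."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # the rule never fires on this data
    return support(records, set(antecedent) | set(consequent)) / base

# Hypothetical records: behavior cluster labels plus a grade per student.
records = [
    {"regular", "hardworking", "excellent"},
    {"regular", "hardworking", "excellent"},
    {"regular", "hardworking", "excellent"},
    {"regular", "hardworking", "good"},
    {"regular", "online_heavy", "good"},
    {"regular", "online_heavy", "medium"},
    {"irregular", "online_heavy", "medium"},
    {"irregular", "low_spending", "good"},
]

rule_support = support(records, {"regular", "hardworking"})
rule_confidence = confidence(records, {"regular", "hardworking"}, {"excellent"})
print(rule_support, rule_confidence)
```

Apriori applies the same two measures at scale, keeping only itemsets above a minimum support before generating rules from them.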
- Research Article
- 10.1142/s0218126625501518
- Feb 5, 2025
- Journal of Circuits, Systems and Computers
Artificial intelligence (AI) and deep learning (DL) techniques are increasingly used in education because of advancements in online learning platforms and their ongoing implementation. The existing methods suffer from low-processing efficiency, high prediction error, and increased memory requirements when faced with vast learning and student behavior data. Thus, based on DL, this research suggests a way to analyze student behavior in e-learning. Data on student behavior are gathered, and a learning behavior model for online learning is created. The proposed optimal DL approach aims to screen the collected behavior data using data preparation, analysis, and statistics. Additionally, the Pearson correlation coefficient (PCC) approach is employed to determine the degree of data similarity. The novelty of the research is followed by utilizing an optimized DL network, known as a deep neural network with red deer optimization (ODNN-RDO), to mine students’ behavior data in e-learning programs. Two datasets, metrics including accuracy, precision, and recall, together with error measures like relative error, the root mean square error (RMSE), and absolute error, are utilized to test the created models. The improved generated models achieved 98.15% accuracy and 0–0.04% error compared to the current methods. The optimization procedure subsequently optimizes the components to acquire the best outcomes regarding faculty and parent performance monitoring of students. With effective monitoring, this model maximizes the e-learning platform for planning student growth.
- Research Article
- 10.13088/jiis.2012.18.2.029
- Jan 1, 2012
- Journal of Intelligence and Information Systems
Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions. These include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. Such strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification problems as much as SVM does for binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction problems are prone to data imbalance, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, reducing the classification accuracy of that classifier. SVM ensemble learning is one of the machine learning methods that can cope with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus, boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training on misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross validation is performed three times with different random seeds in order to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
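The gap between the arithmetic and geometric figures above is typical of imbalanced multiclass data. The sketch below contrasts the two measures, assuming that geometric mean-based accuracy means the geometric mean of per-class recalls; the abstract does not spell out the exact definition, so this is an interpretation. The rating labels and predictions are invented.

```python
import math
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total, correct = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}

def arithmetic_accuracy(y_true, y_pred):
    """Plain accuracy: share of all observations predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))

# Hypothetical imbalanced ratings: six "A", two "B", two "C".
y_true = ["A"] * 6 + ["B"] * 2 + ["C"] * 2
# A classifier that favors the majority class and never predicts "C".
y_pred = ["A"] * 6 + ["B", "A"] + ["A", "A"]

acc = arithmetic_accuracy(y_true, y_pred)
gmean = geometric_mean_accuracy(y_true, y_pred)
print(acc, gmean)
```

Plain accuracy rewards the majority-class bias, whereas the geometric mean collapses to zero as soon as one class is never recognized, which is why it is a stricter target for boosting on imbalanced classes.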
- Research Article
- 10.2478/amns-2025-1031
- Jan 1, 2025
- Applied Mathematics and Nonlinear Sciences
In this paper, with the help of big data analytics, students’ learning behavior patterns are deeply mined to provide personalized learning support for students. The massive data generated by students in the learning process is first mined. Then the K-means algorithm is used to cluster the students’ behaviors. Finally, learning resources are pushed to different types of learners using a collaborative filtering algorithm, and learning paths are customized using a genetic algorithm. A teaching practice is designed to verify the application effect of the proposed method. Taking 150 students in a class of school A as an example, the collected behavioral data of 148 students are clustered and analyzed, yielding 4 types of learners. The proposed method can recommend resources that meet the knowledge-point needs and learning preferences of different groups of students, and can recommend appropriate learning paths for the 4 types of learners based on genetic algorithms. After the teaching practice, the average English score of the experimental class is 7.49 higher than that of the traditional teaching class (control class), a significant difference (P=0.002). This shows that personalized teaching based on analysis of students’ learning behavior can effectively improve the quality of English teaching.
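The clustering step can be sketched with a plain K-means loop: assign each student to the nearest centroid, recompute each centroid as the mean of its members, and repeat. The two behavior features, the toy values, the fixed starting centroids, and the use of two clusters instead of the paper's four are all simplifications assumed for illustration; the collaborative filtering and genetic algorithm stages are not shown.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Basic K-means from given starting centroids; returns (centroids, labels)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical (weekly_logins, tasks_completed) per student.
students = [(1, 1), (2, 1), (1, 2), (8, 9), (9, 8), (9, 9)]
start = [(1, 1), (9, 9)]  # assumed starting centroids, fixed for repeatability

centroids, labels = kmeans(students, start)
print(labels)
```

Each resulting cluster label identifies a learner type, and the later recommendation steps would be run separately for each type.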