Abstract

Studies on predicting student success in distance learning have mainly explored demographic factors and student interactions with virtual learning environments. Remarkably, very few studies use information about the assignments submitted by students as an influential factor for predicting their academic achievement. This paper explores the real importance of assignment information for predicting students' performance in distance learning and evaluates the benefit of including it. We investigate and compare this factor and its potential under two information representation approaches: the traditional representation based on single instances, and a more flexible representation based on Multiple Instance Learning (MIL), which focuses on handling weakly labeled data. A comparative study is carried out using the Open University Learning Analytics dataset, one of the most important public datasets in education, provided by one of the leading online universities in the United Kingdom. The study covers a wide range of machine learning algorithms applied to both data representations. It shows that algorithms using only assignment information with a MIL-based representation can improve accuracy by more than 20% with respect to a representation based on single instance learning. We therefore conclude that an appropriate representation that eliminates the sparseness of the data reveals the relevance of a factor, the assignments submitted, that has not been widely used to date to predict students' academic performance. Moreover, a comparison with previous work on the same dataset and problem shows that MIL-based predictive models using only assignment information obtain competitive results compared with previous studies that include other factors to predict students' performance.

Highlights

  • The popularization of Internet access and the advances in the exploration of digital resources have led to a growing interest in distance education

  • Predicting student success from the work collected by Virtual Learning Environments (VLEs) has become an essential task for discovering the main features that describe the students who pass a course satisfactorily

  • Our work proposes using assignment information from a flexible data representation perspective based on Multiple Instance Learning (MIL)

Summary

Introduction

The popularization of Internet access and the advances in the exploration of digital resources have led to a growing interest in distance education. Current distance studies cannot be understood without a digital platform that provides fundamental features such as publishing the course contents, a channel for professor-student communication, and tools for tracking student progress. These systems, called Virtual Learning Environments (VLEs), include course content delivery instruments, quiz modules and assignment submission components, among other functionalities [2]. Even though the history of distance courses is very recent, they have experienced rapid expansion, with Massive Open Online Courses (MOOCs) as the most popular example. This new format is opening up a wide field of research, due to its differences with respect to traditional face-to-face higher education.

In a traditional machine learning setting, an object M is represented by a feature vector V(M) associated with a label f(M), that is, by the pair (V(M), f(M)).
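
The difference between the two representations compared in the paper can be illustrated with a small sketch. It is not the authors' implementation: the assignment identifiers, the two per-assignment features (score and days submitted early), and the helper names are hypothetical and chosen only for illustration. The sketch builds, for one student, a fixed-length single-instance vector V(M) with a padded slot for every known assignment, and a MIL bag holding one instance per assignment actually submitted.

```python
from typing import List, Tuple

# Hypothetical toy record: (assignment_id, score, days_submitted_early).
# Field names and values are illustrative assumptions, not dataset columns.
Submission = Tuple[str, float, int]

# Assumed catalogue of every assignment a student could have submitted.
ALL_ASSIGNMENTS = ["A1", "A2", "A3", "A4"]


def to_single_instance(submissions: List[Submission],
                       all_assignments: List[str]) -> List[float]:
    """Build one fixed-length vector V(M).

    Each known assignment gets a (score, days_early) slot; assignments the
    student never submitted are padded with zeros, which makes the vector sparse.
    """
    by_id = {a_id: (score, days) for a_id, score, days in submissions}
    vector: List[float] = []
    for a_id in all_assignments:
        score, days = by_id.get(a_id, (0.0, 0))
        vector.extend([score, float(days)])
    return vector


def to_bag(submissions: List[Submission]) -> List[List[float]]:
    """Build a MIL bag: one instance per submitted assignment, no padding."""
    return [[score, float(days)] for _, score, days in submissions]


def zero_fraction(vector: List[float]) -> float:
    """Fraction of entries equal to zero, used here as a rough sparsity measure."""
    return sum(1 for v in vector if v == 0.0) / len(vector)


# One student who submitted two of the four assignments; the label f(M)
# applies to the student as a whole, not to individual assignments.
student_submissions: List[Submission] = [("A1", 78.0, 2), ("A3", 85.0, 1)]
student_label = "pass"

single_vector = to_single_instance(student_submissions, ALL_ASSIGNMENTS)
bag = to_bag(student_submissions)

print("single-instance:", single_vector, "->", student_label)
print("zero fraction:", zero_fraction(single_vector))
print("MIL bag:", bag, "->", student_label)
```

In this toy case half of the single-instance vector is padding, while the bag contains only the two observed submissions and carries a single label for the whole bag. This is the sense in which a bag-based representation avoids the sparseness of a fixed-length vector.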
