Abstract

Nowadays, knowledge discovered from educational data sets plays an important role in educational decision making support. One kind of such knowledge that gives us insight into our students' characteristics is the cluster model generated by a clustering task. Each cluster model presents groups of similar students with respect to several aspects such as study performance, behavior, and skill. Many recent educational data clustering works have used existing algorithms such as k-means, expectation-maximization, and spectral clustering. Nevertheless, none of them considered the incompleteness of the educational data gathered in an academic credit system, although incomplete data handling has been well addressed by several general-purpose solutions. Unfortunately, early detection of in-trouble students normally faces data incompleteness, because we collect and process the study results of second-, third-, and fourth-year students who have not yet completed the program at that moment. In this situation, the clustering task inevitably becomes an incomplete educational data clustering task. Hence, our work focuses on an incomplete educational data clustering approach to this task. Following kernel-based vector quantization, we define a robust, effective, and simple solution, named VQ_fk_nps, which is able not only to handle ubiquitous data incompleteness in an iterative manner using the nearest prototype strategy but also to optimize the clusters in the feature space so that the resulting clusters can have arbitrary shapes in the data space. As shown by the experimental results on real educational data sets, the clusters from our solution have better cluster quality than those of some existing approaches.

Highlights

  • Educational data mining is nowadays well known worldwide for discovering knowledge hidden in educational data to support educational decision making

  • For educational decision making support, we would like to detect early and support the in-trouble students who have spent two, three, or four years studying in an academic credit system

  • Our work has to deal with a so-called incomplete educational data clustering task to discover groups of similar students based on their study performance at different points in study time


Summary

Introduction

Vietnam J Comput Sci (2016) 3:93–102

Educational data mining is nowadays well known worldwide for discovering knowledge hidden in educational data to support educational decision making. [...] Among these related works, only Inyang and Joshua [11] have presented the handling of incomplete data, by deleting the missing results in the courses, while the others made no mention of incomplete data issues. Despite this lack of incomplete data handling for the educational data clustering task, we are aware of many existing works on incomplete data clustering in general, such as [1,2,7,8,9,23,25,27]. A study of handling incomplete data in a clustering task is needed to attain an effective cluster model, both in general and in the education domain. The incomplete data sets that become complete after the data clustering task can be utilized in other mining tasks such as classification and association analysis.
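To illustrate how an incomplete data set can become complete as a by-product of clustering, the following is a minimal sketch of the nearest prototype strategy named in the abstract. It is not the authors' VQ_fk_nps: it swaps the kernel-based vector quantization for plain k-means in the data space to stay short. The function name `kmeans_nps`, its parameters, the use of NaN to mark a missing course result, and the column-mean start-up fill are all assumptions made for this illustration.

```python
import numpy as np

def kmeans_nps(X, k, n_iter=50, seed=0):
    """Plain k-means with nearest-prototype imputation (illustrative sketch).

    Missing entries are marked with NaN. Returns (labels, prototypes, filled).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    rng = np.random.default_rng(seed)

    # Start-up fill with column means of the observed values (an assumption).
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    # Initialise the prototypes from k distinct rows of the start-up fill.
    prototypes = filled[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)

    for _ in range(n_iter):
        # Squared distance to each prototype over observed features only.
        diff = filled[:, None, :] - prototypes[None, :, :]
        diff = np.where(missing[:, None, :], 0.0, diff)
        labels = (diff ** 2).sum(axis=2).argmin(axis=1)

        # Nearest prototype strategy: a missing value takes the value of
        # the nearest prototype in the same dimension.
        filled = np.where(missing, prototypes[labels], X)

        # Update each prototype as the mean of its (completed) members.
        new_prototypes = prototypes.copy()
        for j in range(k):
            members = filled[labels == j]
            if len(members):  # keep the old prototype if a cluster is empty
                new_prototypes[j] = members.mean(axis=0)

        if np.allclose(new_prototypes, prototypes):
            break
        prototypes = new_prototypes

    # Final fill so imputed values match the returned prototypes.
    filled = np.where(missing, prototypes[labels], X)
    return labels, prototypes, filled


if __name__ == "__main__":
    # Hypothetical grades: rows are students, columns are courses.
    grades = [[8.0, 7.5], [7.0, np.nan], [3.0, 2.5], [np.nan, 3.0]]
    labels, prototypes, completed = kmeans_nps(grades, k=2)
    print(labels)
    print(completed)
```

Unlike deleting the missing results, this keeps every student in the analysis, and the completed matrix can be passed on to other mining tasks. The outcome depends on the random initialization, as with ordinary k-means.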

Incomplete educational data clustering task definition
Experimental results
Related works
Conclusion