Data-driven unsupervised clustering of online learner behaviour

Robert L Peach, David Lefevre, Mauricio Barahona, Sophia N Yaliraki

doi:10.1038/s41539-019-0054-0

Open Access

https://doi.org/10.1038/s41539-019-0054-0

Journal: npj Science of Learning
Publication Date: Sep 3, 2019
Citations: 33
License type: open-access

Affiliation: Imperial College London

Abstract

The widespread adoption of online courses opens opportunities for analysing learner behaviour and optimising web-based learning adapted to observed usage. Here, we introduce a mathematical framework for the analysis of time-series of online learner engagement, which allows the identification of clusters of learners with similar online temporal behaviour directly from the raw data without prescribing a priori subjective reference behaviours. The method uses a dynamic time warping kernel to create a pair-wise similarity between time-series of learner actions, and combines it with an unsupervised multiscale graph clustering algorithm to identify groups of learners with similar temporal behaviour. To showcase our approach, we analyse task completion data from a cohort of learners taking an online post-graduate degree at Imperial Business School. Our analysis reveals clusters of learners with statistically distinct patterns of engagement, from distributed to massed learning, with different levels of regularity, adherence to pre-planned course structure and task completion. The approach also reveals outlier learners with highly sporadic behaviour. A posteriori comparison against student performance shows that, whereas high-performing learners are spread across clusters with diverse temporal engagement, low performers are located significantly in the massed learning cluster, and our unsupervised clustering identifies low performers more accurately than common machine learning classification methods trained on temporal statistics of the data. Finally, we test the applicability of the method by analysing two additional data sets: a different cohort of the same course, and time-series of different format from another university.

Highlights

The application of data analytics to educational data has surged in the past few years, facilitated by the adoption of online learning platforms.[1]
We first create a similarity matrix between learners using a dynamic time warping kernel. This matrix is transformed into a similarity graph using a sparsification based on the Relaxed Minimum Spanning Tree,[21] a procedure that retains global network connectivity while discarding weak similarities that can be explained through longer chains of strong similarities.
We have described an approach for the analysis of temporal data of online learning behaviours, in which distinct clusters of learners are obtained algorithmically without using a priori statistical information about individual behaviours or about the number or type of expected behaviours across the cohort.
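
As an illustration of the first step named in the highlights, the following is a minimal sketch of a pairwise similarity matrix built from dynamic time warping (DTW) distances. It is not the authors' implementation. The absolute-difference local cost, the exponential mapping exp(-d / sigma) from distance to similarity, the parameter sigma, and the toy learner series are all assumptions made for this example; the exact kernel is not specified in this text, and this exponential form is not guaranteed to be positive definite.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two numeric sequences.

    Local cost is the absolute difference (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] is matched again (stretch b)
                cost[i][j - 1],      # b[j-1] is matched again (stretch a)
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]

def dtw_similarity_matrix(series, sigma=1.0):
    """Symmetric similarity matrix with entries exp(-DTW distance / sigma).

    The exponential mapping and sigma are illustrative assumptions.
    """
    n = len(series)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = math.exp(-dtw_distance(series[i], series[j]) / sigma)
            sim[i][j] = sim[j][i] = s
    return sim

# Hypothetical data: day on which each learner completed successive tasks.
learners = [
    [1, 2, 3, 4, 5],    # steady, distributed engagement
    [1, 2, 2, 3, 4, 5], # similar pace, one extra action
    [9, 9, 9, 10, 10],  # everything completed late (massed)
]
similarity = dtw_similarity_matrix(learners, sigma=5.0)
```

Because DTW aligns sequences of different lengths, the series do not need to be resampled or summarised into features beforehand, which matches the stated aim of working from the raw time-series.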

Summary

INTRODUCTION

The application of data analytics to educational data has surged in the past few years, facilitated by the adoption of online learning platforms.[1] Dynamic time warping (DTW) has been shown to outperform a variety of measures in classification tasks[19] and provides a principled way to use the full, raw information of the time-series without preselecting features or functional representations.[20] From the ensuing DTW similarity matrix, we construct a similarity graph, where nodes are learners and weighted links represent similarities between learners. This graph construction step is carried out using the Relaxed Minimum Spanning Tree algorithm,[21] which aims to encapsulate the locally strong and globally relevant similarities in the data set. Our data-driven [...]
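
The graph construction step can be illustrated with a small sketch of a Relaxed Minimum Spanning Tree style sparsification. This is a simplified reading, not the authors' code. It assumes a symmetric matrix of distances (for instance DTW distances) and the following rule, which is not given in this text: start from the minimum spanning tree, then also keep an edge (i, j) when its distance is smaller than the largest edge on the tree path between i and j plus gamma times the sum of the nearest-neighbour distances of i and j. The parameter gamma and the use of the single nearest neighbour are assumptions.

```python
import math

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix; returns (i, j) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

def mst_bottleneck(dist, mst_edges):
    """For every pair of nodes, the largest edge weight on the MST path joining them."""
    n = len(dist)
    adj = [[] for _ in range(n)]
    for i, j in mst_edges:
        adj[i].append(j)
        adj[j].append(i)
    bottleneck = [[0.0] * n for _ in range(n)]
    for src in range(n):
        stack = [(src, -1, 0.0)]  # (node, previous node, worst edge so far)
        while stack:
            node, prev, worst = stack.pop()
            bottleneck[src][node] = worst
            for nxt in adj[node]:
                if nxt != prev:
                    stack.append((nxt, node, max(worst, dist[node][nxt])))
    return bottleneck

def relaxed_mst(dist, gamma=0.5):
    """Return the set of kept edges (i, j) with i < j under the assumed RMST rule."""
    n = len(dist)
    mst = minimum_spanning_tree(dist)
    bottleneck = mst_bottleneck(dist, mst)
    nearest = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    edges = {tuple(sorted(e)) for e in mst}  # the tree guarantees connectivity
    for i in range(n):
        for j in range(i + 1, n):
            # Drop edges that a chain of shorter tree edges already explains.
            if dist[i][j] < bottleneck[i][j] + gamma * (nearest[i] + nearest[j]):
                edges.add((i, j))
    return edges

# Hypothetical example: four points on a line at positions 0, 1, 2 and 10.
positions = [0.0, 1.0, 2.0, 10.0]
distances = [[abs(p - q) for q in positions] for p in positions]
tree_only = relaxed_mst(distances, gamma=0.0)
relaxed = relaxed_mst(distances, gamma=0.5)
```

Under this rule, gamma = 0 returns the minimum spanning tree when all distances are distinct, and larger values of gamma progressively keep more edges, so the tree acts as the connected backbone while gamma controls how much local structure is added. The kept edges could then be weighted with the corresponding similarities to form the learner graph.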

RESULTS

DISCUSSION

CODE AVAILABILITY