Abstract

Personalized online learning has seen significant adoption in recent years and has become a promising instructional strategy. A natural way to provide personalization is recommendation: guiding students to suitable learning content at the right time. However, this is a nontrivial problem because online learning environments are highly flexible, with students learning independently according to their own characteristics and situations. Existing recommendation methods do not work effectively in such environments. The objective of this study is therefore to provide dynamic, continuous, personalized recommendations for online learning systems. We propose a method based on contextual bandits, a class of reinforcement learning problems that performs effectively in dynamic environments. In addition, we use past student behaviors and the current student state as contextual information to build the policy with which the reinforcement learning agent makes optimal decisions. We evaluate the proposed method on real data from an online learning system, comparing it with well-known baselines for reinforcement learning problems, i.e. $$\varepsilon$$-greedy, greedy with optimistic initial values, and upper confidence bound (UCB) methods. The results show that the proposed method significantly outperforms these baselines in our test case.
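To make the baselines concrete, the sketch below simulates a simple (non-contextual) multi-armed bandit and compares two of the benchmark strategies named in the abstract, $$\varepsilon$$-greedy and UCB1. The arms, reward probabilities, and hyperparameters are hypothetical illustrations, not values from the paper; the paper's actual method additionally conditions on student context, which this minimal sketch omits.

```python
import math
import random

def eps_greedy_choose(counts, values, eps=0.1):
    """Epsilon-greedy: explore a random arm with probability eps,
    otherwise exploit the arm with the highest estimated reward."""
    if random.random() < eps:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def ucb_choose(counts, values, t):
    """UCB1: pick the arm maximizing estimated mean reward plus
    an exploration bonus that shrinks as the arm is sampled more."""
    for a, n in enumerate(counts):
        if n == 0:  # play each arm once before applying the bonus
            return a
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

def run(choose, true_probs, steps=5000, seed=0):
    """Simulate Bernoulli-reward arms and return the average reward."""
    random.seed(seed)
    k = len(true_probs)
    counts, values, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, steps + 1):
        a = choose(counts, values, t)
        r = 1.0 if random.random() < true_probs[a] else 0.0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
        total += r
    return total / steps

# Hypothetical reward probabilities for three "learning content" arms.
probs = [0.2, 0.5, 0.8]
print("eps-greedy:", run(lambda c, v, t: eps_greedy_choose(c, v), probs))
print("UCB1:      ", run(ucb_choose, probs))
```

Both strategies converge toward the best arm (success probability 0.8); UCB1 replaces the fixed exploration rate with a count-based confidence bonus, which is why it is a standard benchmark in bandit evaluations like the one described above.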
