Abstract

The traditional Q-learning algorithm suffers from delayed feedback of reward information, and its environmental reward model is too simple, so it cannot be applied well to reinforcement learning for the behaviour decision-making of an affective virtual human. Drawing an analogy with human self-reflection, this paper proposes an improved Q-learning algorithm that can readily be applied to the behavioural decision-making of an affective virtual human. Through a self-reflection reward, the improved algorithm strengthens the behaviour strategies of better learning cycles and weakens those of worse learning cycles; it also accelerates the feedback of behavioural decisions to the state-action pairs within a learning cycle, thereby improving the convergence rate of Q-learning in the affective virtual human's behavioural decision-making. In the simulation test, the algorithm is used to help the affective virtual human perform path optimization in a two-dimensional grid environment. The results show that the improved Q-learning algorithm reaches the optimal control strategy significantly faster than the traditional Q-learning algorithm, in an average of 43.7 learning cycles, which verifies the validity of the algorithm.
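The abstract gives no equations, so the following is only a minimal sketch of one plausible reading of the "self-reflection reward": standard tabular Q-learning on a small grid, plus an end-of-episode adjustment that nudges the visited state-action pairs up after a better-than-baseline episode and down after a worse one. The grid layout, reward values, and constants such as `REFLECT_RATE` are hypothetical and are not taken from the paper.

```python
import random

# --- Hypothetical 2D grid world (not the paper's exact environment) ---
GRID_W, GRID_H = 5, 5
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # standard Q-learning parameters
REFLECT_RATE = 0.05                            # assumed self-reflection step size

Q = {((x, y), a): 0.0 for x in range(GRID_W)
     for y in range(GRID_H) for a in range(len(ACTIONS))}

def step(state, a):
    """One environment transition: -1 per move, +10 on reaching the goal."""
    dx, dy = ACTIONS[a]
    nx = min(max(state[0] + dx, 0), GRID_W - 1)
    ny = min(max(state[1] + dy, 0), GRID_H - 1)
    nxt = (nx, ny)
    return nxt, (10.0 if nxt == GOAL else -1.0), nxt == GOAL

def choose(state):
    """Epsilon-greedy action selection."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

baseline = None                                 # running mean of episode returns
for episode in range(500):
    state, done, visited, ret = START, False, [], 0.0
    while not done and len(visited) < 200:
        a = choose(state)
        nxt, r, done = step(state, a)
        # Standard one-step Q-learning update inside the episode.
        best_next = max(Q[(nxt, b)] for b in range(len(ACTIONS)))
        Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])
        visited.append((state, a))
        ret += r
        state = nxt

    # Assumed "self-reflection" step after the episode: compare the episode
    # return with a running baseline and nudge every visited state-action
    # pair up (better learning cycle) or down (worse learning cycle).
    if baseline is not None:
        sign = 1.0 if ret > baseline else -1.0
        for sa in visited:
            Q[sa] += REFLECT_RATE * sign * abs(ret - baseline) / max(len(visited), 1)
    baseline = ret if baseline is None else 0.9 * baseline + 0.1 * ret
```

Under this reading, the per-step update is unchanged Q-learning, and the episode-level adjustment is what propagates a whole cycle's outcome back to all of its state-action pairs at once; the actual update rule used in the paper may differ.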
