Abstract

The traditional Q-learning algorithm suffers from delayed feedback of reward information, and its environmental reward model is too simple, so it cannot be applied well to reinforcement learning for the behaviour decision-making of an affective virtual human. Drawing an analogy with human self-reflection, this paper proposes an improved Q-learning algorithm that can readily be applied to the behavioural decision-making of an affective virtual human. Through a self-reflection reward, the improved algorithm strengthens the behaviour strategies of better learning cycles and weakens those of worse learning cycles; it also accelerates the feedback of behavioural decisions to the state-action pairs within a learning cycle, thereby improving the convergence rate of Q-learning in the affective virtual human's behavioural decision-making. In the simulation test, the algorithm is used to help the affective virtual human perform path optimization in a two-dimensional grid environment. The results show that the improved Q-learning algorithm reaches the optimal control strategy significantly faster than the traditional Q-learning algorithm, in an average of 43.7 learning cycles, which verifies the validity of the algorithm.
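The abstract gives no equations, so the following is only a minimal sketch of one plausible reading of the "self-reflection reward": standard tabular Q-learning on a small grid, plus an end-of-episode adjustment that nudges the visited state-action pairs up after a better-than-baseline episode and down after a worse one. The grid layout, reward values, and constants such as `REFLECT_RATE` are hypothetical and are not taken from the paper.

```python
import random

# --- Hypothetical 2D grid world (not the paper's exact environment) ---
GRID_W, GRID_H = 5, 5
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # standard Q-learning parameters
REFLECT_RATE = 0.05                            # assumed self-reflection step size

Q = {((x, y), a): 0.0 for x in range(GRID_W)
     for y in range(GRID_H) for a in range(len(ACTIONS))}

def step(state, a):
    """One environment transition: -1 per move, +10 on reaching the goal."""
    dx, dy = ACTIONS[a]
    nx = min(max(state[0] + dx, 0), GRID_W - 1)
    ny = min(max(state[1] + dy, 0), GRID_H - 1)
    nxt = (nx, ny)
    return nxt, (10.0 if nxt == GOAL else -1.0), nxt == GOAL

def choose(state):
    """Epsilon-greedy action selection."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

baseline = None                                 # running mean of episode returns
for episode in range(500):
    state, done, visited, ret = START, False, [], 0.0
    while not done and len(visited) < 200:
        a = choose(state)
        nxt, r, done = step(state, a)
        # Standard one-step Q-learning update inside the episode.
        best_next = max(Q[(nxt, b)] for b in range(len(ACTIONS)))
        Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])
        visited.append((state, a))
        ret += r
        state = nxt

    # Assumed "self-reflection" step after the episode: compare the episode
    # return with a running baseline and nudge every visited state-action
    # pair up (better learning cycle) or down (worse learning cycle).
    if baseline is not None:
        sign = 1.0 if ret > baseline else -1.0
        for sa in visited:
            Q[sa] += REFLECT_RATE * sign * abs(ret - baseline) / max(len(visited), 1)
    baseline = ret if baseline is None else 0.9 * baseline + 0.1 * ret
```

Under this reading, the per-step update is unchanged Q-learning, and the episode-level adjustment is what propagates a whole cycle's outcome back to all of its state-action pairs at once; the actual update rule used in the paper may differ.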
