Abstract

In this paper, we investigated an approach for robots to learn to adapt dance actions to humans' preferences through interaction and feedback. Human preferences were extracted by analysing the common action patterns that received positive or negative feedback from the human during robot dancing. By using a buffering technique to store the dance actions preceding each feedback, each individual's preferences could be extracted even when a reward was received late. The extracted preferred dance actions from different people were then combined to generate improved dance sequences, i.e. performing more of what was preferred and less of what was not preferred. The Sarsa reinforcement learning algorithm, together with the Softmax action-selection method, was used as the underlying learning algorithm and to effectively control the trade-off between exploitation of the learnt dance skills and exploration of new dance actions. The results showed that the robot learnt the preferences of its human partners through interactive reinforcement learning, and that the dance improved as preferences extracted from more human partners were incorporated.
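The combination described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the action names, state labels, buffer size, and parameter values are all hypothetical, and the human feedback signal is simply a scalar reward credited to the buffered recent transitions.

```python
import math
import random
from collections import deque

# Hypothetical toy setup (not from the paper): a small discrete set of
# dance actions, tabular Q-values, and illustrative parameter choices.
ACTIONS = ["wave", "spin", "step", "bow"]
ALPHA, GAMMA, TAU = 0.1, 0.9, 0.5   # learning rate, discount, Softmax temperature
BUFFER_SIZE = 3                     # how many recent actions a feedback credits

Q = {}  # Q[(state, action)] -> estimated value, default 0.0

def q(state, action):
    return Q.get((state, action), 0.0)

def softmax_action(state):
    """Sample an action with probability proportional to exp(Q / tau).

    A low temperature exploits the learnt preferences; a high temperature
    explores new dance actions more uniformly.
    """
    prefs = [math.exp(q(state, a) / TAU) for a in ACTIONS]
    r = random.random() * sum(prefs)
    for a, p in zip(ACTIONS, prefs):
        r -= p
        if r <= 0:
            return a
    return ACTIONS[-1]

def sarsa_update(s, a, reward, s2, a2):
    """Standard Sarsa update: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a))."""
    Q[(s, a)] = q(s, a) + ALPHA * (reward + GAMMA * q(s2, a2) - q(s, a))

def apply_feedback(buffer, reward):
    """Credit a (possibly delayed) human reward to the buffered transitions."""
    for (s, a, s2, a2) in buffer:
        sarsa_update(s, a, reward, s2, a2)

# Example: the robot performs actions, buffering each transition; when the
# human finally reacts, the reward is spread over the recent buffer.
buffer = deque(maxlen=BUFFER_SIZE)
state = "pose0"
for step in range(3):
    action = softmax_action(state)
    next_state = f"pose{step + 1}"
    next_action = softmax_action(next_state)
    buffer.append((state, action, next_state, next_action))
    state = next_state
apply_feedback(buffer, reward=1.0)  # positive feedback arrives late
```

Because the buffer holds the transitions that preceded the feedback, a late reward still reinforces the actions the human was actually reacting to, rather than only the most recent one.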
