Abstract

This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that intelligently maintains favorable features of both model-based and model-free methodologies. The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free Delayed Q-learning and model-based R-max algorithms while outperforming both in most cases. The paper includes a PAC analysis of the DDQ algorithm and a derivation of its sample complexity. Numerical results are provided to support the claim regarding the new algorithm’s sample efficiency compared to its parents as well as the best known PAC model-free and model-based algorithms in application. A real-world experimental implementation of DDQ in the context of pediatric motor rehabilitation facilitated by infant-robot interaction highlights the potential benefits of the reported method.

Highlights

  • While several reinforcement learning (RL) algorithms can be applied to a dynamical system modeled as a Markov decision process (MDP), few are probably approximately correct (PAC), meaning they can guarantee how soon the algorithm will converge to a near-optimal policy (a standard formalization of this property is sketched after this list)

  • MDP models can be constructed to abstractly capture the dynamics of the social interaction between infant and robot, and RL algorithms can guide the behavior of the robot as it interacts with the infant so as to achieve the maximum possible rehabilitation outcome, the latter possibly quantified by the overall length of infant displacement or by the frequency of infant motor transitions (a hypothetical encoding of such a reward is sketched after this list)

  • The first round of comparisons starts with R-max, Delayed Q-learning, and DDQ being implemented on a small-scale gridworld example (Figure 1); an illustrative gridworld setup of this kind is sketched after this list
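
For concreteness, one standard way to formalize the PAC property named in the first highlight is the sample-complexity notion used in the PAC-MDP literature (e.g., by Strehl, Li, and Littman); the statement below is that general notion, offered as an assumption, and the paper's own definition may differ in its exact parameters.

  % Standard PAC-MDP sample-complexity statement (assumed here; the paper's
  % exact definition may differ). With probability at least 1 - \delta, the
  % number of timesteps at which the agent's policy is more than \epsilon
  % suboptimal is bounded by a polynomial in the problem parameters:
  \[
    \left|\left\{\, t \;:\; V^{\pi_t}(s_t) < V^{*}(s_t) - \epsilon \,\right\}\right|
    \;\le\;
    \operatorname{poly}\!\left(|S|,\; |A|,\; \tfrac{1}{\epsilon},\; \tfrac{1}{\delta},\; \tfrac{1}{1-\gamma}\right)
  \]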

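As a purely illustrative reading of the second highlight, the Python sketch below encodes a rehabilitation objective as a scalar MDP reward that combines infant displacement with motor-transition counts; the observation fields, weights, and function name are hypothetical and are not taken from the paper.

  from dataclasses import dataclass

  @dataclass
  class InfantObservation:
      """Hypothetical per-step observation of the infant (not from the paper)."""
      displacement_m: float    # distance the infant moved during this step (meters)
      motor_transitions: int   # e.g., counted sit-to-crawl or crawl-to-stand changes

  # Illustrative weights; the paper does not specify a numeric reward of this form.
  W_DISPLACEMENT = 1.0
  W_TRANSITIONS = 0.5

  def rehabilitation_reward(obs: InfantObservation) -> float:
      """Scalar reward favoring both overall displacement and motor transitions."""
      return W_DISPLACEMENT * obs.displacement_m + W_TRANSITIONS * obs.motor_transitions

A reward of this shape would let the relative weights trade off encouraging locomotion against encouraging postural variety; the paper's actual outcome measures are only described qualitatively in the highlight above.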
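The small-scale gridworld mentioned in the third highlight (Figure 1 of the paper) is not reproduced here; the Python sketch below sets up a generic 4x4 gridworld of that flavor with a plain tabular Q-learning baseline as a stand-in, and it is not the paper's R-max, Delayed Q-learning, or DDQ implementation.

  import numpy as np

  # Generic 4x4 gridworld: start in the top-left cell, +1 reward for reaching
  # the bottom-right goal, deterministic moves clamped at the walls.
  N = 4
  GOAL = N * N - 1
  ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

  def step(state, action):
      r, c = divmod(state, N)
      dr, dc = ACTIONS[action]
      r = min(max(r + dr, 0), N - 1)
      c = min(max(c + dc, 0), N - 1)
      nxt = r * N + c
      return nxt, float(nxt == GOAL), nxt == GOAL  # next state, reward, done

  def q_learning(episodes=500, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
      """Plain epsilon-greedy tabular Q-learning, used only as an illustrative baseline."""
      rng = np.random.default_rng(seed)
      Q = np.zeros((N * N, len(ACTIONS)))
      for _ in range(episodes):
          s, done = 0, False
          while not done:
              a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
              s2, rew, done = step(s, a)
              target = rew + gamma * (0.0 if done else Q[s2].max())
              Q[s, a] += alpha * (target - Q[s, a])
              s = s2
      return Q

  if __name__ == "__main__":
      print("Greedy action per cell:\n", q_learning().argmax(axis=1).reshape(N, N))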

Summary

A Hybrid PAC Reinforcement Learning Algorithm for Human-Robot Interaction


INTRODUCTION
TECHNICAL PRELIMINARIES
DDQ ALGORITHM
PAC ANALYSIS OF DDQ ALGORITHM
NUMERICAL RESULTS
Comparison of DDQ With Its Parent Methodologies
Comparison of DDQ to the Best Known PAC RL Algorithms
Experimental Results
CONCLUSION
DATA AVAILABILITY STATEMENT