Abstract

This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that intelligently maintains favorable features of both model-based and model-free methodologies. The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free Delayed Q-learning and model-based R-max algorithms while outperforming both in most cases. The paper includes a PAC analysis of the DDQ algorithm and a derivation of its sample complexity. Numerical results are provided to support the claim regarding the new algorithm’s sample efficiency compared to its parents as well as the best known PAC model-free and model-based algorithms in application. A real-world experimental implementation of DDQ in the context of pediatric motor rehabilitation facilitated by infant-robot interaction highlights the potential benefits of the reported method.

Highlights

  • While several reinforcement learning (RL) algorithms can be applied to a dynamical system modeled as a Markov decision process (MDP), few are probably approximately correct (PAC), meaning they can guarantee how soon the algorithm will converge to a near-optimal policy (a standard formalization of this property is sketched after this list)

  • MDP models can be constructed to abstractly capture the dynamics of the social interaction between infant and robot, and RL algorithms can guide the behavior of the robot as it interacts with the infant so as to achieve the maximum possible rehabilitation outcome, the latter possibly quantified by the overall length of infant displacement or by the frequency of infant motor transitions (a hypothetical encoding of such a reward is sketched after this list)

  • The first round of comparisons starts with R-max, Delayed Q-learning, and DDQ being implemented on a small-scale gridworld example (Figure 1); an illustrative gridworld setup of this kind is sketched after this list
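
For concreteness, one standard way to formalize the PAC property named in the first highlight is the sample-complexity notion used in the PAC-MDP literature (e.g., by Strehl, Li, and Littman); the statement below is that general notion, offered as an assumption, and the paper's own definition may differ in its exact parameters.

  % Standard PAC-MDP sample-complexity statement (assumed here; the paper's
  % exact definition may differ). With probability at least 1 - \delta, the
  % number of timesteps at which the agent's policy is more than \epsilon
  % suboptimal is bounded by a polynomial in the problem parameters:
  \[
    \left|\left\{\, t \;:\; V^{\pi_t}(s_t) < V^{*}(s_t) - \epsilon \,\right\}\right|
    \;\le\;
    \operatorname{poly}\!\left(|S|,\; |A|,\; \tfrac{1}{\epsilon},\; \tfrac{1}{\delta},\; \tfrac{1}{1-\gamma}\right)
  \]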

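As a purely illustrative reading of the second highlight, the Python sketch below encodes a rehabilitation objective as a scalar MDP reward that combines infant displacement with motor-transition counts; the observation fields, weights, and function name are hypothetical and are not taken from the paper.

  from dataclasses import dataclass

  @dataclass
  class InfantObservation:
      """Hypothetical per-step observation of the infant (not from the paper)."""
      displacement_m: float    # distance the infant moved during this step (meters)
      motor_transitions: int   # e.g., counted sit-to-crawl or crawl-to-stand changes

  # Illustrative weights; the paper does not specify a numeric reward of this form.
  W_DISPLACEMENT = 1.0
  W_TRANSITIONS = 0.5

  def rehabilitation_reward(obs: InfantObservation) -> float:
      """Scalar reward favoring both overall displacement and motor transitions."""
      return W_DISPLACEMENT * obs.displacement_m + W_TRANSITIONS * obs.motor_transitions

A reward of this shape would let the relative weights trade off encouraging locomotion against encouraging postural variety; the paper's actual outcome measures are only described qualitatively in the highlight above.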
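The small-scale gridworld mentioned in the third highlight (Figure 1 of the paper) is not reproduced here; the Python sketch below sets up a generic 4x4 gridworld of that flavor with a plain tabular Q-learning baseline as a stand-in, and it is not the paper's R-max, Delayed Q-learning, or DDQ implementation.

  import numpy as np

  # Generic 4x4 gridworld: start in the top-left cell, +1 reward for reaching
  # the bottom-right goal, deterministic moves clamped at the walls.
  N = 4
  GOAL = N * N - 1
  ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

  def step(state, action):
      r, c = divmod(state, N)
      dr, dc = ACTIONS[action]
      r = min(max(r + dr, 0), N - 1)
      c = min(max(c + dc, 0), N - 1)
      nxt = r * N + c
      return nxt, float(nxt == GOAL), nxt == GOAL  # next state, reward, done

  def q_learning(episodes=500, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
      """Plain epsilon-greedy tabular Q-learning, used only as an illustrative baseline."""
      rng = np.random.default_rng(seed)
      Q = np.zeros((N * N, len(ACTIONS)))
      for _ in range(episodes):
          s, done = 0, False
          while not done:
              a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
              s2, rew, done = step(s, a)
              target = rew + gamma * (0.0 if done else Q[s2].max())
              Q[s, a] += alpha * (target - Q[s, a])
              s = s2
      return Q

  if __name__ == "__main__":
      print("Greedy action per cell:\n", q_learning().argmax(axis=1).reshape(N, N))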

Summary

A Hybrid PAC Reinforcement Learning Algorithm for Human-Robot Interaction


INTRODUCTION
TECHNICAL PRELIMINARIES
DDQ ALGORITHM
PAC ANALYSIS OF DDQ ALGORITHM
NUMERICAL RESULTS
Comparison of DDQ With Its Parent Methodologies
Comparison of DDQ to the Best Known PAC RL Algorithms
Experimental Results
CONCLUSION
DATA AVAILABILITY STATEMENT