Abstract

We study the problem of identifying the policy space available to an agent in a learning process, having access to a set of demonstrations generated by the agent playing the optimal policy in the considered space. We introduce an approach based on frequentist statistical testing to identify the set of policy parameters that the agent can control, within a larger parametric policy space. After presenting two identification rules (combinatorial and simplified), applicable under different assumptions on the policy space, we provide a probabilistic analysis of the simplified one in the case of linear policies belonging to the exponential family. To improve the performance of our identification rules, we make use of the recently introduced framework of the Configurable Markov Decision Processes, exploiting the opportunity of configuring the environment to induce the agent to reveal which parameters it can control. Finally, we provide an empirical evaluation, on both discrete and continuous domains, to prove the effectiveness of our identification rules.
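The sketch below illustrates one way such a frequentist identification rule can look in practice. It is a minimal illustration, not the authors' implementation: it assumes a linear-Gaussian policy with known noise level, synthetic demonstrations, and a per-parameter generalized likelihood-ratio (GLR) test whose null hypothesis is that the agent does not control the corresponding parameter; all names (true_theta, rss, identified, etc.) are hypothetical and introduced only for this example.

# Minimal sketch (assumption: linear-Gaussian policy a = theta^T s + eps with known sigma,
# i.i.d. Gaussian states, one GLR test per parameter). Not the authors' code.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Synthetic demonstrations: zeros in true_theta play the role of uncontrolled parameters.
d, n, sigma = 5, 2000, 0.5
true_theta = np.array([1.0, -0.8, 0.0, 0.0, 0.6])
S = rng.normal(size=(n, d))                      # observed states
A = S @ true_theta + sigma * rng.normal(size=n)  # demonstrated actions

def rss(states, actions, free_dims):
    """Residual sum of squares of the maximum-likelihood fit restricted to free_dims."""
    if len(free_dims) == 0:
        return float(np.sum(actions ** 2))
    theta_hat, *_ = np.linalg.lstsq(states[:, free_dims], actions, rcond=None)
    resid = actions - states[:, free_dims] @ theta_hat
    return float(np.sum(resid ** 2))

# Simplified identification rule: one GLR test per parameter.
delta = 0.05                              # significance level of each test
threshold = chi2.ppf(1.0 - delta, df=1)   # asymptotic null distribution: chi^2 with 1 dof
all_dims = list(range(d))
rss_full = rss(S, A, all_dims)

identified = []
for i in all_dims:
    rss_constrained = rss(S, A, [j for j in all_dims if j != i])
    glr = (rss_constrained - rss_full) / sigma ** 2   # 2 * (logL_full - logL_constrained)
    if glr > threshold:                               # reject "agent ignores dimension i"
        identified.append(i)

print("controlled dimensions (ground truth):", np.nonzero(true_theta)[0].tolist())
print("controlled dimensions (identified):  ", identified)

Dimensions whose test statistic exceeds the chi-squared quantile are declared controllable; the significance level delta bounds the probability of wrongly attributing an uncontrolled parameter to the agent.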

Highlights

  • Reinforcement Learning (RL, Sutton and Barto, 2018) deals with sequential decision-making problems in which an artificial agent interacts with an environment by sensing perceptions and performing actions

  • We study the problem of identifying the policy space available to an agent in a learning process, having access to a set of demonstrations generated by the agent playing the optimal policy in the considered space

  • Additional experiments, together with the hyperparameter values, are reported in Appendix C. In Section 6.1 (identification-rule experiments), we provide two experiments to test the ability of our identification rules to properly select the parameters the agent controls in different settings


Summary

Introduction

Reinforcement Learning (RL, Sutton and Barto, 2018) deals with sequential decision-making problems in which an artificial agent interacts with an environment by sensing perceptions and performing actions. The agent’s goal is to find an optimal policy, i.e., a prescription of actions that maximizes the (possibly discounted) cumulative reward collected during its interaction with the environment. As in any Machine Learning problem, the agent’s capabilities are shaped by which perceptions it can sense, which actions it can perform, and its ability to map observations to actions. These three elements define the policy space available to the agent in the learning process. Agents having access to different policy spaces may exhibit different optimal behaviors, even in the same environment. The notion of optimality is therefore necessarily connected to the space of policies the agent can access, which we will call the agent’s policy space in the following. The need to limit the policy space naturally emerges in many industrial applications, where some behaviors have to be avoided for safety reasons.
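For concreteness, the optimality notion mentioned above can be written in standard RL terms (a textbook definition, not taken verbatim from this summary): the agent seeks, within its policy space \Pi, a policy maximizing the expected discounted return,

J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \right], \qquad \gamma \in [0,1), \qquad \pi^{\star} \in \arg\max_{\pi \in \Pi} J(\pi),

so that two agents endowed with different policy spaces \Pi may reach different optima even in the same environment.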

