Learning Performance Of Agent Research Articles

Various methods have been proposed in the literature for identifying subgoals in discrete reinforcement learning (RL) tasks. Once subgoals are discovered, task decomposition methods can be employed to improve the learning performance of agents. In this study, we classify prominent subgoal identification methods for discrete RL tasks in the literature into the following three categories: graph-based, statistics-based, and multi-instance learning (MIL)-based. As contributions, firstly, we introduce a new MIL-based subgoal identification algorithm called EMDD-RL and experimentally compare it with a previous MIL-based method. The previous approach adapts MIL’s Diverse Density (DD) algorithm, whereas our method considers Expected-Maximization Diverse Density (EMDD). The advantage of EMDD over DD is that it can yield more accurate results with less computation demand thanks to the expectation-maximization algorithm. EMDD-RL modifies some of the algorithmic steps of EMDD to identify subgoals in discrete RL problems. Secondly, we evaluate the methods in several RL tasks for the hyperparameter tuning overhead they incur. Thirdly, we propose a new RL problem called key-room and compare the methods for their subgoal identification performances in this new task. Experiment results show that MIL-based subgoal identification methods could be preferred to the algorithms of the other two categories in practice.

Modern active distribution networks (ADNs) witness increasing complexities that require efforts in control practices, including optimal reactive power dispatch (ORPD). Deep reinforcement learning (DRL) is proposed to manage the network's reactive power by coordinating different resources, including distributed energy resources, to enhance performance. However, there is a lack of studies examining DRL elements' performance sensitivity. To this end, in this paper we examine the impact of various DRL reward representations and hyperparameters on the agent's learning performance when solving the ORPD problem for ADNs. We assess the agent's performance regarding accuracy and training time metrics, as well as critic estimate measures. Furthermore, different environmental changes are examined to study the DRL model's scalability by including other resources. Results show that compared to other representations, the complementary reward function exhibits improved performance in terms of power loss minimization and convergence time by 10-15% and 14-18%, respectively. Also, adequate agent performance is observed to be neighboring the best-suited value of each hyperparameter for the studied problem. In addition, scalability analysis depicts that increasing the number of possible action combinations in the action space by approximately nine times results in 1.7 times increase in the training time.

Learning Performance Of Agent Research Articles

Related Topics

Articles published on Learning Performance Of Agent

Norm Augmented Reinforcement Learning Agents With Synthesized Normative Rules

Faster MIL-based Subgoal Identification for Reinforcement Learning by Tuning Fewer Hyperparameters

A location-based advising method in teacher–student frameworks

Optimal Reactive Power Dispatch in ADNs using DRL and the Impact of Its Various Settings and Environmental Changes.

Intelligent multi-branch allocation of spectrum slices for inter-numerology interference minimization

Differential Advising in Multiagent Reinforcement Learning.

Differentially Private Malicious Agent Avoidance in Multiagent Advising Learning.

Automatic Curriculum Graph Generation for Reinforcement Learning Agents

Graph based skill acquisition and transfer Learning for continuous reinforcement learning domains

Evolved Intrinsic Reward Functions for Reinforcement Learning

Genetic Programming for Reward Function Search

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Learning Performance Of Agent Research Articles

Related Topics

Articles published on Learning Performance Of Agent

Norm Augmented Reinforcement Learning Agents With Synthesized Normative Rules

Faster MIL-based Subgoal Identification for Reinforcement Learning by Tuning Fewer Hyperparameters

A location-based advising method in teacher–student frameworks

Optimal Reactive Power Dispatch in ADNs using DRL and the Impact of Its Various Settings and Environmental Changes.

Intelligent multi-branch allocation of spectrum slices for inter-numerology interference minimization

Differential Advising in Multiagent Reinforcement Learning.

Differentially Private Malicious Agent Avoidance in Multiagent Advising Learning.

Automatic Curriculum Graph Generation for Reinforcement Learning Agents

Graph based skill acquisition and transfer Learning for continuous reinforcement learning domains

Evolved Intrinsic Reward Functions for Reinforcement Learning

Genetic Programming for Reward Function Search