Hierarchical Policy Research Articles

Recently, Reinforcement Learning (RL) has shown strong performance on sequential decision-making and control problems in dynamic environments. Despite these achievements, deploying Deep Neural Network (DNN)-based RL is expensive in time and power because training agents on high-dimensional image representations requires a large number of episodes. Additionally, at inference the large energy footprint of deep neural networks can be a major drawback. Embedded edge devices, the main platform for deploying RL applications, are intrinsically resource-constrained, and deploying DNN-based RL on them is challenging. As a result, reducing the number of actions the RL agent must take to learn the desired policy, along with energy-efficient deployment of RL, is crucial. In this article, we propose Energy Efficient Hierarchical Reinforcement Learning (E2HRL), a scalable hardware architecture for RL applications. E2HRL uses a cross-layer design methodology to achieve better energy efficiency, smaller model size, higher accuracy, and system integration at the software and hardware layers. Our proposed RL agent model is based on learning hierarchical policies, which makes the network architecture more efficient to implement on mobile devices. We evaluated our model in three RL environments with different levels of complexity. Simulation results and our analysis show that hierarchical policy learning with several levels of control improves the training efficiency of RL agents, and the agent learns the desired policy faster than a non-hierarchical model. This improvement is more pronounced as the environment or task becomes more complex, with multiple objective subgoals. We tested our model with different hyperparameters to maximize the reward achieved by the RL agent while minimizing the model size, number of parameters, and required number of operations.
The E2HRL model enables efficient deployment of RL agents on resource-constrained embedded devices through the proposed custom hardware architecture, which is scalable and fully parameterized with respect to the number of input channels, filter size, and depth. The number of processing engines (PEs) in the proposed hardware can vary from 1 to 8, which provides the flexibility to trade off factors such as latency, throughput, power, and energy efficiency. By performing a systematic hardware parameter analysis and design space exploration, we implemented the most energy-efficient hardware architectures of E2HRL on a Xilinx Artix-7 FPGA and an NVIDIA Jetson TX2. Comparing the implementation results shows that the Jetson TX2 achieves 0.1 ∼ 1.3 GOP/S/W energy efficiency while the Artix-7 FPGA achieves 1.1 ∼ 11.4 GOP/S/W, which corresponds to 8.8× ∼ 11× better energy efficiency for E2HRL when the model is implemented on the FPGA. Additionally, compared to similar works, our design shows better performance and energy efficiency.
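
The reported 8.8× ∼ 11× improvement can be checked against the two energy-efficiency ranges quoted in the abstract. The sketch below assumes the ranges are compared endpoint by endpoint (lowest with lowest, highest with highest); the abstract does not state this pairing, and the names `JETSON_TX2`, `ARTIX7_FPGA`, and `improvement_range` are illustrative only.

```python
# Energy-efficiency ranges quoted in the abstract, in GOP/S/W: (lowest, highest).
JETSON_TX2 = (0.1, 1.3)
ARTIX7_FPGA = (1.1, 11.4)


def improvement_range(baseline, candidate):
    """Return (min, max) of the endpoint-wise ratios candidate / baseline.

    Assumption: the low ends are compared with each other and the high ends
    with each other. The abstract does not say how the ratio was derived.
    """
    ratios = [c / b for b, c in zip(baseline, candidate)]
    return min(ratios), max(ratios)


low, high = improvement_range(JETSON_TX2, ARTIX7_FPGA)
print(f"{low:.1f}x to {high:.1f}x")  # prints "8.8x to 11.0x"
```

Under that pairing the ratios are 11.4 / 1.3 ≈ 8.8 and 1.1 / 0.1 = 11, which matches the range stated in the abstract.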

Classical imitation learning methods struggle substantially to learn hierarchical policies when the imitating agent faces a state that the expert agent did not observe. To address these drawbacks, we propose an online active learning approach based on active inference that encodes the expert’s demonstrations in terms of observations and actions to improve the learner’s future motion prediction. For this purpose, we provide a switching Dynamic Bayesian Network, based on the dynamic interaction between the expert agent and another object in its surroundings, as a reference model, which we use to initialize an incremental probabilistic learning model. This learning model grows and matures dynamically, at discrete and continuous levels, through the free-energy formulation and message passing of active inference during an online active learning phase. In this scheme, generalized states of the learning world are represented as a distance vector, which is the learner’s observation of its interaction with a moving object. Because the distance vector entails intentions, it enables prospective evaluation of action predictions. We illustrate these points using simulations of intelligent driving agents. The learning agent is trained using long-term predictions from the generative learning model to reproduce the expert’s motion while learning how to select a suitable action through new experiences. Our results affirm that a Dynamic Bayesian optimal approach provides a principled framework and outperforms conventional reinforcement learning methods. Furthermore, they support the general formulation of action prediction as active inference.
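
As a rough illustration of the distance-vector state mentioned above, the sketch below computes the offset from a learner to a moving object and appends its finite-difference rate of change. It assumes that "generalized state" means a quantity stacked with its time derivative, as is common in the active inference literature; the abstract does not give the exact definition, and the function names, 2-D positions, and time step are hypothetical.

```python
def distance_vector(learner_pos, object_pos):
    """Component-wise offset from the learner to the moving object."""
    return tuple(o - l for l, o in zip(learner_pos, object_pos))


def generalized_state(prev_distance, curr_distance, dt):
    """Stack the current distance vector with its finite-difference rate.

    Assumption: the generalized state is (distance, d(distance)/dt). This is
    an illustrative reading, not the formulation used by the authors.
    """
    rate = tuple((c - p) / dt for p, c in zip(prev_distance, curr_distance))
    return curr_distance + rate


# Hypothetical 2-D example: the learner moves right while the object moves down.
d0 = distance_vector((0, 0), (4, 3))  # (4, 3)
d1 = distance_vector((1, 0), (4, 2))  # (3, 2)
state = generalized_state(d0, d1, dt=0.5)
print(state)  # prints (3, 2, -2.0, -2.0)
```

A negative rate in every component means the learner and the object are closing in on each other, which is the kind of cue a predictive model could use when evaluating candidate actions.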

Articles published on Hierarchical Policy

E2HRL: An Energy-efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning

PG-CODE: Latent dirichlet allocation embedded policy knowledge graph for government department coordination

Hierarchical reinforcement learning for automatic disease diagnosis

Hierarchical policy with deep-reinforcement learning for nonprehensile multiobject rearrangement

Muscle‐driven virtual human motion generation approach based on deep reinforcement learning

HLifeRL: A hierarchical lifelong reinforcement learning framework

Network Policy Enforcement With Commodity Multiqueue NICs for Multitenant Data Centers

An Unequal Ethiopia in an Unequal World: Global and Domestic Hierarchies in Afäwärḳ Gäbrä-Iyyäsus’s and Käbbädä Mikael’s Political Thought (1908 and 1949)

A SYN flooding attack detection approach with hierarchical policies based on self‐information

Shaping Individualized Impedance Landscapes for Gait Training via Reinforcement Learning

Hierarchical Landmark Policy Optimization for Visual Indoor Navigation

Active Inference Integrated With Imitation Learning for Autonomous Driving

Dynamic Adaptation Method of Business Process Based on Hierarchical Feature Model

Efficient hierarchical policy network with fuzzy rules

Hierarchical Reinforcement Learning

Intrinsically Motivated Hierarchical Policy Learning in Multiobjective Markov Decision Processes

Layered Relative Entropy Policy Search

Playing Against the Board: Rolling Horizon Evolutionary Algorithms Against Pandemic

Efficient Robotic Object Search Via HIEM: Hierarchical Policy Learning With Intrinsic-Extrinsic Modeling

Multiagent Hierarchical Cognition Difference Policy for Multiagent Cooperation
