Action Selection Policy Research Articles

Online model-free reinforcement learning (RL) approaches play a crucial role in real-world applications such as behavioral decision-making in robotics. Balancing exploration and exploitation is a central problem in RL, because the exploration/exploitation ratio strongly influences both the total learning time and the quality of the learned strategy. Various action selection policies have therefore been proposed to balance the two. However, these approaches are rarely regulated automatically and dynamically in response to changes in the environment. One of the most remarkable self-adaptation mechanisms in animals is their capacity to switch dynamically between exploration and exploitation strategies. This article proposes a novel neurophysiologically motivated model that simulates the roles of the medial prefrontal cortex (MPFC) and the lateral prefrontal cortex (LPFC) in behavioral decision-making. The sensory input is transmitted to the MPFC; the ventral tegmental area (VTA) then receives a reward and computes a dopaminergic reinforcement signal, and feedback categorization neurons in the anterior cingulate cortex (ACC) compute the vigilance from this signal. The vigilance is then passed to the LPFC to regulate the exploration rate, and finally the exploration rate is transmitted to the thalamus, which computes the corresponding action probabilities. This action selection mechanism is introduced into the actor–critic model of the basal ganglia and combined with a cerebellum model based on a developmental network, yielding a new hybrid neuromodulatory model that selects the agent's actions. Both a simulation comparison with four other traditional action selection policies and physical experiments demonstrate the potential of the proposed neuromodulatory model for action selection.
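The signal chain described above (reinforcement signal, then vigilance, then exploration rate, then action probabilities) can be illustrated with a deliberately simplified sketch. It is not the article's model. It assumes the dopaminergic reinforcement signal is a temporal-difference (TD) error, treats "vigilance" as a running average of the absolute TD error, and maps vigilance onto the temperature of a Boltzmann (softmax) policy. The class name `AdaptiveExplorer`, all of its parameters, and the choice that larger recent surprise means more exploration are hypothetical; the article's actual mapping may differ.

```python
import math
import random


def softmax_probs(prefs, temperature):
    """Boltzmann (softmax) action probabilities; higher temperature -> flatter distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp((p - m) / temperature) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


class AdaptiveExplorer:
    """Hypothetical sketch of exploration driven by recent TD-error magnitude.

    Assumptions (not from the article): the TD error stands in for the
    dopaminergic reinforcement signal, 'vigilance' is an exponential moving
    average of |TD error|, and the softmax temperature grows with vigilance.
    """

    def __init__(self, gamma=0.9, decay=0.9, t_min=0.05, t_max=1.0, scale=1.0):
        self.gamma = gamma      # discount factor for the TD error
        self.decay = decay      # smoothing factor for the vigilance average
        self.t_min = t_min      # temperature when nothing is surprising (must be > 0)
        self.t_max = t_max      # upper bound on the temperature
        self.scale = scale      # how quickly vigilance saturates
        self.vigilance = 0.0

    def td_error(self, reward, v_s, v_next, done=False):
        # delta = r + gamma * V(s') - V(s); no bootstrapping at terminal states
        target = reward if done else reward + self.gamma * v_next
        return target - v_s

    def update(self, delta):
        # Exponential moving average of the magnitude of recent surprises
        self.vigilance = self.decay * self.vigilance + (1.0 - self.decay) * abs(delta)
        return self.vigilance

    def temperature(self):
        # Squash vigilance into [0, 1) and map it onto [t_min, t_max)
        level = 1.0 - math.exp(-self.vigilance / self.scale)
        return self.t_min + (self.t_max - self.t_min) * level

    def select(self, prefs, rng=random):
        # Sample an action index from the vigilance-modulated softmax
        probs = softmax_probs(prefs, self.temperature())
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        return action, probs
```

In use, an agent would call `td_error` and `update` after each transition and `select` before each action. A fixed-temperature softmax or a fixed-ε greedy policy, by contrast, never adjusts its exploration level in response to the environment.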

In modern manufacturing, dynamic scheduling methods are urgently needed as the uncertainty and complexity of production processes increase sharply. To this end, this paper addresses the dynamic flexible job shop scheduling problem (DFJSP) under new job insertions, with the objective of minimizing total tardiness. Without loss of generality, the DFJSP can be modeled as a Markov decision process (MDP) in which an intelligent agent successively determines which operation to process next and which machine to assign it to, according to the production status at the current decision point; this makes the problem well suited to reinforcement learning (RL) methods. To cope with continuous production states and learn the most suitable action (i.e., dispatching rule) at each rescheduling point, a deep Q-network (DQN) is developed. Six composite dispatching rules are proposed that simultaneously select an operation and assign it to a feasible machine every time an operation is completed or a new job arrives. Seven generic state features are extracted to represent the production status at a rescheduling point. Taking these continuous state features as input, the DQN outputs the state–action value (Q-value) of each dispatching rule. The DQN is trained with deep Q-learning (DQL) enhanced by two improvements, namely double DQN and soft target weight update. Moreover, a “softmax” action selection policy is used when deploying the trained DQN, so as to favor rules with higher Q-values while maintaining policy entropy. Numerical experiments are conducted on a large number of instances with different production configurations. The results confirm both the superiority and the generality of the DQN compared with each composite rule, other well-known dispatching rules, and the standard Q-learning-based agent.
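The two DQL enhancements and the softmax deployment policy named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: network outputs are stood in for by plain lists of Q-values (one per dispatching rule), network weights by flat lists of floats, and the function names and the values of γ (`gamma`), τ (`tau`), and the softmax temperature are hypothetical.

```python
import math
import random


def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.95, done=False):
    """Double DQN target: the online net picks the next action, the target net scores it."""
    if done:
        return reward  # no bootstrapping from a terminal state
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


def soft_update(target_params, online_params, tau=0.01):
    """Soft target weight update: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]


def softmax_select(q_values, temperature=1.0, rng=random):
    """Sample a rule index with probability proportional to exp(Q / T)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    probs = [e / z for e in exps]
    idx = rng.choices(range(len(q_values)), weights=probs)[0]
    return idx, probs
```

Under these assumptions, `double_dqn_target(1.0, [1.0, 3.0, 2.0], [5.0, 2.0, 4.0], gamma=0.5)` evaluates to 2.0: the online network picks action 1, and the target network's estimate for that action (2.0) is used instead of the target network's own maximum (5.0), which is how double DQN curbs overestimation. Sampling from `softmax_select` rather than always taking the arg-max keeps lower-valued rules in play with small probability, which is the entropy-preserving behavior the abstract describes.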

Articles published on Action Selection Policy

Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis

Explorer-Actor-Critic: Better actors for deep reinforcement learning

An optimized Q-Learning algorithm for mobile robot local path planning

MEOL: A Maximum-Entropy Framework for Options Learning

An end-to-end deep reinforcement learning method based on graph neural network for distributed job-shop scheduling problem

Decoupled Monte Carlo Tree Search for Cooperative Multi-Agent Planning

Deep reinforcement learning based cooperative control of traffic signal for multi‐intersection network in intelligent transportation system using edge computing

Robust Tests in Online Decision-Making

Action control, forward models and expected rewards: representations in reinforcement learning

Coverage path planning for maritime search and rescue using reinforcement learning

Intelligent Trajectory Planning in UAV-Mounted Wireless Networks: A Quantum-Inspired Reinforcement Learning Perspective

A pseudo-softmax function for hardware-based high speed image classification

Mastering Atari, Go, chess and shogi by planning with a learned model

Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model

Path Planning for UAV-Mounted Mobile Edge Computing With Deep Reinforcement Learning

Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning

An Experience Aggregative Reinforcement Learning With Multi-Attribute Decision-Making for Obstacle Avoidance of Wheeled Mobile Robot

Personalized Robot Tutoring Using the Assistive Tutor POMDP (AT-POMDP)

Learning Representations in Model-Free Hierarchical Reinforcement Learning

A Multi-Agent Based Intelligent Training System for Unmanned Surface Vehicles